WorldWideScience

Sample records for libraries expressed sequences

  1. Analyses of an expressed sequence tag library from Taenia solium, Cysticerca.

    Directory of Open Access Journals (Sweden)

    Jonas Lundström

    BACKGROUND: Neurocysticercosis is a disease caused by the oral ingestion of eggs from the human parasitic worm Taenia solium. Although drugs are available, they are controversial because of their side effects and poor efficiency. An expressed sequence tag (EST) library is a method used to describe the gene expression profile and sequence of mRNA from a specific organism and stage. Such information can be used to find new targets for the development of drugs and to get a better understanding of the parasite biology. METHODS AND FINDINGS: Here an EST library consisting of 5760 sequences from the pig cysticerca stage has been constructed. In the library 1650 unique sequences were found and of these, 845 sequences (52%) were novel to T. solium and not identified within other EST libraries. Furthermore, 918 sequences (55%) were of unknown function. Amongst the 25 most frequently expressed sequences, 6 had no relevant similarity to other sequences found in the GenBank NR DNA database. A prediction of putative signal peptides was also performed, and 4 of the 25 were predicted to carry a signal peptide. Proposed vaccine and diagnostic targets T24, Tsol18/HP6 and Tso31d could also be identified among the 25 most frequently expressed. CONCLUSIONS: An EST library has been produced from pig cysticerca and analyzed. More than half of the different ESTs sequenced contained a sequence with no suggested function, and 845 novel EST sequences have been identified. The library increases the knowledge about which genes are expressed and at what level. It can also be used to study different areas of research such as drug and diagnostic development together with parasite fitness via e.g. immune modulation.

  2. Expression sequence tag library derived from peripheral blood mononuclear cells of Chlorocebus sabaeus

    Directory of Open Access Journals (Sweden)

    Tchitchek Nicolas

    2012-06-01

    Background: African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research; nevertheless, only a few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution. Results: We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were either left unstimulated or stimulated in vitro with Concanavalin A, with lipopolysaccharides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify ortholog transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to C. sabaeus or correspond to not yet described primate genes. Conclusions: The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which represents immunological and hematological gene expression particularly well, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research.

  3. Parallel Sequencing of Expressed Sequence Tags from Two Complementary DNA Libraries for High and Low Phosphorus Adaptation in Common Beans

    Directory of Open Access Journals (Sweden)

    Matthew W. Blair

    2011-11-01

    Expressed sequence tags (ESTs) have proven useful for gene discovery in many crops. In this work, our objective was to construct complementary DNA (cDNA) libraries from root tissues of common beans (Phaseolus vulgaris L.) grown under low and high P hydroponic conditions and to conduct EST sequencing and comparative analyses of the libraries. Expressed sequence tag analysis of 3648 clones identified 2372 unigenes, of which 1591 were annotated as known genes while a total of 465 unigenes were not associated with any known gene. Unigenes with hits were categorized according to biological processes, molecular function, and cellular compartmentalization. Given the young tissue used to make the root libraries, genes for catalytic activity and binding were highly expressed. Comparisons with previous root EST sequencing and between the two libraries made here resulted in a set of genes to study further for differential gene expression and adaptation to low P, such as a 14 kDa proline-rich protein, a metallopeptidase, tonoplast intrinsic protein, adenosine triphosphate (ATP) citrate synthase, and cell proliferation genes expressed in the low P treated plants. Given that common beans are often grown on acid soils of the tropics and subtropics that are usually low in P, these genes and the two parallel libraries will be useful for selection for better uptake of this essential macronutrient. The importance of EST generation for common bean root tissues under low P and other abiotic soil stresses is also discussed.

  4. Cell-free translational screening of an expression sequence tag library of Clonorchis sinensis for novel antigen discovery.

    Science.gov (United States)

    Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung

    2017-05-01

    The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers. Biotechnol. Prog., 33:832-837, 2017.

  5. Mining of expressed sequence tag libraries of cacao

    Indian Academy of Sciences (India)

    Expressed sequence tags (ESTs) provide researchers with a quick and inexpensive route for discovering new genes, data on gene expression and regulation, and also provide genic markers that help in constructing genome maps. Cacao is an important perennial crop of humid tropics. Cacao EST sequences, as available ...

  6. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    Directory of Open Access Journals (Sweden)

    Nehir Ozdemir Ozgenturk

    2017-01-01

    Kivircik sheep is an important local Turkish sheep breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries, which were taken from the same Kivircik sheep mammary gland tissue at prenatal and postnatal stages, were constructed. A total of 3072 colonies, randomly selected from the two libraries, were sequenced to develop a sheep EST collection. We used the Phred/Phrap computer programs for analysis of the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined by Geneious software. A total of 422 ESTs have over 80% similarity to known sequences of other organisms in NCBI and were classified by the Panther database into Gene Ontology (GO) categories. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). EST data in this study have provided a new source of information for functional genome studies of sheep.

  7. Preparation of highly multiplexed small RNA sequencing libraries.

    Science.gov (United States)

    Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos

    2017-08-01

    MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
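The dual-indexing idea in the abstract above can be illustrated with a minimal sketch. It assumes each pooled library carries one i7 and one i5 index, so that a small set of indexes yields many unique sample labels. The index sequences, the read strings, the sample names, and the function names `build_index_table` and `demultiplex` are all hypothetical and are not taken from the published protocol.

```python
from itertools import product

def build_index_table(i7_indexes, i5_indexes):
    """Assign one sample name to every (i7, i5) index combination."""
    pairs = product(i7_indexes, i5_indexes)
    return {pair: f"sample_{k:03d}" for k, pair in enumerate(pairs, start=1)}

def demultiplex(reads, table):
    """Group read sequences by sample.

    `reads` is an iterable of (i7, i5, sequence) tuples. Reads whose
    index pair is not in the table are binned as 'undetermined'.
    """
    bins = {}
    for i7, i5, seq in reads:
        sample = table.get((i7, i5), "undetermined")
        bins.setdefault(sample, []).append(seq)
    return bins

# Hypothetical index sequences, for illustration only.
I7 = ["ATCACG", "CGATGT", "TTAGGC"]
I5 = ["TATAGCCT", "ATAGAGGC"]
table = build_index_table(I7, I5)  # 3 x 2 = 6 distinguishable samples

# Hypothetical reads: (i7 index, i5 index, insert sequence).
reads = [
    ("ATCACG", "TATAGCCT", "TGAGGTAGTAGGTTGTATAGTT"),
    ("TTAGGC", "ATAGAGGC", "TAGCTTATCAGACTGATGTTGA"),
    ("GGGGGG", "TATAGCCT", "ACGTACGTACGTACGTACGTAC"),  # unknown i7 index
]
bins = demultiplex(reads, table)
```

With m i7 and n i5 indexes the table holds m × n samples, which is why combinatorial indexing scales to large patient collections with few oligonucleotides.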

  8. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  9. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  10. Expressed sequence tags from heat-shocked seagrass Zostera noltii (Hornemann) from its southern distribution range.

    Science.gov (United States)

    Massa, Sónia I; Pearson, Gareth A; Aires, Tânia; Kube, Michael; Olsen, Jeanine L; Reinhardt, Richard; Serrão, Ester A; Arnaud-Haond, Sophie

    2011-09-01

    Predicted global climate change threatens the distributional ranges of species worldwide. We identified genes expressed in the intertidal seagrass Zostera noltii during recovery from a simulated low tide heat-shock exposure. Five Expressed Sequence Tag (EST) libraries were compared, corresponding to four recovery times following sub-lethal temperature stress, and a non-stressed control. We sequenced and analyzed 7009 sequence reads from 30min, 2h, 4h and 24h after the beginning of the heat-shock (AHS), and 1585 from the control library, for a total of 8594 sequence reads. Among 51 Tentative UniGenes (TUGs) exhibiting significantly different expression between libraries, 19 (37.3%) were identified as 'molecular chaperones' and were over-expressed following heat-shock, while 12 (23.5%) were 'photosynthesis TUGs' generally under-expressed in heat-shocked plants. A time course analysis of expression showed a rapid increase in expression of the molecular chaperone class, most of which were heat-shock proteins; which increased from 2 sequence reads in the control library to almost 230 in the 30min AHS library, followed by a slow decrease during further recovery. In contrast, 'photosynthesis TUGs' were under-expressed 30min AHS compared with the control library, and declined progressively with recovery time in the stress libraries, with a total of 29 sequence reads 24h AHS, compared with 125 in the control. A total of 4734 TUGs were screened for EST-Single Sequence Repeats (EST-SSRs) and 86 microsatellites were identified. Copyright © 2011 Elsevier B.V. All rights reserved.
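The proportions and totals reported in this abstract can be re-derived from the counts it gives. The short sketch below only repeats that arithmetic. The variable names and the helper `percent` are illustrative, and rounding to one decimal place is an assumption about how the percentages were reported.

```python
# Counts taken from the abstract above.
differential_tugs = 51      # TUGs with significantly different expression
chaperone_tugs = 19         # 'molecular chaperones'
photosynthesis_tugs = 12    # 'photosynthesis TUGs'
stress_reads = 7009         # reads from the four heat-shock recovery libraries
control_reads = 1585        # reads from the non-stressed control library

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)

chaperone_pct = percent(chaperone_tugs, differential_tugs)
photosynthesis_pct = percent(photosynthesis_tugs, differential_tugs)
total_reads = stress_reads + control_reads
```

The computed values agree with the 37.3%, 23.5%, and 8594 reads stated in the abstract.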

  11. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

    Science.gov (United States)

    Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

    2008-04-10

    Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.

  12. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

    Directory of Open Access Journals (Sweden)

    Pardinas Jose R

    2008-04-01

    Background: Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results: We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion: EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.

  13. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Background: Castor seeds are a major source of ricinoleate, an important industrial raw material. Genomics studies of the castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergenic proteins in seeds. Results: Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion: Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at

  14. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    Science.gov (United States)

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  15. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
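The vector-and-operator description in this abstract can be made concrete with a toy sketch. It is not the authors' implementation. It assumes a simple binomial model in which every phage copy is read independently with probability `p`, stores diagonal matrices as lists of their diagonal entries, and uses a made-up four-sequence library. The names `sampling_operator` and `apply_diagonal` and the example censorship matrix are hypothetical.

```python
import random

def sampling_operator(n, p, rng):
    """Diagonal of a random sampling matrix Sa (assumed binomial model).

    Entry i is the fraction of the n[i] copies of sequence i that were
    drawn, so multiplying it by n[i] gives the observed read count.
    """
    diag = []
    for copies in n:
        if copies == 0:
            diag.append(0.0)
            continue
        kept = sum(1 for _ in range(copies) if rng.random() < p)
        diag.append(kept / copies)
    return diag

def apply_diagonal(diag, vector):
    """Multiply a diagonal matrix (given by its diagonal) with a vector."""
    return [d * x for d, x in zip(diag, vector)]

# Hypothetical library: copy numbers of N = 4 possible sequences.
n = [100, 50, 0, 10]
identity = [1.0, 1.0, 1.0, 1.0]   # I_N: sequencing without bias
cen = [1.0, 0.0, 1.0, 1.0]        # toy CEN: sequence 1 is fully censored

rng = random.Random(0)
sa = sampling_operator(n, 0.5, rng)

unbiased = apply_diagonal(sa, apply_diagonal(identity, n))  # Seq = Sa I_N
censored = apply_diagonal(sa, apply_diagonal(cen, n))       # I_N replaced by CEN
```

In this sketch a censored sequence yields zero reads regardless of its copy number, while uncensored sequences differ from their true abundance only through sampling noise.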

  16. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    Directory of Open Access Journals (Sweden)

    Brad Thomas Townsley

    2015-05-01

    Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding, and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies, although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing inherent properties of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  17. Droplet Digital™ PCR Next-Generation Sequencing Library QC Assay.

    Science.gov (United States)

    Heredia, Nicholas J

    2018-01-01

    Digital PCR is a valuable tool to quantify next-generation sequencing (NGS) libraries precisely and accurately. Accurately quantifying NGS libraries enables accurate loading of the libraries onto the sequencer and thus improves sequencing performance by reducing under- and overloading errors. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries, which in turn greatly improves sequencing uniformity of the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay include precise and accurate quantification in addition to size quality assessment, enabling users to QC their sequencing libraries with confidence.

  18. Large-scale Identification of Expressed Sequence Tags (ESTs) from Nicotiana tabacum by Normalized cDNA Library Sequencing

    Directory of Open Access Journals (Sweden)

    Alvarez S Perez

    2014-12-01

    An expressed sequence tag (EST) resource for tobacco plants (Nicotiana tabacum) was established using high-throughput sequencing of randomly selected clones from one cDNA library representing a range of plant organs (leaf, stem, root and root base). Over 5000 ESTs were generated from the 3’ ends of 8000 clones, analyzed by BLAST searches and categorized functionally. All annotated ESTs were classified into 18 functional categories; unique transcripts involved in energy were the largest group, accounting for 831 (32.32%) of the annotated ESTs. After excluding 2450 non-significant tentative unique transcripts (TUTs), 100 unique sequences (1.67% of total TUTs) were identified from the N. tabacum database. In the array result, two genes strongly related to the tobacco mosaic virus (TMV) were obtained: one basic form of pathogenesis-related protein 1 precursor (TBT012G08) and ubiquitin (TBT087G01). Both of them were found in the variety Hongda. Some other important genes were classified into two groups: one implicated in plant development, such as genes related to photosynthetic processes (chlorophyll a-b binding protein, photosystem I, ferredoxin I and III, ATP synthase), and a further group including genes related to plant stress response (ubiquitin, ubiquitin-like protein SMT3, glycine-rich RNA binding protein, histones and metallothionein). The interesting finding in this study is that two of these genes have never been reported before in N. tabacum (ubiquitin-like protein SMT3 and metallothionein). The array results were confirmed using quantitative PCR.

  19. Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project

    Directory of Open Access Journals (Sweden)

    Heinz Ruth A

    2003-09-01

    Background: Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue. They avoid redundant sequencing of clones representing the same expressed genes and maximize detection of low-abundance transcripts, thus affecting the efficiency and cost effectiveness of small scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits. Results: Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low-abundance transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library, with 62% unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs in public databases showed that 197 non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of new reported sequences

  20. Compositional Bias in Naïve and Chemically-modified Phage-Displayed Libraries uncovered by Paired-end Deep Sequencing.

    Science.gov (United States)

    He, Bifang; Tjhung, Katrina F; Bennett, Nicholas J; Chou, Ying; Rau, Andrea; Huang, Jian; Derda, Ratmir

    2018-01-19

    Understanding the composition of a genetically-encoded (GE) library is instrumental to the success of ligand discovery. In this manuscript, we investigate the bias in GE-libraries of linear, macrocyclic and chemically post-translationally modified (cPTM) tetrapeptides displayed on the M13KE platform, which are produced via trinucleotide cassette synthesis (19 codons) and NNK-randomized codon. Differential enrichment of synthetic DNA {S}, ligated vector {L} (extension and ligation of synthetic DNA into the vector), naïve libraries {N} (transformation of the ligated vector into the bacteria followed by expression of the library for 4.5 hours to yield a "naïve" library), and libraries chemically modified by aldehyde ligation and cysteine macrocyclization {M} characterized by paired-end deep sequencing, detected a significant drop in diversity in {L} → {N}, but only a minor compositional difference in {S} → {L} and {N} → {M}. Libraries expressed at the N-terminus of phage protein pIII censored positively charged amino acids Arg and Lys; libraries expressed between pIII domains N1 and N2 overcame Arg/Lys-censorship but introduced new bias towards Gly and Ser. Interrogation of biases arising from cPTM by aldehyde ligation and cysteine macrocyclization unveiled censorship of sequences with Ser/Phe. Analogous analysis can be used to explore library diversity in new display platforms and optimize cPTM of these libraries.
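The NNK randomization mentioned in this abstract has a built-in codon-level imbalance that can be enumerated directly. The sketch below lists all NNK codons (N = A, C, G or T; K = G or T) and translates them with the standard genetic code. It shows only the theoretical baseline of the synthetic DNA, not the biological censorship measured in the study, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]

# How many NNK codons encode each amino acid ('*' marks a stop codon).
nnk_counts = Counter(CODON_TABLE[codon] for codon in nnk_codons)
```

NNK gives 32 codons that cover all 20 amino acids plus a single stop codon (TAG), but unevenly: Arg, Leu and Ser are each encoded by three codons while Lys is encoded by one. An unbiased NNK library is therefore not uniform at the amino acid level, which is worth separating from the display-related biases the abstract reports.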

  1. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  2. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background. PMID:23193365
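
    The principle behind processing bisulfite-converted reads can be illustrated with a toy example: unmethylated cytosines read as T after conversion, while methylated cytosines still read as C. The sketch below is not the pipeline described in the abstract. It assumes forward-strand reads already aligned to the reference without gaps, and both function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of the forward strand.

    Unmethylated C is read as T; C at a methylated position stays C.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Call methylation at each reference C covered by an ungapped read.

    Returns {position: True if methylated, False if unmethylated}.
    Positions where the read shows neither C nor T are left uncalled.
    """
    calls = {}
    for index, (ref_base, read_base) in enumerate(
        zip(reference.upper(), read.upper())
    ):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = True
        elif read_base == "T":
            calls[index] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGG"
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read)
    print(call_methylation(reference, read))
```

    Real RRBS analysis also has to handle reverse-strand reads (G to A changes), adaptor trimming and incomplete conversion, none of which this sketch attempts.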

  3. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing as well as to initiate BAC end sequencing from the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  4. A microfluidic DNA library preparation platform for next-generation sequencing.

    Science.gov (United States)

    Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D

    2013-01-01

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  5. A microfluidic DNA library preparation platform for next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Hanyoup Kim

    Full Text Available Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  6. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
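
    The data dependencies mentioned in the abstract are easiest to see in a plain scalar version of Smith-Waterman. The sketch below is not parasail's API and is not vectorized. It assumes a simple linear gap penalty, and the scoring parameters are illustrative defaults rather than values from the source.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scalar dynamic programming with a linear gap penalty. Each cell depends
    on its upper, left and upper-left neighbours, which is the dependency
    pattern that makes straightforward SIMD vectorization hard.
    """
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                    # start a new local alignment
                previous_row[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous_row[j] + gap,                # gap in b
                current_row[j - 1] + gap,             # gap in a (depends on same row)
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Dropping the zero floor and initializing the first row and column with cumulative gap penalties would turn the same recurrence into Needleman-Wunsch global alignment.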

  7. A Synthetic Oligo Library and Sequencing Approach Reveals an Insulation Mechanism Encoded within Bacterial σ54 Promoters

    Directory of Open Access Journals (Sweden)

    Lior Levy

    2017-10-01

    Full Text Available We use an oligonucleotide library of >10,000 variants to identify an insulation mechanism encoded within a subset of σ54 promoters. Insulation manifests itself as reduced protein expression for a downstream gene that is expressed by transcriptional readthrough. It is strongly associated with the presence of short CT-rich motifs (3–5 bp), positioned within 25 bp upstream of the Shine-Dalgarno (SD) motif of the silenced gene. We provide evidence that insulation is triggered by binding of the ribosome binding site (RBS) to the upstream CT-rich motif. We also show that, in E. coli, insulator sequences are preferentially encoded within σ54 promoters, suggesting an important regulatory role for these sequences in natural contexts. Our findings imply that sequence-specific regulatory effects that are sparsely encoded by short motifs may not be easily detected by lower throughput studies. Such sequence-specific phenomena can be uncovered with a focused oligo library (OL) design that mitigates sequence-related variance, as exemplified herein.

  8. An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

    Science.gov (United States)

    Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

    2009-01-21

    The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. The EST collection will provide much needed genomic resources for

  9. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed sequence tags (ESTs) have been a cost-effective tool in molecular biology and represent an abundant, valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tag (EST) sequences. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047); among these, 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability to peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a low level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.
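
    Putative SSRs of the kind counted above are typically located by scanning sequences for short tandem repeats. The sketch below shows one way to do this with a regular expression. The abstract does not state the search criteria used, so the unit lengths and minimum repeat count are hypothetical defaults, and `find_ssrs` is an illustrative name rather than a tool from the study.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a list of (start, unit, repeat_count) tuples. The thresholds are
    illustrative assumptions, not the criteria used in the study above.
    """
    # Lazy unit match so the shortest repeating unit is preferred.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAAAA reported as (AA)n.
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    print(find_ssrs("GGAGAGAGAGATT"))
```

    Primer pairs would then be designed in the flanking sequence on either side of each hit, which is the step the abstract describes testing on 42 candidate loci.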

  10. The function analysis of full-length cDNA sequence from IRM-2 mouse cDNA library

    International Nuclear Information System (INIS)

    Wang Qin; Liu Xiaoqiu; Xu Chang; Du Liqing; Sun Zhijuan; Wang Yan; Liu Qiang; Song Li; Li Jin; Fan Feiyue

    2013-01-01

    Objective: To identify the function of full-length cDNA sequences from an IRM-2 mouse cDNA library. Methods: Full-length cDNA products were amplified by PCR from the IRM-2 mouse cDNA library according to twenty-one expressed sequence tags. The expression of the full-length cDNAs was detected after mouse embryonic fibroblasts were exposed to 6.5 Gy γ-ray radiation. The effect on the growth of radiosensitive AT5B1VA cells transfected with the full-length cDNAs was also investigated. Results: The expression of full-length cDNAs No. 4, 5 and 2 from IRM-2 mice was higher than in the parental ICR and 615 mice after mouse embryonic fibroblasts were irradiated with γ-rays. The survival rate of AT5B1VA cells transfected with full-length cDNAs No. 4, 5 and 2 was high. Conclusion: Full-length cDNAs No. 4, 5 and 2 of the IRM-2 mouse are associated with high radioresistance. (authors)

  11. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse the complete gene sequences of the PLE family by screening a Rongchang pig BAC library and applying third-generation PacBio sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720 bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  12. Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.

    Science.gov (United States)

    Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong

    2018-05-04

    Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.

  13. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, 3.2% to cell cycle, and 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provides a useful platform for functional genomic research on Siberian tigers. PMID:20941376

  14. Expressed sequence tag-derived microsatellite markers of perennial ryegrass (Lolium perenne L.)

    DEFF Research Database (Denmark)

    Studer, Bruno; Asp, Torben; Frei, Ursula

    2008-01-01

    An expressed sequence tag (EST) library of the key grassland species perennial ryegrass (Lolium perenne L.) has been exploited as a resource for microsatellite marker development. Out of 955 simple sequence repeat (SSR) containing ESTs, 744 were used for primer design. Primer amplification was te...

  15. Balancing gene expression without library construction via a reusable sRNA pool.

    Science.gov (United States)

    Ghodasara, Amar; Voigt, Christopher A

    2017-07-27

    Balancing protein expression is critical when optimizing genetic systems. Typically, this requires library construction to vary the genetic parts controlling each gene, which can be expensive and time-consuming. Here, we develop sRNAs corresponding to 15 nt 'target' sequences that can be inserted upstream of a gene. The targeted gene can be repressed from 1.6- to 87-fold by controlling sRNA expression using promoters of different strength. A pool is built where six sRNAs are placed under the control of 16 promoters that span a ∼10^3-fold range of strengths, yielding ∼10^7 combinations. This pool can simultaneously optimize up to six genes in a system. This requires building only a single system-specific construct by placing a target sequence upstream of each gene and transforming it with the pre-built sRNA pool. The resulting library is screened and the top clone is sequenced to determine the promoter controlling each sRNA, from which the fold-repression of the genes can be inferred. The system is then rebuilt by rationally selecting parts that implement the optimal expression of each gene. We demonstrate the versatility of this approach by using the same pool to optimize a metabolic pathway (β-carotene) and genetic circuit (XNOR logic gate). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
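
    The pool size quoted above follows from simple counting: each of the six sRNAs independently takes one of 16 promoters. A short check, with hypothetical function names, is sketched below.

```python
import random


def pool_combinations(n_srnas=6, n_promoters=16):
    """Number of distinct promoter assignments across all sRNAs."""
    return n_promoters ** n_srnas


def sample_pool_member(n_srnas=6, n_promoters=16, rng=None):
    """Draw one random pool member as a tuple of promoter indices."""
    rng = rng or random.Random()
    return tuple(rng.randrange(n_promoters) for _ in range(n_srnas))


if __name__ == "__main__":
    total = pool_combinations()
    print(f"{total:,} combinations (about {total:.1e})")
    print(sample_pool_member(rng=random.Random(0)))
```

    With the numbers from the abstract this gives 16^6 = 16,777,216, consistent with the stated order of magnitude of 10^7.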

  16. Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing.

    Science.gov (United States)

    Aigrain, Louise; Gu, Yong; Quail, Michael A

    2016-06-13

    The emergence of next-generation sequencing (NGS) technologies in the past decade has allowed the democratization of DNA sequencing both in terms of price per sequenced base and ease of producing DNA libraries. When it comes to preparing DNA sequencing libraries for Illumina, the current market leader, a plethora of kits are available and it can be difficult for users to determine which kit is the most appropriate and efficient for their applications; the main concerns being not only cost but also minimal bias, yield and time efficiency. We compared 9 commercially available library preparation kits in a systematic manner using the same DNA sample by probing the amount of DNA remaining after each protocol step using a new droplet digital PCR (ddPCR) assay. This method allows the precise quantification of fragments bearing either adaptors or P5/P7 sequences on both ends just after ligation or PCR enrichment. We also investigated the potential influence of DNA input and DNA fragment size on the final library preparation efficiency. The overall library preparation efficiencies show important variations between the different kits, with those combining several steps into a single one exhibiting final yields 4 to 7 times higher than the other kits. Detailed ddPCR data also reveal that the adaptor ligation yield itself varies by more than a factor of 10 between kits, certain ligation efficiencies being so low that they could impair the original library complexity and impoverish the sequencing results. When a PCR enrichment step is necessary, lower adaptor-ligated DNA inputs lead to greater amplification yields, hiding the latent disparity between kits. We describe a ddPCR assay that allows us to probe the efficiency of the most critical step in library preparation, ligation, and to draw conclusions on which kits are more likely to preserve the sample heterogeneity and reduce the need for amplification.

  17. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  18. Comparative analysis of expressed sequence tags from three castes and two life stages of the termite Reticulitermes flavipes

    Directory of Open Access Journals (Sweden)

    Steller Matthew M

    2010-08-01

    Full Text Available Abstract Background Termites (Isoptera are eusocial insects whose colonies consist of morphologically and behaviorally specialized castes of sterile workers and soldiers, and reproductive alates. Previous studies on eusocial insects have indicated that caste differentiation and behavior are underlain by differential gene expression. Although much is known about gene expression in the honey bee, Apis mellifera, termites remain relatively understudied in this regard. Therefore, our objective was to assemble an expressed sequence tag (EST data base for the eastern subterranean termite, Reticulitermes flavipes, for future gene expression studies. Results Soldier, worker, and alate caste and two larval cDNA libraries were constructed, and approximately 15,000 randomly chosen clones were sequenced to compile an EST data base. Putative gene functions were assigned based on a BLASTX Swissprot search. Categorical in silico expression patterns for each library were compared using the R-statistic. A significant proportion of the ESTs of each caste and life stages had no significant similarity to those in existing data bases. All cDNA libraries, including those of non-reproductive worker and soldier castes, contained sequences with putative reproductive functions. Genes that showed a potential expression bias among castes included a putative antibacterial humoral response and translation elongation protein in soldiers and a chemosensory protein in alates. Conclusions We have expanded upon the available sequences for R. flavipes and utilized an in silico method to compare gene expression in different castes of an eusocial insect. The in silico analysis allowed us to identify several genes which may be differentially expressed and involved in caste differences. These include a gene overrepresented in the alate cDNA library with a predicted function of neurotransmitter secretion or cholesterol absorption and a gene predicted to be involved in protein
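
    The abstract above does not give the formula behind the R-statistic. A log-likelihood-ratio form that is often used to compare EST counts across several cDNA libraries is sketched below. Treating it as the statistic the authors applied is an assumption, and the function name and example counts are hypothetical.

```python
import math


def r_statistic(counts, library_sizes):
    """Log-likelihood-ratio statistic for one gene across cDNA libraries.

    counts[i] is the number of ESTs for the gene in library i and
    library_sizes[i] is the total number of ESTs sequenced from library i.
    The null hypothesis is that the gene has the same frequency in every
    library. Larger values indicate stronger departure from that null.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have equal length")
    pooled_frequency = sum(counts) / sum(library_sizes)
    r = 0.0
    for observed, size in zip(counts, library_sizes):
        if observed > 0:  # a zero count contributes nothing to the sum
            r += observed * math.log(observed / (size * pooled_frequency))
    return r


if __name__ == "__main__":
    sizes = [3000, 3000, 3000]            # hypothetical library sizes
    print(r_statistic([10, 10, 10], sizes))  # evenly expressed gene
    print(r_statistic([28, 1, 1], sizes))    # gene biased toward one library
```

    A gene with equal representation in every library scores zero, while a gene concentrated in one caste library scores high, which is the kind of signal used to flag candidate caste-biased genes.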

  19. Expressed sequence tags from heat-shocked seagrass Zostera noltii (Hornemann) from its southern distribution range

    NARCIS (Netherlands)

    Massa, Sonia I.; Pearson, Gareth A.; Aires, Tania; Kube, Michael; Olsen, Jeanine L.; Reinhardt, Richard; Serrao, Ester A.; Arnaud-Haond, Sophie

    Predicted global climate change threatens the distributional ranges of species worldwide. We identified genes expressed in the intertidal seagrass Zostera noltii during recovery from a simulated low tide heat-shock exposure. Five Expressed Sequence Tag (EST) libraries were compared, corresponding to

  20. PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

    Science.gov (United States)

    Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

    2018-01-01

    The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
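
    The first step of such a profile, counting unique DNA sequences in a set of FASTQ reads, can be sketched in a few lines. This is not PuLSE's code or interface. It assumes well-formed four-line FASTQ records and ignores quality scores, and the function names are hypothetical.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each four-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def profile_library(lines):
    """Summarize read counts per unique sequence."""
    counts = Counter(fastq_sequences(lines))
    return {
        "total_reads": sum(counts.values()),
        "unique_sequences": len(counts),
        "singletons": sum(1 for n in counts.values() if n == 1),
        "most_common": counts.most_common(3),
    }


if __name__ == "__main__":
    example = [
        "@read1", "ACGT", "+", "IIII",
        "@read2", "ACGT", "+", "IIII",
        "@read3", "TTGA", "+", "IIII",
    ]
    print(profile_library(example))
```

    For a real file, the same functions can be fed an open file handle, for example `profile_library(open(path))`, since iterating a text file yields its lines.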

  1. Construction of an American mink Bacterial Artificial Chromosome (BAC) library and sequencing candidate genes important for the fur industry

    DEFF Research Database (Denmark)

    Anistoroaei, Razvan Marian; Hallers, Boudewijn ten; Nefedov, Michael

    2011-01-01

    BACKGROUND: Bacterial artificial chromosome (BAC) libraries continue to be invaluable tools for the genomic analysis of complex organisms. Complemented by the newly and fast growing deep sequencing technologies, they provide an excellent source of information in genomics projects. RESULTS: Here, we report the construction and characterization of the CHORI-231 BAC library constructed from a Danish-farmed, male American mink (Neovison vison). The library contains approximately 165,888 clones with an average insert size of 170 kb, representing approximately 10-fold coverage. High-density filters, each consisting of 18,432 clones spotted in duplicate, have been produced for hybridization screening and are publicly available. Overgo probes derived from expressed sequence tags (ESTs), representing 21 candidate genes for traits important for the mink industry, were used to screen the BAC library...
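
    The coverage figure quoted above is simple arithmetic: clone count times mean insert size, divided by genome size. The sketch below works backwards from the three numbers given in the abstract. The genome size is therefore an implied value under that assumption, not a figure stated in the record, and the function names are hypothetical. The representation probability uses the standard Clarke and Carbon formula, P = 1 - (1 - f)^N.

```python
def total_cloned_bp(n_clones, mean_insert_bp):
    """Total DNA held in the library, in base pairs."""
    return n_clones * mean_insert_bp


def implied_genome_size_bp(n_clones, mean_insert_bp, fold_coverage):
    """Genome size consistent with a stated fold coverage."""
    return total_cloned_bp(n_clones, mean_insert_bp) / fold_coverage


def probability_locus_represented(n_clones, mean_insert_bp, genome_size_bp):
    """Clarke and Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    fraction = mean_insert_bp / genome_size_bp
    return 1.0 - (1.0 - fraction) ** n_clones


N_CLONES = 165_888        # from the abstract
MEAN_INSERT_BP = 170_000  # 170 kb, from the abstract
FOLD_COVERAGE = 10        # "approximately 10-fold", from the abstract

genome_bp = implied_genome_size_bp(N_CLONES, MEAN_INSERT_BP, FOLD_COVERAGE)
p_hit = probability_locus_represented(N_CLONES, MEAN_INSERT_BP, genome_bp)

print(f"Total cloned DNA: {total_cloned_bp(N_CLONES, MEAN_INSERT_BP) / 1e9:.2f} Gb")
print(f"Implied genome size: {genome_bp / 1e9:.2f} Gb")
print(f"Chance a given locus is in the library: {p_hit:.5f}")
```

    Under these assumptions the library holds about 28.2 Gb of cloned DNA, which corresponds to a genome of roughly 2.8 Gb at 10-fold coverage.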

  2. Analysis of a normalised expressed sequence tag (EST) library from a key pollinator, the bumblebee Bombus terrestris.

    Science.gov (United States)

    Sadd, Ben M; Kube, Michael; Klages, Sven; Reinhardt, Richard; Schmid-Hempel, Paul

    2010-02-15

    The bumblebee, Bombus terrestris (Order Hymenoptera), is of widespread importance. This species is extensively used for commercial pollination in Europe, and along with other Bombus spp. is a key member of natural pollinator assemblages. Furthermore, the species is studied in a wide variety of biological fields. The objective of this project was to create a B. terrestris EST resource that will prove to be valuable in obtaining a deeper understanding of this significant social insect. A normalised cDNA library was constructed from the thorax and abdomen of B. terrestris workers in order to enhance the discovery of rare genes. A total of 29'428 ESTs were sequenced. Subsequent clustering resulted in 13'333 unique sequences. Of these, 58.8 percent had significant similarities to known proteins, with 54.5 percent having a "best-hit" to existing Hymenoptera sequences. Comparisons with the honeybee and other insects allowed the identification of potential candidates for gene loss, pseudogene evolution, and possible incomplete annotation in the honeybee genome. Further, given the focus of much basic research and the perceived threat of disease to natural and commercial populations, the immune system of bumblebees is a particularly relevant component. Although the library is derived from unchallenged bees, we still uncover transcription of a number of immune genes spanning the principally described insect immune pathways. Additionally, the EST library provides a resource for the discovery of genetic markers that can be used in population level studies. Indeed, initial screens identified 589 simple sequence repeats and 854 potential single nucleotide polymorphisms. The resource that these B. terrestris ESTs represent is valuable for ongoing work. The ESTs provide direct evidence of transcriptionally active regions, but they will also facilitate further functional genomics, gene discovery and future genome annotation. These are important aspects in obtaining a greater

  3. Next-generation sequencing library preparation method for identification of RNA viruses on the Ion Torrent Sequencing Platform.

    Science.gov (United States)

    Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng

    2018-05-09

    Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. Multiple NGS library preparation methods have been published for strand-specific RNA-seq, but some are not suitable for identifying and characterizing RNA viruses. In this study, we report an NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform and able to identify multiple species of RNA viruses in clinical samples.

  4. An Automated Pipeline for Engineering Many-Enzyme Pathways: Computational Sequence Design, Pathway Expression-Flux Mapping, and Scalable Pathway Optimization.

    Science.gov (United States)

    Halper, Sean M; Cetnar, Daniel P; Salis, Howard M

    2018-01-01

    Engineering many-enzyme metabolic pathways suffers from the design curse of dimensionality. There are an astronomical number of synonymous DNA sequence choices, though relatively few will express an evolutionary robust, maximally productive pathway without metabolic bottlenecks. To solve this challenge, we have developed an integrated, automated computational-experimental pipeline that identifies a pathway's optimal DNA sequence without high-throughput screening or many cycles of design-build-test. The first step applies our Operon Calculator algorithm to design a host-specific evolutionary robust bacterial operon sequence with maximally tunable enzyme expression levels. The second step applies our RBS Library Calculator algorithm to systematically vary enzyme expression levels with the smallest-sized library. After characterizing a small number of constructed pathway variants, measurements are supplied to our Pathway Map Calculator algorithm, which then parameterizes a kinetic metabolic model that ultimately predicts the pathway's optimal enzyme expression levels and DNA sequences. Altogether, our algorithms provide the ability to efficiently map the pathway's sequence-expression-activity space and predict DNA sequences with desired metabolic fluxes. Here, we provide a step-by-step guide to applying the Pathway Optimization Pipeline on a desired multi-enzyme pathway in a bacterial host.
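
    The "astronomical number of synonymous DNA sequence choices" mentioned in this abstract can be made concrete by multiplying the codon degeneracy of each residue in a protein. The sketch below uses the standard genetic code; it is an illustration of the counting argument only, not part of the authors' Operon Calculator or RBS Library Calculator, and the example peptides are hypothetical.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_sequence_count(protein):
    """Number of distinct coding sequences that encode the given protein."""
    return math.prod(CODON_DEGENERACY[residue] for residue in protein)


def log10_synonymous_sequence_count(protein):
    """Same quantity on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[residue]) for residue in protein)


# Hypothetical peptides, chosen only to show how fast the count grows.
short_peptide = "MLS"
longer_peptide = "MLS" * 100  # 300 residues

print(synonymous_sequence_count(short_peptide))
print(f"about 10^{log10_synonymous_sequence_count(longer_peptide):.0f} sequences")
```

    Because the count is a product over residues, it grows exponentially with protein length, and a pathway of several enzymes multiplies these per-enzyme counts together.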

  5. Hybrid sequencing approach applied to human fecal metagenomic clone libraries revealed clones with potential biotechnological applications.

    Science.gov (United States)

    Džunková, Mária; D'Auria, Giuseppe; Pérez-Villarroya, David; Moya, Andrés

    2012-01-01

    Natural environments represent an incredible source of microbial genetic diversity. Discovery of novel biomolecules involves biotechnological methods that often require the design and implementation of biochemical assays to screen clone libraries. However, when an assay is applied to thousands of clones, one may eventually end up with very few positive clones which, in most of the cases, have to be "domesticated" for downstream characterization and application, and this makes screening both laborious and expensive. The negative clones, which are not considered by the selected assay, may also have biotechnological potential; however, unfortunately they would remain unexplored. Knowledge of the clone sequences provides important clues about potential biotechnological application of the clones in the library; however, the sequencing of clones one-by-one would be very time-consuming and expensive. In this study, we characterized the first metagenomic clone library from the feces of a healthy human volunteer, using a method based on 454 pyrosequencing coupled with a clone-by-clone Sanger end-sequencing. Instead of whole individual clone sequencing, we sequenced 358 clones in a pool. The medium-large insert (7-15 kb) cloning strategy allowed us to assemble these clones correctly, and to assign the clone ends to maintain the link between the position of a living clone in the library and the annotated contig from the 454 assembly. Finally, we found several open reading frames (ORFs) with previously described potential medical application. The proposed approach allows planning ad-hoc biochemical assays for the clones of interest, and the appropriate sub-cloning strategy for gene expression in suitable vectors/hosts.

  6. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    Directory of Open Access Journals (Sweden)

    Lee Hae-Young

    2006-02-01

    Full Text Available BACKGROUND: Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. RESULTS: We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton (64.54%) ESTs. From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (Sus scrofa). Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). CONCLUSION: The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the
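
    The summary statistics in this abstract follow directly from the reported counts. A short sketch recomputes them. The function names are hypothetical, and the only inputs are the contig, singleton, and EST figures quoted in the record.

```python
def cluster_shares(n_contigs, n_singletons):
    """Return (total unique sequences, % contigs, % singletons)."""
    total = n_contigs + n_singletons
    return (
        total,
        round(100 * n_contigs / total, 2),
        round(100 * n_singletons / total, 2),
    )


def total_sequence_mb(n_ests, mean_length_bp):
    """Total sequence generated, in megabases."""
    return n_ests * mean_length_bp / 1e6


# Counts quoted in the abstract.
total, pct_contigs, pct_singletons = cluster_shares(1_776, 3_232)
print(total, pct_contigs, pct_singletons)   # 5008 35.46 64.54
print(round(total_sequence_mb(16_110, 674), 1))  # 10.9
```

    Recomputing shares from raw counts in this way is a quick consistency check when percentages in a record have been through several rounds of transcription.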

  7. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  8. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    Science.gov (United States)

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.

  9. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis

    DEFF Research Database (Denmark)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta

    2014-01-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72 individuals using only 24 barcoded libraries.

  10. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jacob

    2007-01-01

    public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. RESULTS: Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which...... with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression...

  11. Identification of immune protective genes of Eimeria maxima through cDNA expression library screening.

    Science.gov (United States)

    Yang, XinChao; Li, MengHui; Liu, JianHua; Ji, YiHong; Li, XiangRui; Xu, LiXin; Yan, RuoFeng; Song, XiaoKai

    2017-02-16

    Eimeria maxima is one of the most prevalent Eimeria species causing avian coccidiosis, and results in huge economic loss to the global poultry industry. Current control strategies, such as anti-coccidial medication and live vaccines, have been limited because of their drawbacks. The third generation anticoccidial vaccines including the recombinant vaccines as well as DNA vaccines have been suggested as a promising alternative strategy. To date, only a few protective antigens of E. maxima have been reported. Hence, there is an urgent need to identify novel protective antigens of E. maxima for the development of neotype anticoccidial vaccines. With the aim of identifying novel protective genes of E. maxima, a cDNA expression library of E. maxima sporozoites was constructed using Gateway technology. Subsequently, the cDNA expression library was divided into 15 sub-libraries for cDNA expression library immunization (cDELI) using a parasite-challenged model in chickens. Protective sub-libraries were selected for the next round of screening until individual protective clones were obtained, which were further sequenced and analyzed. Adopting the Gateway technology, a high-quality entry library was constructed, containing 9.2 × 10^6 clones with an average insert length of 1.63 kb. The expression library capacity was 2.32 × 10^7 colony-forming units (cfu) with an average insert length of 1.64 kb. The expression library was screened using a parasite-challenged model in chickens. The screening yielded 6 immune protective genes including four novel protective genes of EmJS-1, EmRP, EmHP-1 and EmHP-2, and two known protective genes of EmSAG and EmCKRS. EmJS-1 is the selR domain-containing protein of E. maxima whose function is unknown. EmHP-1 and EmHP-2 are the hypothetical proteins of E. maxima. EmRP and EmSAG are rhomboid-like protein and surface antigen glycoproteins of E. maxima respectively, and involved in invasion of the parasite. Our

  12. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small insert genomic DNA library was constructed (digesting olive genomic DNA with SmaI and cloning the digestion products into pUC19 vector) and 83 randomly picked colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  13. Hybrid sequencing approach applied to human fecal metagenomic clone libraries revealed clones with potential biotechnological applications.

    Directory of Open Access Journals (Sweden)

    Mária Džunková

    Full Text Available Natural environments represent an incredible source of microbial genetic diversity. Discovery of novel biomolecules involves biotechnological methods that often require the design and implementation of biochemical assays to screen clone libraries. However, when an assay is applied to thousands of clones, one may eventually end up with very few positive clones which, in most of the cases, have to be "domesticated" for downstream characterization and application, and this makes screening both laborious and expensive. The negative clones, which are not considered by the selected assay, may also have biotechnological potential; however, unfortunately they would remain unexplored. Knowledge of the clone sequences provides important clues about potential biotechnological application of the clones in the library; however, the sequencing of clones one-by-one would be very time-consuming and expensive. In this study, we characterized the first metagenomic clone library from the feces of a healthy human volunteer, using a method based on 454 pyrosequencing coupled with a clone-by-clone Sanger end-sequencing. Instead of whole individual clone sequencing, we sequenced 358 clones in a pool. The medium-large insert (7-15 kb) cloning strategy allowed us to assemble these clones correctly, and to assign the clone ends to maintain the link between the position of a living clone in the library and the annotated contig from the 454 assembly. Finally, we found several open reading frames (ORFs) with previously described potential medical application. The proposed approach allows planning ad-hoc biochemical assays for the clones of interest, and the appropriate sub-cloning strategy for gene expression in suitable vectors/hosts.

  14. Comparative Genomics in Switchgrass Using 61,585 High-Quality Expressed Sequence Tags

    Directory of Open Access Journals (Sweden)

    Christian M. Tobias

    2008-11-01

    Full Text Available The development of genomic resources for switchgrass (Panicum virgatum L.), a perennial NAD-malic enzyme type C4 grass, is required to enable molecular breeding and biotechnological approaches for improving its value as a forage and bioenergy crop. Expressed sequence tag (EST) sequencing is one method that can quickly sample gene inventories and produce data suitable for marker development or analysis of tissue-specific patterns of expression. Toward this goal, three cDNA libraries from callus, crown, and seedling tissues of ‘Kanlow’ switchgrass were end-sequenced to generate a total of 61,585 high-quality ESTs from 36,565 separate clones. Seventy-three percent of the assembled consensus sequences could be aligned with the sorghum [Sorghum bicolor (L.) Moench] genome at a -value of <1 × 10, indicating a high degree of similarity. Sixty-five percent of the ESTs matched with gene ontology molecular terms, and 3.3% of the sequences were matched with genes that play potential roles in cell-wall biogenesis. The representation in the three libraries of gene families known to be associated with C4 photosynthesis, cellulose and β-glucan synthesis, phenylpropanoid biosynthesis, and peroxidase activity indicated likely roles for individual family members. Pairwise comparisons of synonymous codon substitutions were used to assess genome sequence diversity and indicated an overall similarity between the two genome copies present in the tetraploid. Identification of EST–simple sequence repeat markers and amplification on two individual parents of a mapping population yielded an average of 2.18 amplicons per individual, and 35% of the markers produced fragment length polymorphisms.

  15. Construction of an American mink Bacterial Artificial Chromosome (BAC) library and sequencing candidate genes important for the fur industry

    Directory of Open Access Journals (Sweden)

    Christensen Knud

    2011-07-01

    Full Text Available BACKGROUND: Bacterial artificial chromosome (BAC) libraries continue to be invaluable tools for the genomic analysis of complex organisms. Complemented by the new and fast-growing deep sequencing technologies, they provide an excellent source of information in genomics projects. RESULTS: Here, we report the construction and characterization of the CHORI-231 BAC library constructed from a Danish-farmed, male American mink (Neovison vison). The library contains approximately 165,888 clones with an average insert size of 170 kb, representing approximately 10-fold coverage. High-density filters, each consisting of 18,432 clones spotted in duplicate, have been produced for hybridization screening and are publicly available. Overgo probes derived from expressed sequence tags (ESTs), representing 21 candidate genes for traits important for the mink industry, were used to screen the BAC library. These included candidate genes for coat coloring, hair growth and length, coarseness, and some receptors potentially involved in viral diseases in mink. The extensive screening yielded positive results for 19 of these genes. Thirty-five clones corresponding to 19 genes were sequenced using 454 Roche, and large contigs (184 kb on average) were assembled. Knowing the complete sequences of these candidate genes will enable confirmation of the association with a phenotype and the finding of causative mutations for the targeted phenotypes. Additionally, 1577 BAC clones were end sequenced; 2505 BAC end sequences (80% of BACs) were obtained. In excess of 2 Mb has been analyzed, thus giving a snapshot of the mink genome. CONCLUSIONS: The availability of the CHORI-231 American mink BAC library will aid in identification of genes and genomic regions of interest. We have demonstrated how the library can be used to identify specific genes of interest, develop genetic markers, and for BAC end sequencing and deep sequencing of selected clones. To our knowledge, this is the

  16. Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don

    Directory of Open Access Journals (Sweden)

    Dillon Shannon K

    2009-01-01

    Full Text Available BACKGROUND: Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. RESULTS: Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of

  17. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries.

    Science.gov (United States)

    Asamizu, E; Nakamura, Y; Sato, S; Tabata, S

    2000-06-30

    For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.

  18. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    Science.gov (United States)

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs are the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  19. Progress in strategies for sequence diversity library creation for ...

    African Journals Online (AJOL)

    As the simplest technique of protein engineering, directed evolution has been ... An experiment of directed evolution comprises mutant libraries creation and ... evolution, sequence diversity creation, novel strategy, computational design, ...

  20. Flavonoid Biosynthesis Genes Putatively Identified in the Aromatic Plant Polygonum minus via Expressed Sequences Tag (EST) Analysis

    Directory of Open Access Journals (Sweden)

    Zamri Zainal

    2012-02-01

    Full Text Available P. minus is an aromatic plant, the leaf of which is widely used as a food additive and in the perfume industry. The leaf also accumulates secondary metabolites that act as active ingredients such as flavonoid. Due to limited genomic and transcriptomic data, the biosynthetic pathway of flavonoids is currently unclear. Identification of candidate genes involved in the flavonoid biosynthetic pathway will significantly contribute to understanding the biosynthesis of active compounds. We have constructed a standard cDNA library from P. minus leaves, and two normalized full-length enriched cDNA libraries were constructed from stem and root organs in order to create a gene resource for the biosynthesis of secondary metabolites, especially flavonoid biosynthesis. Thus, large‑scale sequencing of P. minus cDNA libraries identified 4196 expressed sequence tags (ESTs), which were deposited in dbEST in the National Center of Biotechnology Information (NCBI). From the three constructed cDNA libraries, 11 ESTs encoding seven genes were mapped to the flavonoid biosynthetic pathway. Finally, three flavonoid biosynthetic pathway-related ESTs, chalcone synthase, CHS (JG745304), flavonol synthase, FLS (JG705819) and leucoanthocyanidin dioxygenase, LDOX (JG745247), were selected for further examination by quantitative RT-PCR (qRT-PCR) in different P. minus organs. Expression was detected in leaf, stem and root. Gene expression studies have been initiated in order to better understand the underlying physiological processes.

  1. Synthetic promoter libraries for Corynebacterium glutamicum

    DEFF Research Database (Denmark)

    Rytter, Jakob Vang; Helmark, Søren; Chen, Jun

    2014-01-01

    The ability to modulate gene expression is an important genetic tool in systems biology and biotechnology. Here, we demonstrate that a previously published easy and fast PCR-based method for modulating gene expression in lactic acid bacteria is also applicable to Corynebacterium glutamicum. We co...... promoter library (SPL) technology is convenient for modulating gene expression in C. glutamicum and should have many future applications, within basic research as well as for optimizing industrial production organisms....... constructed constitutive promoter libraries based on various combinations of a previously reported C. glutamicum -10 consensus sequence (gngnTA(c/t)aaTgg) and the Escherichia coli -35 consensus, either with or without an AT-rich region upstream. A promoter library based on consensus sequences frequently found...... in low-GC Gram-positive microorganisms was also included. The strongest promoters were found in the library with a -35 region and a C. glutamicum -10 consensus, and this library also represents the largest activity span. Using the alternative -10 consensus TATAAT, which can be found in many other...
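
    The degenerate -10 consensus quoted in this record, gngnTA(c/t)aaTgg, can be expanded programmatically to see how many concrete sequences it covers. The sketch below is only an illustration of reading such a notation: it assumes that "n" means any base, that "(c/t)" means either base, and that letter case can be ignored. These are interpretive assumptions, not statements from the record, and the function names are hypothetical. It does not model the randomized spacer regions of a synthetic promoter library.

```python
import itertools
import re


def parse_consensus(consensus):
    """Split a consensus string into a list of allowed-base tuples, one per position."""
    positions = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            end = consensus.index(")", i)
            options = consensus[i + 1:end].split("/")
            positions.append(tuple(base.upper() for base in options))
            i = end + 1
        elif ch.lower() == "n":
            positions.append(("A", "C", "G", "T"))
            i += 1
        else:
            positions.append((ch.upper(),))
            i += 1
    return positions


def to_regex(positions):
    """Build a regular expression matching every sequence the consensus allows."""
    return "".join(
        p[0] if len(p) == 1 else "[" + "".join(p) + "]" for p in positions
    )


def enumerate_variants(positions):
    """List every concrete sequence covered by the consensus."""
    return ["".join(combo) for combo in itertools.product(*positions)]


positions = parse_consensus("gngnTA(c/t)aaTgg")
pattern = to_regex(positions)
variants = enumerate_variants(positions)

print(pattern)        # G[ACGT]G[ACGT]TA[CT]AATGG
print(len(variants))  # 32
```

    The resulting pattern can be used with re.fullmatch to test whether a candidate -10 region is consistent with the consensus under the assumptions above.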

  2. Deep-sequencing protocols influence the results obtained in small-RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Joern Toedling

    Full Text Available Second-generation sequencing is a powerful method for identifying and quantifying small-RNA components of cells. However, little attention has been paid to the effects of the choice of sequencing platform and library preparation protocol on the results obtained. We present a thorough comparison of small-RNA sequencing libraries generated from the same embryonic stem cell lines, using different sequencing platforms, which represent the three major second-generation sequencing technologies, and protocols. We have analysed and compared the expression of microRNAs, as well as populations of small RNAs derived from repetitive elements. Despite the fact that different libraries display a good correlation between sequencing platforms, qualitative and quantitative variations in the results were found, depending on the protocol used. Thus, when comparing libraries from different biological samples, it is strongly recommended to use the same sequencing platform and protocol in order to ensure the biological relevance of the comparisons.

  3. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    Science.gov (United States)

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as

  4. Next generation sequencing (NGS) technologies and applications

    Energy Technology Data Exchange (ETDEWEB)

    Vuyisich, Momchilo [Los Alamos National Laboratory

    2012-09-11

    NGS technology overview: (1) NGS library preparation - Nucleic acids extraction, Sample quality control, RNA conversion to cDNA, Addition of sequencing adapters, Quality control of library; (2) Sequencing - Clonal amplification of library fragments (except PacBio), Sequencing by synthesis, Data output (reads and quality); and (3) Data analysis - Read mapping, Genome assembly, Gene expression, Operon structure, sRNA discovery, and Epigenetic analyses.

  5. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library

    Directory of Open Access Journals (Sweden)

    Parachin Nádia

    2011-05-01

    Full Text Available Abstract Background Xylose isomerase (XI) catalyses the isomerisation of xylose to xylulose in bacteria and some fungi. Currently, only a limited number of XI genes have been functionally expressed in Saccharomyces cerevisiae, the microorganism of choice for lignocellulosic ethanol production. The objective of the present study was to search for novel XI genes in the vastly diverse microbial habitat present in soil. As the exploitation of microbial diversity is impaired by the limited ability to cultivate soil microorganisms under standard laboratory conditions, a metagenomic approach, consisting of total DNA extraction from a given environment followed by cloning of DNA into suitable vectors, was undertaken. Results A soil metagenomic library was constructed and two screening methods based on protein sequence similarity and enzyme activity were investigated to isolate novel XI encoding genes. These two screening approaches identified the xym1 and xym2 genes, respectively. Sequence and phylogenetic analyses revealed that the genes shared 67% similarity and belonged to different bacterial groups. When xym1 and xym2 were overexpressed in a xylA-deficient Escherichia coli strain, similar growth rates to those in which the Piromyces XI gene was expressed were obtained. However, expression in S. cerevisiae resulted in only one-fourth the growth rate of that obtained for the strain expressing the Piromyces XI gene. Conclusions For the first time, the screening of a soil metagenomic library in E. coli resulted in the successful isolation of two active XIs. However, the discrepancy between XI enzyme performance in E. coli and S. cerevisiae suggests that future screening for XI activity from soil should be pursued directly using yeast as a host.

  6. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    Science.gov (United States)

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
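
    For readers unfamiliar with the recurrence that such libraries accelerate, the following is a minimal, unoptimized sketch of Smith-Waterman local alignment scoring. It is not Parasail's implementation or API: the function name, the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen for illustration, and no SIMD striping or prefix scan is attempted.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (assumed, not taken from the abstract):
    fixed match/mismatch scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Each inner-loop iteration is one "cell update", so a figure quoted in GCUPS counts how many billions of such updates are completed per second.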

  7. Coupled high-throughput functional screening and next generation sequencing for identification of plant polymer decomposing enzymes in metagenomic libraries

    Directory of Open Access Journals (Sweden)

    Mari Nyyssönen

    2013-09-01

    Full Text Available Recent advances in sequencing technologies generate new predictions and hypotheses about the functional roles of environmental microorganisms. Yet, until we can test these predictions at a scale that matches our ability to generate them, most of them will remain as hypotheses. Function-based mining of metagenomic libraries can provide direct linkages between genes, metabolic traits and microbial taxa and thus bridge this gap between sequence data generation and functional predictions. Here we developed high-throughput screening assays for function-based characterization of activities involved in plant polymer decomposition from environmental metagenomic libraries. The multiplexed assays use fluorogenic and chromogenic substrates, combine automated liquid handling and use a genetically modified expression host to enable simultaneous screening of 12,160 clones for 14 activities in a total of 170,240 reactions. Using this platform we identified 374 (0.26 %) cellulose, hemicellulose, chitin, starch, phosphate and protein hydrolyzing clones from fosmid libraries prepared from decomposing leaf litter. Sequencing on the Illumina MiSeq platform, followed by assembly and gene prediction of a subset of 95 fosmid clones, identified a broad range of bacterial phyla, including Actinobacteria, Bacteroidetes, multiple Proteobacteria sub-phyla in addition to some Fungi. Carbohydrate-active enzyme genes from 20 different glycoside hydrolase families were detected. Using tetranucleotide frequency binning of fosmid sequences, multiple enzyme activities from distinct fosmids were linked, demonstrating how biochemically-confirmed functional traits in environmental metagenomes may be attributed to groups of specific organisms. Overall, our results demonstrate how functional screening of metagenomic libraries can be used to connect microbial functionality to community composition and, as a result, complement large-scale metagenomic sequencing efforts.
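
    The tetranucleotide frequency binning mentioned above starts from a composition vector computed per sequence. The sketch below shows one simple way to build such a vector; the function name is hypothetical, and the choices made here (skipping windows with ambiguous bases, not merging reverse complements) are assumptions rather than the authors' pipeline.

```python
from collections import Counter
from itertools import product

def tetranucleotide_frequencies(seq):
    """Return a 256-element list of 4-mer frequencies in AAAA..TTTT order.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return [counts[k] / total if total else 0.0 for k in kmers]
```

    Sequences whose vectors lie close together (for example by Euclidean distance) can then be grouped into the same bin, on the premise that fragments from the same organism share a similar composition.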

  8. Quantification of massively parallel sequencing libraries - a comparative study of eight methods

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2018-01-01

    Quantification of massively parallel sequencing libraries is important for acquisition of monoclonal beads or clusters prior to clonal amplification and to avoid large variations in library coverage when multiple samples are included in one sequencing analysis. No gold standard for quantification...... estimates followed by Qubit and electrophoresis-based instruments (Bioanalyzer, TapeStation, GX Touch, and Fragment Analyzer), while SYBR Green and TaqMan based qPCR assays gave the lowest estimates. qPCR gave more accurate predictions of sequencing coverage than Qubit and TapeStation did. Costs, time......-consumption, workflow simplicity, and ability to quantify multiple samples are discussed. Technical specifications, advantages, and disadvantages of the various methods are pointed out....

  9. Expressed sequence tag analysis of adult human optic nerve for NEIBank: Identification of cell type and tissue markers

    Directory of Open Access Journals (Sweden)

    Peterson Katherine

    2009-09-01

    Full Text Available Abstract Background The optic nerve is a pure white matter central nervous system (CNS) tract with an isolated blood supply, and is widely used in physiological studies of white matter response to various insults. We examined the gene expression profile of human optic nerve (ON) and, through the NEIBank online resource, provide a resource of sequence-verified cDNA clones. An un-normalized cDNA library was constructed from pooled human ON tissues and was used in expressed sequence tag (EST) analysis. Location of an abundant oligodendrocyte marker was examined by immunofluorescence. Quantitative real time polymerase chain reaction (qRT-PCR) and Western analysis were used to compare levels of expression for key calcium channel protein genes and protein product in primate and rodent ON. Results Our analyses revealed a profile similar in many respects to other white matter related tissues, but significantly different from previously available ON cDNA libraries. The previous libraries were found to include specific markers for other eye tissues, suggesting contamination. Immune/inflammatory markers were abundant in the new ON library. The oligodendrocyte marker QKI was abundant at the EST level. Immunofluorescence revealed that this protein is a useful oligodendrocyte cell-type marker in rodent and primate ONs. L-type calcium channel EST abundance was found to be particularly low. A qRT-PCR-based comparative mammalian species analysis reveals that L-type calcium channel expression levels are significantly lower in primate than in rodent ON, which may help account for the class-specific difference in responsiveness to calcium channel blocking agents. Several known eye disease genes are abundantly expressed in ON. Many genes associated with normal axonal function, mRNAs associated with axonal transport, inflammation and neuroprotection are observed. Conclusion We conclude that the new cDNA library is a faithful representation of human ON and EST data

  10. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    Science.gov (United States)

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  11. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.)

    Science.gov (United States)

    Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...

  12. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  13. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    Full Text Available In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  14. Rapid and Easy Protocol for Quantification of Next-Generation Sequencing Libraries.

    Science.gov (United States)

    Hawkins, Steve F C; Guest, Paul C

    2018-01-01

    The emergence of next-generation sequencing (NGS) over the last 10 years has increased the efficiency of DNA sequencing in terms of speed, ease, and price. However, the exact quantification of an NGS library is crucial in order to obtain good data on sequencing platforms developed by the current market leader Illumina. Different approaches for DNA quantification are currently available, and the most commonly used are based on analysis of the physical properties of the DNA through spectrophotometric or fluorometric methods. Although these methods are technically simple, they do not allow exact quantification as can be achieved using a real-time quantitative PCR (qPCR) approach. A qPCR protocol for DNA quantification with applications in NGS library preparation studies is presented here. This can be applied in various fields of study such as medical disorders resulting from nutritional programming disturbances.

  15. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated to this type of libraries, which makes the accurate annotation of SV challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes.

  16. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Full Text Available Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  17. Complementation of radiation-sensitive Ataxia telangiectasia cells after transfection of cDNA expression libraries and cosmid clones from wildtype cells

    International Nuclear Information System (INIS)

    Fritz, E.

    1994-06-01

    In this Ph.D. thesis, phenotypic complementation of AT-cells (AT5BIVA) by transfection of cDNA-expression-libraries was addressed. After stable transfection of cDNA-expression-libraries, G418-resistant clones were selected for enhanced radioresistance by a fractionated X-ray selection. One surviving transfectant clone (clone 514) exhibited enhanced radiation resistance in dose-response experiments and further X-ray selections. Cell cycle analysis revealed complementation of untreated and irradiated 514-cells in cell cycle progression. The rate of DNA synthesis, however, is not diminished after irradiation but shows the reverse effect. A transfected cDNA-fragment (AT500-cDNA) was isolated from the genomic DNA of 514-cells and proved to be an unknown DNA sequence. A homologous sequence could be detected in genomic DNA from human cell lines, but not in DNA from other species. The cDNA-sequence could be localized to human chromosome 11. In human cells the cDNA sequence is part of two large mRNAs. Four different cosmid clones containing high molecular genomic DNA from normal human cells could be isolated from a library, each hybridizing to the AT500-cDNA. After stable transfection into AT-cells, one cosmid-clone was able to confer enhanced radiation resistance both in X-ray selections and dose-response experiments. The results indicate that the cloned cDNA-fragment is based on an unknown gene from human chromosome 11 which partially complements the radiosensitivity and the defective cell cycle progression in AT5BIVA cells.

  18. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    Directory of Open Access Journals (Sweden)

    Arias Covadonga

    2007-06-01

    Full Text Available Abstract Background The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of the genome of Plasmodium falciparum, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10^-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.

  19. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Directory of Open Access Journals (Sweden)

    Johanna Rhodes

    Full Text Available The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  20. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Science.gov (United States)

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  1. Construction of a cDNA library from female adult of Toxocara canis, and analysis of EST and immune-related genes expressions.

    Science.gov (United States)

    Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin

    2011-10-01

    Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy in order to study gene expression, including development, digestion and reproduction, of T. canis. ESTs provide a rapid way to identify genes, particularly in organisms for which we have very little molecular information. In this study, a cDNA library was constructed from a female adult of T. canis and 215 high-quality ESTs from 5'-ends of the cDNA clones representing 79 unigenes were obtained. The titer of the primary cDNA library was 1.83 × 10^6 pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900 bp with an average length of 656 bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences containing 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The remaining 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) were closely matched to the known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR examination. This work will provide a valuable resource for further investigations of stage-, sex- and tissue-specific gene transcription or expression.

  2. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.
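
    One simple way to look for the kind of GC-related coverage bias discussed above is to compare GC fraction and mean read depth over the same genome windows. The following is a minimal sketch under stated assumptions: the window size, the per-base depth list, and all function names are hypothetical, and this is not the analysis pipeline used in the study.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction for each full window, as (start, gc_fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def windowed_mean_depth(depth, window, step):
    """Mean per-base read depth for each full window, as (start, mean) pairs."""
    return [
        (start, sum(depth[start:start + window]) / window)
        for start in range(0, len(depth) - window + 1, step)
    ]


# Toy example: an AT-rich half with low depth and a GC-rich half with high depth.
genome = "AAAATTTT" + "GGGGCCCC"
depth = [2] * 8 + [10] * 8
for (start, gc), (_, cov) in zip(windowed_gc(genome, 8, 8),
                                 windowed_mean_depth(depth, 8, 8)):
    print(start, gc, cov)
```

    Plotting or correlating the paired values per window would then show whether depth rises or falls with GC content.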

  3. Quality control of next-generation sequencing library through an integrative digital microfluidic platform.

    Science.gov (United States)

    Thaitrong, Numrin; Kim, Hanyoup; Renzi, Ronald F; Bartsch, Michael S; Meagher, Robert J; Patel, Kamlesh D

    2012-12-01

    We have developed an automated quality control (QC) platform for next-generation sequencing (NGS) library characterization by integrating a droplet-based digital microfluidic (DMF) system with a capillary-based reagent delivery unit and a quantitative CE module. Using an in-plane capillary-DMF interface, a prepared sample droplet was actuated into position between the ground electrode and the inlet of the separation capillary to complete the circuit for an electrokinetic injection. Using a DNA ladder as an internal standard, the CE module with a compact LIF detector was capable of detecting dsDNA in the range of 5-100 pg/μL, suitable for the amount of DNA required by the Illumina Genome Analyzer sequencing platform. This DMF-CE platform consumes tenfold less sample volume than the current Agilent BioAnalyzer QC technique, preserving precious sample while providing necessary sensitivity and accuracy for optimal sequencing performance. The ability of this microfluidic system to validate NGS library preparation was demonstrated by examining the effects of limited-cycle PCR amplification on the size distribution and the yield of Illumina-compatible libraries, demonstrating that as few as ten cycles of PCR bias the size distribution of the library toward undesirable larger fragments. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Crystal structure of importin-α complexed with a classic nuclear localization sequence obtained by oriented peptide library screening

    Energy Technology Data Exchange (ETDEWEB)

    Takeda, A.A.S.; Fontes, M.R.M. [UNESP, Universidade Estadual Paulista, Botucatu, SP (Brazil)]; Yang, S.N.Y. [University of Melbourne, Melbourne (Australia)]; Harris, J.M. [Queensland University of Technology, Brisbane (Australia)]; Jans, D.A. [Monash University, Clayton (Australia)]; Kobe, B. [University of Queensland, Brisbane, QU (Australia)]

    2012-07-01

    Full text: Importin-α (Impα) plays a role in the classical nuclear import pathway, binding to cargo proteins with activities in the nucleus. Different Impα paralogs responsible for specific cargos can be found in a single organism. The cargos contain nuclear localization sequences (NLSs), which are characterized by one or two clusters of basic amino acids (monopartite and bipartite NLSs, respectively). In this work we present the crystal structure of Impα from M. musculus (residues 70-529, lacking the autoinhibitory domain) bound to an NLS peptide (pepTM). The peptide corresponds to the optimal sequence obtained by an oriented peptide library experiment designed to probe the specificity of the major NLS binding site. The peptide library used five degenerate positions and identified the sequence KKKRR as the optimal sequence for binding to this site for mouse Impα (70-529). The protein was obtained using an E. coli expression system and purified by affinity chromatography followed by ion exchange chromatography. A single crystal of the Impα-pepTM complex was grown by the hanging drop method. The data were collected using the Synchrotron Radiation Source LNLS, Brazil and processed to 2.3. Molecular replacement techniques were used to determine the crystal structure. Electron density corresponding to the peptide was present in both major and minor binding sites. The peptide binds to Impα in a manner similar to the simian virus 40 (SV40) large tumour (T)-antigen NLS. Binding assays confirmed that the peptide bound to Impα with low nM affinities. This is the first time that structural information has been linked to an oriented peptide library screening approach for importin-α; the results will contribute to understanding of the sequence determinants of classical NLSs, and may help identify as yet unidentified classical NLSs in novel proteins. (author)

  5. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library

    Directory of Open Access Journals (Sweden)

    Salem Mohamed

    2009-11-01

    Full Text Available Abstract Background: To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. Results: The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses, were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In

  6. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    Science.gov (United States)

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the
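
    The reduced representation strategy described above rests on a complete restriction digest followed by size selection. A minimal in silico sketch is given below, assuming the standard HaeIII recognition site GGCC with a blunt cut between GG and CC; the size window, the toy sequence, and the function names are hypothetical and are not taken from the paper. The last lines simply redo the arithmetic on the reported figures: 183 of 384 validated SNPs, and roughly 6.7 raw reads per fragment end if every read were usable, which is of the same order as the 6-fold average coverage the authors report.

```python
import re

HAEIII_SITE = "GGCC"  # recognition sequence
CUT_OFFSET = 2        # blunt cut between GG and CC


def haeiii_fragments(seq):
    """Split a sequence at every HaeIII site, simulating complete digestion."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HAEIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, low, high):
    """Keep fragments whose length falls inside [low, high] bp."""
    return [frag for frag in fragments if low <= len(frag) <= high]


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Toy digest; a real library would use a genome and a window around 440 bp.
fragments = haeiii_fragments("AAGGCCTTGGCCA")
selected = size_select(fragments, 4, 6)

# Arithmetic on the figures quoted in the abstract.
validation_rate = percent(183, 384)
reads_per_fragment_end = 2_000_000 / 300_000

print(fragments, selected, validation_rate, round(reads_per_fragment_end, 2))
```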

  7. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

    Science.gov (United States)

    Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

    2017-09-04

    Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.

  8. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...
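
    In silico SSR mining of this kind is commonly done with a pattern search for tandemly repeated short motifs. The sketch below is one minimal way to do it with regular expressions; the minimum repeat counts (six for dinucleotides, five for trinucleotides) are assumed thresholds chosen for illustration, since the criteria actually applied to the SUCEST clusters are not stated in this record.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for di/trinucleotide SSRs."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in thresholds.items():
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA...
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "TTT" + "CAG" * 6 + "AAT" + "GA" * 7 + "CCC"
print(find_ssrs(example))
```

    Primer pairs would then be designed in the flanking sequence on either side of each reported repeat.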

  9. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    Science.gov (United States)

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  10. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Full Text Available Abstract Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries EG_Ba and EG_Bb obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (Eg_Bb) to 157 Kb (Eg_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology providing the whole sequence of the E. grandis chloroplast genome, and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae.
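
    The link between genome coverage and the probability of recovering a single-copy gene is usually estimated with the Clarke-Carbon relation, P = 1 - (1 - I/G)^N, which for small I/G is well approximated by P = 1 - e^(-c), where c = N·I/G is the coverage in genome equivalents. The sketch below applies that approximation to the coverages quoted above; it is a generic estimate, and the abstract does not state exactly how the authors computed their figure.

```python
import math


def recovery_probability(coverage):
    """Approximate probability that a given single-copy locus is in the library."""
    return 1.0 - math.exp(-coverage)


def coverage_for_probability(prob):
    """Genome equivalents needed to reach a target recovery probability."""
    return -math.log(1.0 - prob)


for name, coverage in (("EG_Ba", 17), ("EG_Bb", 15)):
    print(name, coverage, recovery_probability(coverage))

# Coverage needed for a 99.99% chance of finding a single-copy gene.
print(round(coverage_for_probability(0.9999), 2))
```

    Under this approximation a 99.99% recovery probability needs only about 9.2 genome equivalents, so both libraries clear that threshold with margin.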

  11. Robust DNA Isolation and High-throughput Sequencing Library Construction for Herbarium Specimens.

    Science.gov (United States)

    Saeidi, Saman; McKain, Michael R; Kellogg, Elizabeth A

    2018-03-08

    Herbaria are an invaluable source of plant material that can be used in a variety of biological studies. The use of herbarium specimens is associated with a number of challenges including sample preservation quality, degraded DNA, and destructive sampling of rare specimens. In order to more effectively use herbarium material in large sequencing projects, a dependable and scalable method of DNA isolation and library preparation is needed. This paper demonstrates a robust, beginning-to-end protocol for DNA isolation and high-throughput library construction from herbarium specimens that does not require modification for individual samples. This protocol is tailored for low quality dried plant material and takes advantage of existing methods by optimizing tissue grinding, modifying library size selection, and introducing an optional reamplification step for low yield libraries. Reamplification of low yield DNA libraries can rescue samples derived from irreplaceable and potentially valuable herbarium specimens, negating the need for additional destructive sampling and without introducing discernible sequencing bias for common phylogenetic applications. The protocol has been tested on hundreds of grass species, but is expected to be adaptable for use in other plant lineages after verification. This protocol can be limited by extremely degraded DNA, where fragments do not exist in the desired size range, and by secondary metabolites present in some plant material that inhibit clean DNA isolation. Overall, this protocol introduces a fast and comprehensive method that allows for DNA isolation and library preparation of 24 samples in less than 13 h, with only 8 h of active hands-on time with minimal modifications.

  12. A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model

    Directory of Open Access Journals (Sweden)

    Gruber Jonathan D

    2006-05-01

    Full Text Available Abstract Background: Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economical and educational value of butterflies they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expression Sequence Tag (EST) project for Bicyclus anynana that has identified the largest available collection to date of expressed genes for any butterfly. Results: By targeting cDNAs from developing wings at the stages when pattern is specified, we biased gene discovery towards genes potentially involved in pattern formation. Assembly of 9,903 ESTs from a subtracted library allowed us to identify 4,251 genes of which 2,461 were annotated based on BLAST analyses against relevant gene collections. Gene prediction software identified 2,202 peptides, of which 215 longer than 100 amino acids had no homology to any known proteins and, thus, potentially represent novel or highly diverged butterfly genes. We combined gene and Single Nucleotide Polymorphism (SNP) identification by constructing cDNA libraries from pools of outbred individuals, and by sequencing clones from the 3' end to maximize alignment depth. Alignments of multi-member contigs allowed us to identify over 14,000 putative SNPs, with 316 genes having at least one high confidence double-hit SNP. We furthermore identified 320 microsatellites in transcribed genes that can potentially be used as genetic markers. Conclusion: Our project was designed to combine gene and sequence polymorphism discovery and has generated the largest gene collection available for any butterfly and many potential markers in expressed genes. These resources will be invaluable for exploring the potential of B. anynana in particular, and butterflies in general, as models in ecological, evolutionary, and developmental genetics.

  13. Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison

    Directory of Open Access Journals (Sweden)

    Saville Barry J

    2007-09-01

    Full Text Available Abstract Background: Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results: Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results of this suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion: Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619

  14. Comprehensive evaluation and optimization of amplicon library preparation methods for high-throughput antibody sequencing.

    Science.gov (United States)

    Menzel, Ulrike; Greiff, Victor; Khan, Tarik A; Haessler, Ulrike; Hellmann, Ina; Friedensohn, Simon; Cook, Skylar C; Pogson, Mark; Reddy, Sai T

    2014-01-01

    High-throughput sequencing (HTS) of antibody repertoire libraries has become a powerful tool in the field of systems immunology. However, numerous sources of bias in HTS workflows may affect the obtained antibody repertoire data. A crucial step in antibody library preparation is the addition of short platform-specific nucleotide adapter sequences. As of yet, the impact of the method of adapter addition on experimental library preparation and the resulting antibody repertoire HTS datasets has not been thoroughly investigated. Therefore, we compared three standard library preparation methods by performing Illumina HTS on antibody variable heavy genes from murine antibody-secreting cells. Clonal overlap and rank statistics demonstrated that the investigated methods produced equivalent HTS datasets. PCR-based methods were experimentally superior to ligation with respect to speed, efficiency, and practicality. Finally, using a two-step PCR based method we established a protocol for antibody repertoire library generation, beginning from inputs as low as 1 ng of total RNA. In summary, this study represents a major advance towards a standardized experimental framework for antibody HTS, thus opening up the potential for systems-based, cross-experiment meta-analyses of antibody repertoires.

  15. Artificial promoter libraries for selected organisms and promoters derived from such libraries

    DEFF Research Database (Denmark)

    1998-01-01

    or organisms may be selected from prokaryotes and from eukaryotes; and in prokaryotes the consensus sequences to be retained most often will comprise the -35 signal (-35 to -30): TTGACA and the -10 signal (-12 to -7): TATAAT or parts of both comprising at least 3 conserved nucleotides of each, while...... in eukaryotes said consensus sequences should comprise a TATA box and at least one upstream activation sequence (UAS). Such artificial promoter libraries can be used i.a. for optimizing the expression of specific genes in various selected organisms....
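
    For the prokaryotic case, the positions quoted above leave 17 nucleotides between the -35 signal (-35 to -30) and the -10 signal (-12 to -7). A minimal sketch of how such a library could be enumerated in silico follows, assuming fully randomized spacer and flanking bases around the retained consensus hexamers; the flank length, the uniform base choice, and the function names are illustrative assumptions rather than details taken from this record.

```python
import random

MINUS_35 = "TTGACA"  # consensus -35 signal, positions -35 to -30
MINUS_10 = "TATAAT"  # consensus -10 signal, positions -12 to -7
SPACER_LEN = 17      # positions -29 to -13
FLANK_LEN = 5        # assumed length of randomized flanks on each side


def random_bases(rng, length):
    """Draw a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def random_promoter(rng):
    """Build one candidate: random flank, -35, random spacer, -10, random flank."""
    return (random_bases(rng, FLANK_LEN) + MINUS_35 +
            random_bases(rng, SPACER_LEN) + MINUS_10 +
            random_bases(rng, FLANK_LEN))


def promoter_library(size, seed=0):
    """Generate a reproducible list of candidate promoter sequences."""
    rng = random.Random(seed)
    return [random_promoter(rng) for _ in range(size)]


library = promoter_library(10)
for promoter in library[:3]:
    print(promoter)
```

    Because only the spacer and flanks vary, each member would be expected to differ in strength while keeping the conserved signals intact.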

  16. Interpreting a sequenced genome: toward a cosmid transgenic library of Caenorhabditis elegans.

    Science.gov (United States)

    Janke, D L; Schein, J E; Ha, T; Franz, N W; O'Neil, N J; Vatcher, G P; Stewart, H I; Kuervers, L M; Baillie, D L; Rose, A M

    1997-10-01

    We have generated a library of transgenic Caenorhabditis elegans strains that carry sequenced cosmids from the genome of the nematode. Each strain carries an extrachromosomal array containing a single cosmid, sequenced by the C. elegans Genome Sequencing Consortium, and a dominant Rol-6 marker. More than 500 transgenic strains representing 250 cosmids have been constructed. Collectively, these strains contain approximately 8 Mb of sequence data, or approximately 8% of the C. elegans genome. The transgenic strains are being used to rescue mutant phenotypes, resulting in a high-resolution map alignment of the genetic, physical, and DNA sequence maps of the nematode. We have chosen the region of chromosome III deleted by sDf127 and not covered by the duplication sDp8(III;I) as a starting point for a systematic correlation of mutant phenotypes with nucleotide sequence. In this defined region, we have identified 10 new essential genes whose mutant phenotypes range from developmental arrest at early larva, to maternal effect lethal. To date, 8 of these 10 essential genes have been rescued. In this region, these rescues represent approximately 10% of the genes predicted by GENEFINDER and considerably enhance the map alignment. Furthermore, this alignment facilitates future efforts to physically position and clone other genes in the region. [Updated information about the Transgenic Library is available via the Internet at http://darwin.mbb.sfu.ca/imbb/dbaillie/cosmid.html.]

  17. Generation and evaluation of mammalian secreted and membrane protein expression libraries for high-throughput target discovery.

    Science.gov (United States)

    Panavas, Tadas; Lu, Jin; Liu, Xuesong; Winkis, Ann-Marie; Powers, Gordon; Naso, Michael F; Amegadzie, Bernard

    2011-09-01

    Expressed protein libraries are becoming a critical tool for new target discovery in the pharmaceutical industry. In order to get the most meaningful and comprehensive results from protein library screens, it is essential to have library proteins in their native conformation with proper post-translational modifications. This goal is achieved by expressing untagged human proteins in a human cell background. We optimized the transfection and cell culture conditions to maximize protein expression in a 96-well format so that the expression levels were comparable with the levels observed in shake flasks. For detection purposes, we engineered a 'tag after stop codon' system. Depending on the expression conditions, it was possible to express either native or tagged proteins from the same expression vector set. We created a human secretion protein library of 1432 candidates and a small plasma membrane protein set of about 500 candidates. Utilizing the optimized expression conditions, we expressed and analyzed both libraries by SDS-PAGE and Western blotting. Two thirds of secreted proteins could be detected by Western-blot analyses; almost half of them were visible on Coomassie stained gels. In this paper, we describe protein expression libraries that can be easily produced in mammalian expression systems in a 96-well format, with one protein expressed per well. The libraries and methods described allow for the development of robust, high-throughput functional screens designed to assay for protein specific functions associated with a relevant disease-specific activity. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. Construction of a cDNA library for sea cucumber Acaudina leucoprocta and differential expression of ferritin peptide

    Science.gov (United States)

    Zhou, Jun; Hou, Fujing; Li, Ye; Su, Xiurong; Li, Taiwu; Jin, Chunhua

    2016-07-01

    Acaudina leucoprocta is an edible sea cucumber of economic interest that is widely distributed in China. Little information is available concerning the molecular genetics of this species although such knowledge would contribute to a better understanding of the optimal conditions for its aquaculture and its mechanisms of defense against disease. Therefore, we constructed a cDNA library and, based on bioinformatics analysis of the sequences, the functions of 75% of the cDNAs were identified, including those involved in cell structure, energy metabolism, mitochondrial function, and signal transduction pathways. Approximately 25% of genes in the library were unmatched. The gene for A. leucoprocta ferritin was also cloned. The predicted amino-acid sequence of ferritin displayed significant homology with other sea-cucumber counterparts but indicated that it was a new member of the ferritin family. Semiquantitative real-time RT-PCR indicated the highest levels of ferritin mRNA expression in the intestine. A polyclonal antibody of ferritin was also produced. These data provide a set of molecular tools essential for further studies of the functions of ferritin protein in A. leucoprocta.

  19. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.

    Science.gov (United States)

    Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver

    2014-06-14

    Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are limited. FastaValidator is a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end-users, FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.
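
    To make concrete what "validating FASTA formatted sequence data" can involve, here is a minimal sketch in Python. It is not FastaValidator's API (that library is written in Java, and its interface is not described in the abstract). The function name validate_fasta, the accepted alphabet (IUPAC nucleotide codes plus gap symbols) and the specific checks are assumptions chosen for illustration only.

```python
import re

# Assumption: nucleotide data using IUPAC codes plus the gap/stop symbols "-", "." and "*".
_ALLOWED = re.compile(r"^[ACGTURYKMSWBDHVNacgturykmswbdhvn.*-]+$")


def validate_fasta(lines):
    """Return a list of (line_number, message) problems; an empty list means valid.

    `lines` is a list of strings, one per line of the file (hypothetical helper,
    not part of any published library).
    """
    problems = []
    seen_header = False
    residues_in_record = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if seen_header and residues_in_record == 0:
                problems.append((number, "previous record has no sequence"))
            if len(line) == 1:
                problems.append((number, "empty header"))
            seen_header = True
            residues_in_record = 0
        else:
            if not seen_header:
                problems.append((number, "sequence data before first header"))
            elif not _ALLOWED.match(line):
                problems.append((number, "invalid character in sequence"))
            residues_in_record += len(line)
    if seen_header and residues_in_record == 0:
        problems.append((len(lines), "last record has no sequence"))
    if not seen_header:
        problems.append((0, "no records found"))
    return problems


if __name__ == "__main__":
    print(validate_fasta([">seq1", "ACGTNN", ">seq2", "ACXT"]))
```

    A streaming design like this one reads each line once and keeps only a few counters, which is the kind of property that matters for the large NGS-scale inputs the abstract mentions.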

  20. Whitefly (Bemisia tabaci) genome project: analysis of sequenced clones from egg, instar, and adult (viruliferous and non-viruliferous) cDNA libraries

    Directory of Open Access Journals (Sweden)

    Czosnek Henryk

    2006-04-01

    Full Text Available Abstract Background: The past three decades have witnessed a dramatic increase in interest in the whitefly Bemisia tabaci, owing to its nature as a taxonomically cryptic species, the damage it causes to a large number of herbaceous plants because of its specialized feeding in the phloem, and its ability to serve as a vector of plant viruses. Among the most important plant viruses to be transmitted by B. tabaci are those in the genus Begomovirus (family Geminiviridae). Surprisingly, little is known about the genome of this whitefly. The haploid genome size for male B. tabaci has been estimated to be approximately one billion bp by flow cytometry analysis, about five times the size of the fruitfly Drosophila melanogaster. The genes involved in whitefly development, in host range plasticity, and in begomovirus vector specificity and competency are unknown. Results: To address this general shortage of genomic sequence information, we have constructed three cDNA libraries from non-viruliferous whiteflies (eggs, immature instars, and adults) and two from adult insects that fed on tomato plants infected by two geminiviruses: Tomato yellow leaf curl virus (TYLCV) and Tomato mottle virus (ToMoV). In total, the sequence of 18,976 clones was determined. After quality control and removal of 5,542 clones of mitochondrial origin, 9,110 sequences remained, which included 3,843 singletons and 1,017 contigs. Comparisons with public databases indicated that the libraries contained genes involved in cellular and developmental processes. In addition, approximately 1,000 bases aligned with the genome of the B. tabaci endosymbiotic bacterium Candidatus Portiera aleyrodidarum, originating primarily from the egg and instar libraries. 
Apart from the mitochondrial sequences, the longest and most abundant sequence encodes vitellogenin, which originated from whitefly adult libraries, indicating that much of the gene expression in this insect is directed toward the production

  1. A BAC Library and Paired-PCR Approach to Mapping and Completing the Genome Sequence of Sulfolobus solfataricus P2

    DEFF Research Database (Denmark)

    She, Qunxin; Confalonieri, F.; Zivanovic, Y.

    2000-01-01

    The original strategy used in the Sulfolobus solfataricus genome project was to sequence non-overlapping, or minimally overlapping, cosmid or lambda inserts without constructing a physical map. However, after only about two thirds of the genome sequence was completed, this approach became counter-productive because there was a high sequence bias in the cosmid and lambda libraries. Therefore, a new approach was devised for linking the sequenced regions which may be generally applicable. BAC libraries were constructed and terminal sequences of the clones were determined and used for both end mapping and PCR...

  2. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Kumar Santosh

    2012-12-01

    Full Text Available Abstract Background: Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs, are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results: Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform, resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered, with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions: Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from

  3. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  4. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.).

    Science.gov (United States)

    Lee, Mi-Kyung; Zhang, Yang; Zhang, Meiping; Goebel, Mark; Kim, Hee Jin; Triplett, Barbara A; Stelly, David M; Zhang, Hong-Bin

    2013-03-28

    Cotton, one of the world's leading crops, is important to the world's textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome, even though G
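
    The statement that 76,800 clones with 135 kb inserts give roughly a 99% chance of recovering any single-copy sequence can be checked with the standard Clarke-Carbon library-coverage formula, P = 1 - (1 - I/G)^N. The sketch below is in Python. The genome size of 2.5 Gb is an assumption: it is not stated in the abstract and is inferred here from "approximately one BES for every 250 kb" across nearly 10,000 BESs. The function names are hypothetical, and the abstract does not say which formula or genome size the authors actually used.

```python
def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the haploid genome size."""
    return n_clones * insert_bp / genome_bp


def probability_of_recovery(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon estimate: chance that at least one clone contains a given locus."""
    fraction_per_clone = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


if __name__ == "__main__":
    N = 76_800           # clones in the library (from the abstract)
    INSERT = 135_000     # average insert size in bp (from the abstract)
    GENOME = 2.5e9       # ASSUMED genome size in bp, see the note above
    print(f"coverage: {genome_equivalents(N, INSERT, GENOME):.2f}x")
    print(f"P(at least one clone): {probability_of_recovery(N, INSERT, GENOME):.3f}")
```

    Under this assumed genome size the library holds a little over four genome equivalents and the recovery probability comes out slightly above 98%, which is in line with the "approximate 99%" quoted; a somewhat smaller genome-size estimate would push the figure closer to 99%.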

  5. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions: Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
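
    The tile-and-count idea described above (split the query into tiles, then look up each tile in a suffix array built over the read library) can be sketched in a few lines of Python. This is an illustration only and not Geoseq's code: the suffix array is built naively by sorting suffixes, which is fine for toy inputs but not for real sequencing libraries, and the function names, the "$" read separator, the tile length and the step size are all assumptions.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count exact matches of `pattern` using two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def tile_coverage(query, reads, tile_len, step=1):
    """Return (offset, count) for each tile of `query` counted across `reads`."""
    text = "$".join(reads)  # "$" keeps tiles from matching across read boundaries
    suffix_array = build_suffix_array(text)
    result = []
    for offset in range(0, len(query) - tile_len + 1, step):
        tile = query[offset:offset + tile_len]
        result.append((offset, count_occurrences(text, suffix_array, tile)))
    return result


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTACGTAA"]
    print(tile_coverage("ACGTAA", reads, tile_len=4))
```

    Because the index is built once per library and each tile lookup is a binary search, many small queries can be answered against the same pre-computed data, which matches the abstract's emphasis on holding pre-computed data from public libraries.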

  6. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    Science.gov (United States)

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background: Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results: A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored in and retrievable from a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion: The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  7. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  8. Functional cloning using pFB retroviral cDNA expression libraries.

    Science.gov (United States)

    Felts, Katherine A; Chen, Keith; Zaharee, Kim; Sundar, Latha; Limjoco, Jamie; Miller, Anna; Vaillancourt, Peter

    2002-09-01

    Retroviral cDNA expression libraries allow the efficient introduction of complex cDNA libraries into virtually any mitotic cell type for screening based on gene function. The cDNA copy number per cell can be easily controlled by adjusting the multiplicity of infection, thus cell populations may be generated in which >90% of infected cells contain one to three cDNAs. We describe the isolation of two known oncogenes and one cell-surface receptor from a human Burkitt's lymphoma (Daudi) cDNA library inserted into the high-titer retroviral vector pFB.
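
    The relationship between multiplicity of infection (MOI) and cDNA copy number per cell can be illustrated with a Poisson model, a common simplifying assumption for viral transduction that is not stated in the abstract itself. The Python sketch below computes, among infected cells only, the fraction expected to carry one to three integrated cDNAs; the function names and the example MOI values are hypothetical.

```python
import math


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k integrations at the given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_with_one_to_three(moi):
    """Among infected cells (k >= 1), the fraction with 1, 2 or 3 integrations."""
    infected = 1.0 - poisson_pmf(0, moi)
    one_to_three = sum(poisson_pmf(k, moi) for k in (1, 2, 3))
    return one_to_three / infected


if __name__ == "__main__":
    for moi in (0.5, 1.0, 1.5, 2.0):
        print(f"MOI {moi}: {fraction_with_one_to_three(moi):.3f}")
```

    Under this model an MOI of about 1 keeps roughly 97% of infected cells in the one-to-three range, while an MOI of 2 drops the figure below 90%, which is consistent with the abstract's point that adjusting the MOI controls copy number.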

  9. Non PCR-amplified Transcripts and AFLP fragments as reduced representations of the quail genome for 454 Titanium sequencing

    Directory of Open Access Journals (Sweden)

    Leterrier Christine

    2010-07-01

    Full Text Available Abstract Background: SNP (single nucleotide polymorphism) discovery is now routinely performed using high-throughput sequencing of reduced representation libraries. Our objective was to adapt 454 GS FLX based sequencing methodologies in order to obtain the largest possible dataset from two reduced representation libraries, produced by AFLP (amplified fragment length polymorphism) for genomic DNA, and EST (expressed sequence tag) for the transcribed fraction of the genome. Findings: The expressed fraction was obtained by preparing cDNA libraries without PCR amplification from quail embryo and brain. To optimize the information content for SNP analyses, libraries were prepared from individuals selected in three quail lines and each individual in the AFLP library was tagged. Sequencing runs produced 399,189 sequence reads from cDNA and 373,484 from genomic fragments, covering close to 250 Mb of sequence in total. Conclusions: Both methods used to obtain reduced representations for high-throughput sequencing were successful after several improvements. The protocols may be used for several sequencing applications, such as de novo sequencing, tagged PCR fragments or long fragment sequencing of cDNA.

  10. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis; FINAL

    International Nuclear Information System (INIS)

    Simon, M. I.; Kim, U.-J.

    2002-01-01

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year

  11. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    Evelyn Silva

    2006-01-01

    Full Text Available Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lower percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  12. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  13. Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

    Science.gov (United States)

    Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2004-02-01

    To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.

  14. libcov: A C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny

    OpenAIRE

    Butt, Davin; Roger, Andrew J; Blouin, Christian

    2005-01-01

    Background An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time consuming task. Results The bioinformatics library libcov is a collection of C++ classes that provides a high and low-level interface to maximum likelihood phylogenetics, sequence analysis and a data structure for structural biological methods. libcov can be used ...

  15. Construction and partial sequencing of a subtractive library in Calcutta 4 (Musa AA) in early stage of infection with Mycosphaerella fijiensis Morelet

    Directory of Open Access Journals (Sweden)

    Milady Mendoza-Rodríguez

    2006-10-01

    Full Text Available The study of genes involved in the plant defense response against pathogen attack is one of the most important steps leading to the elucidation of the molecular mechanisms of disease resistance. The generation of subtracted complementary deoxyribonucleic acid (cDNA) libraries by means of the suppression subtractive hybridization (SSH) technique has been used for this purpose. A subtractive hybridization was made between a cDNA population obtained from ‘Calcutta 4’ leaves inoculated with M. fijiensis (CCIBP-Pf83) and a mixture of cDNA from non-inoculated ‘Calcutta 4’ leaves and mycelium. Leaf samples were taken at 6, 10 and 12 days after inoculation. The subtracted library was obtained by cloning and transformation of subtracted products and, as a result, 600 recombinant clones were obtained. Sequence analysis of sixty-nine clones revealed redundancy of the expressed sequence tags; most of them showed no homology with sequences reported in databases and only 13% had a high homology with metallothioneins. The results constitute a step forward in the molecular study of the Musa-Mycosphaerella fijiensis interaction. Key words: Banana-Mycosphaerella fijiensis interaction, Black Sigatoka, Musa spp., suppression subtractive hybridization

  16. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection, and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.
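
    Mining ESTs for di- and tri-nucleotide SSR motifs can be sketched with a regular-expression scan, shown here in Python. The abstract does not name the software or the repeat thresholds the authors used, so the minimum repeat counts (six for di-nucleotide, five for tri-nucleotide motifs), the function name find_ssrs and the example sequence are all assumptions for illustration.

```python
import re

# ASSUMED thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for unit_len, minimum in min_repeats.items():
        # A unit of unit_len bases followed by at least (minimum - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, minimum - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    est = "TT" + "GA" * 6 + "TT" + "CAG" * 5 + "TT"
    for start, motif, copies in find_ssrs(est):
        print(f"pos {start}: ({motif}){copies}")
```

    Candidate loci found this way would still need flanking sequence suitable for primer design before they can be developed into PCR markers, as the abstract describes for the 25 selected EST-SSRs.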

  17. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus, despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  18. Identification of Anhydrobiosis-related Genes from an Expressed Sequence Tag Database in the Cryptobiotic Midge Polypedilum vanderplanki (Diptera; Chironomidae)*

    Science.gov (United States)

    Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi

    2010-01-01

    Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722

  19. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    Jeppe L. Nielsen, Andreas Schramm, Anne E. Bernhard, Gerrit J. van den Engh, and David A. Stahl (Department of Civil and Environmental Engineering, University of Washington, and Institute for Systems Biology, Seattle, Washington; Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany). A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.

  20. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    Science.gov (United States)

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  1. Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

    Directory of Open Access Journals (Sweden)

    Jingtao Li

    2014-06-01

    Full Text Available Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotations EST, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigenes sequences, 22 simple sequence repeats (SSRs) were also identified contributing to the study of A. canescens resources.

  2. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    Science.gov (United States)

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
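    To make the two operations named in this abstract concrete, the following is a minimal single-process sketch of k-mer counting and position indexing. It is not Kmerind's API (Kmerind is a distributed-memory library); the function names `count_kmers` and `index_kmers` and the choice to skip k-mers containing non-ACGT characters are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

VALID_BASES = set("ACGT")


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer made up only of A, C, G and T.

    Hypothetical helper for illustration; not part of any library.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous bases such as N (an assumption).
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def index_kmers(sequence: str, k: int) -> dict:
    """Map each valid k-mer to the list of 0-based start positions."""
    seq = sequence.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            positions[kmer].append(i)
    return dict(positions)


if __name__ == "__main__":
    read = "ACGTACGTNACG"
    print(count_kmers(read, 4))
    print(index_kmers(read, 4))
```

    Counting needs only one integer per distinct k-mer, whereas the position index stores one entry per occurrence, which is consistent with the abstract reporting separate timings for counting and for full position indexing.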

  3. Genomic library screening for viruses from the human dental plaque revealed pathogen-specific lytic phage sequences.

    Science.gov (United States)

    Al-Jarbou, Ahmed Nasser

    2012-01-01

    Bacterial pathogenesis presents an astounding arsenal of virulence factors that allow them to conquer many different niches throughout the course of infection. Principally fascinating is the fact that some bacterial species are able to induce different diseases by expression of different combinations of virulence factors. Nevertheless, studies aiming at screening for the presence of bacteriophages in humans have been limited. Such screening procedures would eventually lead to identification of phage-encoded properties that impart increased bacterial fitness and/or virulence in a particular niche, and hence, would potentially be used to reverse the course of bacterial infections. The human oral cavity represents a rich and dynamic ecosystem for several upper respiratory tract pathogens. However, little is known about virus diversity in human dental plaque, which is an important reservoir. We applied the culture-independent approach to characterize virus diversity in human dental plaque, making a library from a virus DNA fraction amplified using a multiple displacement method, and sequenced 80 clones. The resulting sequences showed 44% significant identities to GenBank databases by TBLASTX analysis. TBLASTX homology comparisons showed that 66% were viral; 18% eukarya; 10% bacterial; 6% mobile elements. These sequences were sorted into 6 contigs and 45 single sequences, in which 4 contigs and a single sequence showed significant identity to a small region of a putative prophage in the Corynebacterium diphtheriae genome. These findings interestingly highlight the uniqueness of over half of the sequences, whilst the dominance of pathogen-specific prophage sequences implies their role in virulence.

  4. An Expressed Sequence Tag Analysis of the Intertidal Brown Seaweeds Fucus serratus (L.) and F. vesiculosus (L.) (Heterokontophyta, Phaeophyceae) in Response to Abiotic Stressors

    NARCIS (Netherlands)

    Pearson, Gareth A.; Hoarau, Galice; Lago-Leston, Asuncion; Coyer, James A.; Kube, Michael; Reinhardt, Richard; Henckel, Kolja; Serrao, Ester T. A.; Corre, Erwan; Olsen, Jeanine L.

    In order to aid gene discovery and uncover genes responding to abiotic stressors in stress-tolerant brown algae of the genus Fucus, expressed sequence tags (ESTs) were studied in two species, Fucus serratus and Fucus vesiculosus. Clustering of over 12,000 ESTs from three libraries for heat

  5. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    Science.gov (United States)

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.

  6. A part toolbox to tune genetic expression in Bacillus subtilis

    Science.gov (United States)

    Guiziou, Sarah; Sauveplane, Vincent; Chang, Hung-Ju; Clerté, Caroline; Declerck, Nathalie; Jules, Matthieu; Bonnet, Jerome

    2016-01-01

    Libraries of well-characterised components regulating gene expression levels are essential to many synthetic biology applications. While widely available for the Gram-negative model bacterium Escherichia coli, such libraries are lacking for the Gram-positive model Bacillus subtilis, a key organism for basic research and biotechnological applications. Here, we engineered a genetic toolbox comprising libraries of promoters, Ribosome Binding Sites (RBS), and protein degradation tags to precisely tune gene expression in B. subtilis. We first designed a modular Expression Operating Unit (EOU) facilitating parts assembly and modifications and providing a standard genetic context for gene circuits implementation. We then selected native, constitutive promoters of B. subtilis and efficient RBS sequences from which we engineered three promoters and three RBS sequence libraries exhibiting ∼14 000-fold dynamic range in gene expression levels. We also designed a collection of SsrA proteolysis tags of variable strength. Finally, by using fluorescence fluctuation methods coupled with two-photon microscopy, we quantified the absolute concentration of GFP in a subset of strains from the library. Our complete promoters and RBS sequences library comprising over 135 constructs enables tuning of GFP concentration over five orders of magnitude, from 0.05 to 700 μM. This toolbox of regulatory components will support many research and engineering applications in B. subtilis. PMID:27402159

  7. Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    Directory of Open Access Journals (Sweden)

    Andaine Seguin-Orlando

    Full Text Available Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.

  8. The first insight into the Salvia (Lamiaceae) genome via BAC library construction and high-throughput sequencing of target BAC clones

    International Nuclear Information System (INIS)

    Hao, D.C.; Vautrin, S.; Berges, H.; Chen, S.L.

    2015-01-01

    Salvia is a representative genus of Lamiaceae, a eudicot family with significant species diversity and population adaptability. One of the key goals of Salvia genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of medicinal plants to increase their health and productivity. Large-insert genomic libraries are a fundamental tool for achieving this purpose. We report herein the construction, characterization and screening of a gridded BAC library for Salvia officinalis (sage). The S. officinalis BAC library consists of 17,764 clones and the average insert size is 107 Kb, corresponding to 3 haploid genome equivalents. Seventeen positive clones (average insert size 115 Kb) containing five terpene synthase (TPS) genes were screened out by PCR and 12 of them were subjected to Illumina HiSeq 2000 sequencing, which yielded 28,097,480 90-bp raw reads (2.53 Gb). Scaffolds containing sabinene synthase (Sab), a Sab homolog, TPS3 (kaurene synthase-like 2), copalyl diphosphate synthase 2 and one cytochrome P450 gene were retrieved via de novo assembly and annotation, which also have flanking noncoding sequences, including predicted promoters and repeat sequences. Among 2,638 repeat sequences, there are 330 amplifiable microsatellites. This BAC library provides a new resource for Lamiaceae genomic studies, including microsatellite marker development, physical mapping, comparative genomics and genome sequencing. Characterization of positive clones provided insights into the structure of the Salvia genome. These sequences will be used in the assembly of a future genome sequence for S. officinalis. (author)
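    The library figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is an illustration only: the helper names are hypothetical, the genome size is not stated in the abstract and is back-calculated here from the reported 3 genome equivalents, and the recovery probability uses the standard Clarke-Carbon relation P = 1 - (1 - insert/genome)^N, which the abstract itself does not mention.

```python
def total_insert_kb(n_clones: int, mean_insert_kb: float) -> float:
    """Total cloned DNA in the library, in kilobases."""
    return n_clones * mean_insert_kb


def implied_genome_kb(n_clones: int, mean_insert_kb: float,
                      genome_equivalents: float) -> float:
    """Haploid genome size implied by the stated coverage (an inference)."""
    return total_insert_kb(n_clones, mean_insert_kb) / genome_equivalents


def locus_recovery_probability(n_clones: int, mean_insert_kb: float,
                               genome_kb: float) -> float:
    """Clarke-Carbon estimate that a given locus is in at least one clone."""
    return 1.0 - (1.0 - mean_insert_kb / genome_kb) ** n_clones


def read_yield_bp(n_reads: int, read_length_bp: int) -> int:
    """Raw sequencing yield in base pairs."""
    return n_reads * read_length_bp


if __name__ == "__main__":
    clones, insert_kb, coverage = 17_764, 107, 3
    genome_kb = implied_genome_kb(clones, insert_kb, coverage)
    print("total insert (kb):", total_insert_kb(clones, insert_kb))
    print("implied genome (Mb):", round(genome_kb / 1000, 1))
    print("P(locus present):",
          round(locus_recovery_probability(clones, insert_kb, genome_kb), 3))
    print("raw yield (Gb):", round(read_yield_bp(28_097_480, 90) / 1e9, 2))
```

    Under these assumptions the read yield reproduces the reported 2.53 Gb, and a library with 3 genome equivalents gives roughly a 95% chance of containing any single locus.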

  9. Analysis of expressed sequence tags from a NaHCO(3)-treated alkali-tolerant plant, Chloris virgata.

    Science.gov (United States)

    Nishiuchi, Shunsaku; Fujihara, Kazumasa; Liu, Shenkui; Takano, Tetsuo

    2010-04-01

    Chloris virgata Swartz (C. virgata) is a gramineous wild plant that can survive in saline-alkali areas in northeast China. To examine the tolerance mechanisms of C. virgata, we constructed a cDNA library from whole plants of C. virgata that had been treated with 100 mM NaHCO(3) for 24 h and sequenced 3168 randomly selected clones. Most (2590) of the expressed sequence tags (ESTs) showed significant similarity to sequences in the NCBI database. Of the 2590 genes, 1893 were unique. Gene Ontology (GO) Slim annotations were obtained for 1081 ESTs by BLAST2GO, and 75 of them were annotated with the GO terms "response to stress", "response to abiotic stimulus", and "response to biotic stimulus", indicating that these genes are likely to function in the tolerance mechanism of C. virgata. In a separate experiment, 24 genes that are known from previous studies to be associated with abiotic stress tolerance were further examined by real-time RT-PCR to see how their expression was affected by NaHCO(3) stress. NaHCO(3) treatment up-regulated the expression of a pathogenesis-related gene (DC998527), Win1 precursor gene (DC998617), catalase gene (DC999385), ribosome inactivating protein 1 (DC999555), Na(+)/H(+) antiporter gene (DC998043), and two-component regulator gene (DC998236). Copyright 2010 Elsevier Masson SAS. All rights reserved.

  10. Preparation of Small RNA NGS Libraries from Biofluids.

    Science.gov (United States)

    Etheridge, Alton; Wang, Kai; Baxter, David; Galas, David

    2018-01-01

    Next generation sequencing (NGS) is a powerful method for transcriptome analysis. Unlike other gene expression profiling methods, such as microarrays, NGS provides additional information such as splicing variants, sequence polymorphisms, and novel transcripts. For this reason, NGS is well suited for comprehensive profiling of the wide range of extracellular RNAs (exRNAs) in biofluids. ExRNAs are of great interest because of their possible biological role in cell-to-cell communication and for their potential use as biomarkers or for therapeutic purposes. Here, we describe a modified protocol for preparation of small RNA libraries for NGS analysis. This protocol has been optimized for use with low-input exRNA-containing samples, such as plasma or serum, and has modifications designed to reduce the sequence-specific bias typically encountered with commercial small RNA library construction kits.

  11. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compound). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
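    As an illustration of what a search for perfect microsatellites involves, the sketch below scans a sequence for tandem repeats with a regular expression. This is a simplified stand-in and not the method of the study, which used RepeatMasker; the function name `find_perfect_ssrs`, the unit-length limit and the minimum repeat count are all assumptions chosen for the example, and imperfect or compound repeats are not detected.

```python
import re


def find_perfect_ssrs(sequence: str, min_repeats: int = 5,
                      max_unit: int = 6) -> list:
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: thresholds are illustrative, not from the study.
    min_repeats must be at least 2.
    """
    seq = sequence.upper()
    # A lazy unit of 1..max_unit bases followed by at least
    # (min_repeats - 1) further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        repeat_count = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeat_count))
    return hits


if __name__ == "__main__":
    # A (CA)5 repeat embedded in flanking sequence.
    print(find_perfect_ssrs("GGTCACACACACATTG"))
```

    The lazy quantifier makes the pattern report the shortest repeating unit, so a run such as AAAAAA is reported as a mononucleotide repeat rather than as (AA)3.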

  12. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  13. Classification, expression pattern and comparative analysis of sugarcane expressed sequences tags (ESTs) encoding glycine-rich proteins (GRPs)

    Directory of Open Access Journals (Sweden)

    Fusaro Adriana

    2001-01-01

    Full Text Available Since the isolation of the first glycine-rich proteins (GRPs) in plants a wealth of new GRPs have been identified. The highly specific but diverse expression pattern of grp genes, taken together with the distinct sub-cellular localization of some GRP groups, clearly indicate that these proteins are involved in several independent physiological processes. Notwithstanding the absence of a clear definition of the role of GRPs in plant cells, studies conducted with these proteins have provided new and interesting insights into the molecular biology and cell biology of plants. Complexly regulated promoters and distinct mechanisms for the regulation of gene expression have been demonstrated and new protein targeting pathways, as well as the exportation of GRPs from different cell types have been discovered. These data show that GRPs can be useful as markers and/or models to understand distinct aspects of plant biology. In this paper, the structural and functional features of these proteins in sugarcane (Saccharum officinarum L.) are summarized. Since this is the first description of GRPs in sugarcane, special emphasis has been given to the expression pattern of these GRP genes by studying their abundance and prevalence in the different cDNA-libraries of the Sugarcane Expressed Sequence Tag (SUCEST) project. The comparison of sugarcane GRPs with GRPs from other species is also discussed.

  14. Display of a Maize cDNA library on baculovirus infected insect cells

    Directory of Open Access Journals (Sweden)

    Jones Ian M

    2008-08-01

    Full Text Available Background: Maize is a good model system for cereal crop genetics and development because of its rich genetic heritage and well-characterized morphology. The sequencing of its genome is well advanced, and new technologies for efficient proteomic analysis are needed. Baculovirus expression systems have been used for the last twenty years to express in insect cells a wide variety of eukaryotic proteins that require complex folding or extensive posttranslational modification. More recently, baculovirus display technologies based on the expression of foreign sequences on the surface of Autographa californica (AcMNPV) have been developed. We investigated the potential of a display methodology for a cDNA library of maize young seedlings. Results: We constructed a full-length cDNA library of young maize etiolated seedlings in the transfer vector pAcTMVSVG. The library contained a total of 2.5 × 10^5 independent clones. Expression of two known maize proteins, calreticulin and auxin binding protein (ABP1), was shown by western blot analysis of protein extracts from insect cells infected with the cDNA library. Display of the two proteins in infected insect cells was shown by selective biopanning using magnetic cell sorting and demonstrated proof of concept that the baculovirus maize cDNA display library could be used to identify and isolate proteins. Conclusion: The maize cDNA library constructed in this study relies on the novel technology of baculovirus display and is unique in currently published cDNA libraries. Produced to demonstrate proof of principle, it opens the way for the development of a eukaryotic in vivo display tool which would be ideally suited for rapid screening of the maize proteome for binding partners, such as proteins involved in hormone regulation or defence.

  15. Making Strategic Decisions: Conducting and Using Research on the Impact of Sequenced Library Instruction

    Science.gov (United States)

    Lundstrom, Kacy; Martin, Pamela; Cochran, Dory

    2016-01-01

    This study explores the relationship between course grades and sequenced library instruction interventions throughout psychology students' curriculum. Researchers conducted this study to inform decisions about sustaining and improving program integrations for first- and second-year composition courses and to improve discipline-level integrations.…

  16. Identification of Bacterial Small RNAs by RNA Sequencing

    DEFF Research Database (Denmark)

    Gómez Lozano, María; Marvig, Rasmus Lykke; Molin, Søren

    2014-01-01

    Small regulatory RNAs (sRNAs) in bacteria are known to modulate gene expression and control a variety of processes including metabolic reactions, stress responses, and pathogenesis in response to environmental signals. A method to identify bacterial sRNAs on a genome-wide scale based on RNA sequencing (RNA-seq) is described that involves the preparation and analysis of three different sequencing libraries. As a significant number of unique sRNAs are identified in each library, the libraries can be used either alone or in combination to increase the number of sRNAs identified. The approach may be applied to identify sRNAs in any bacterium under different growth and stress conditions.

  17. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation.

    Science.gov (United States)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-30

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy-combining sequential and modular concepts-enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  18. Expressed sequence tags from larval gut of the European corn borer (Ostrinia nubilalis): Exploring candidate genes potentially involved in Bacillus thuringiensis toxicity and resistance

    Directory of Open Access Journals (Sweden)

    Crespo Andre LB

    2009-06-01

    Full Text Available Background: Lepidoptera represents more than 160,000 insect species which include some of the most devastating pests of crops, forests, and stored products. However, the genomic information on lepidopteran insects is very limited. Only a few studies have focused on developing expressed sequence tag (EST) libraries from the guts of lepidopteran larvae. Knowledge of the genes that are expressed in the insect gut is crucial for understanding basic physiology of food digestion, their interactions with Bacillus thuringiensis (Bt) toxins, and for discovering new targets for novel toxins for use in pest management. This study analyzed the ESTs generated from the larval gut of the European corn borer (ECB), Ostrinia nubilalis, one of the most destructive pests of corn in North America and the western world. Our goals were to establish an ECB larval gut-specific EST database as a genomic resource for future research and to explore candidate genes potentially involved in insect-Bt interactions and Bt resistance in ECB. Results: We constructed two cDNA libraries from the guts of the fifth-instar larvae of ECB and sequenced a total of 15,000 ESTs from these libraries. A total of 12,519 ESTs (83.4%) appeared to be high quality with an average length of 656 bp. These ESTs represented 2,895 unique sequences, including 1,738 singletons and 1,157 contigs. Among the unique sequences, 62.7% encoded putative proteins that shared significant sequence similarities (E-value ≤ 10^-3) with the sequences available in GenBank. Our EST analysis revealed 52 candidate genes that potentially have roles in Bt toxicity and resistance. These genes encode 18 trypsin-like proteases, 18 chymotrypsin-like proteases, 13 aminopeptidases, 2 alkaline phosphatases and 1 cadherin-like protein.
Comparisons of expression profiles of 41 selected candidate genes between Cry1Ab-susceptible and resistant strains of ECB by RT-PCR showed apparently decreased expressions in 2 trypsin-like and 2

  19. RNA deep sequencing reveals differential microRNA expression during development of sea urchin and sea star.

    Directory of Open Access Journals (Sweden)

    Sabah Kadri

    Full Text Available microRNAs (miRNAs) are small (20-23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html.

  20. RNA Deep Sequencing Reveals Differential MicroRNA Expression during Development of Sea Urchin and Sea Star

    Science.gov (United States)

    Kadri, Sabah; Hinman, Veronica F.; Benos, Panayiotis V.

    2011-01-01

    microRNAs (miRNAs) are small (20–23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html. PMID:22216218
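
    The counts in this abstract can be combined by inclusion-exclusion. The short sketch below is not from the paper; it assumes that "32 in common" refers to the known miRNAs detected in both species, and the variable names are illustrative.

```python
# Counts reported in the abstract (known miRNAs only).
known_urchin = 47  # expressed in sea urchin
known_star = 38    # expressed in sea star
shared = 32        # assumed: known miRNAs detected in both species

# Inclusion-exclusion gives the number of distinct known miRNAs overall.
distinct_known = known_urchin + known_star - shared
urchin_only = known_urchin - shared
star_only = known_star - shared

print(distinct_known, urchin_only, star_only)
```

    Under that reading, the two libraries together cover 53 distinct known miRNAs, of which 15 were seen only in sea urchin and 6 only in sea star.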

  1. Constructing and detecting a cDNA library for mites.

    Science.gov (United States)

    Hu, Li; Zhao, YaE; Cheng, Juan; Yang, YuanJun; Li, Chen; Lu, ZhaoHui

    2015-10-01

    RNA extraction and construction of complementary DNA (cDNA) libraries for mites have been quite challenging due to difficulties in acquiring tiny living mites and breaking their hard chitin. The present study explores a better method to construct a cDNA library for mites that will lay the foundation for transcriptome and molecular pathogenesis research. We selected Psoroptes cuniculi as an experimental subject and took the following steps to construct and verify the cDNA library. First, we combined liquid nitrogen grinding with TRIzol for total RNA extraction. Then, the switching mechanism at 5' end of the RNA transcript (SMART) technique was used to construct a full-length cDNA library. To evaluate the quality of the cDNA library, the library titer and recombination rate were calculated. The reliability of the cDNA library was assessed by sequencing and analyzing positive clones and genes amplified by specific primers. The results showed that the RNA concentration was 836 ng/μl and the absorbance ratio at 260/280 nm was 1.82. The library titer was 5.31 × 10^5 plaque-forming units (PFU)/ml and the recombination rate was 98.21%, indicating that the library was of good quality. In the 33 expressed sequence tags (ESTs) of P. cuniculi, two clones of 1656 and 1658 bp were almost identical with only three variable sites detected, which had an identity of 99.63% with that of Psoroptes ovis, indicating that the cDNA library was reliable. Further detection by specific primers demonstrated that the 553-bp Pso c II gene sequences of P. cuniculi had an identity of 98.56% with those of P. ovis, confirming that the cDNA library was not only reliable but also feasible.
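
    The abstract reports a library titer and a recombination rate without giving the calculation. The sketch below uses the standard plaque-assay relationship (plaques counted × dilution factor / volume plated) and a simple percentage of insert-carrying clones. The function names and the input numbers are hypothetical and are not taken from the study; how recombinant clones were scored is not stated in the abstract.

```python
def phage_titer_pfu_per_ml(plaques, dilution_factor, plated_volume_ml):
    """Titer in PFU/ml = plaques counted x dilution factor / volume plated (ml)."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return plaques * dilution_factor / plated_volume_ml


def recombination_rate_pct(recombinant_clones, clones_scored):
    """Percentage of scored clones that carry an insert."""
    if clones_scored <= 0:
        raise ValueError("at least one clone must be scored")
    return 100.0 * recombinant_clones / clones_scored


# Hypothetical plate: 40 plaques from 0.1 ml of a 1:1000 dilution.
titer = phage_titer_pfu_per_ml(plaques=40, dilution_factor=1000, plated_volume_ml=0.1)

# Hypothetical screen: 48 of 50 picked clones carry an insert.
rate = recombination_rate_pct(recombinant_clones=48, clones_scored=50)

print(f"{titer:.2e} PFU/ml, {rate:.2f}% recombinant")
```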

  2. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.
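
    Selecting a subset of microRNAs "by motif", as in analysis module (i), can be pictured as a pattern filter over mature sequences. The following is an illustrative sketch only: mESAdb itself is implemented in PHP, JavaScript and R, and the function name, record names and toy sequences here are assumptions made for the example.

```python
import re


def select_by_motif(mirnas, motif):
    """Return the sorted names of miRNAs whose sequence matches a regex motif."""
    pattern = re.compile(motif)
    return sorted(name for name, seq in mirnas.items() if pattern.search(seq))


# Toy records (hypothetical names; sequences are for illustration only).
toy_mirnas = {
    "toy-miR-1": "UGAGGUAGUAGGUUGUAUAGUU",
    "toy-miR-2": "UAGCUUAUCAGACUGAUGUUGA",
    "toy-miR-3": "UGAGGUAGUAGGUUGUGUGGUU",
}

# Motif anchored at nucleotides 2-8, the usual position of the seed region.
selected = select_by_motif(toy_mirnas, r"^.GAGGUAG")
print(selected)
```

    The selected names could then be used to subset an expression matrix before any multivariate analysis.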

  3. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    Directory of Open Access Journals (Sweden)

    Bharti Arvind K

    2008-12-01

    Full Text Available Abstract Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of

  4. Evaluation and Adaptation of a Laboratory-Based cDNA Library Preparation Protocol for Retrospective Sequencing of Archived MicroRNAs from up to 35-Year-Old Clinical FFPE Specimens.

    Science.gov (United States)

    Loudig, Olivier; Wang, Tao; Ye, Kenny; Lin, Juan; Wang, Yihong; Ramnauth, Andrew; Liu, Christina; Stark, Azadeh; Chitale, Dhananjay; Greenlee, Robert; Multerer, Deborah; Honda, Stacey; Daida, Yihe; Spencer Feigelson, Heather; Glass, Andrew; Couch, Fergus J; Rohan, Thomas; Ben-Dov, Iddo Z

    2017-03-14

    Formalin-fixed paraffin-embedded (FFPE) specimens, when used in conjunction with patient clinical data history, represent an invaluable resource for molecular studies of cancer. Even though nucleic acids extracted from archived FFPE tissues are degraded, their molecular analysis has become possible. In this study, we optimized a laboratory-based next-generation sequencing barcoded cDNA library preparation protocol for analysis of small RNAs recovered from archived FFPE tissues. Using matched fresh and FFPE specimens, we evaluated the robustness and reproducibility of our optimized approach, as well as its applicability to archived clinical specimens stored for up to 35 years. We then evaluated this cDNA library preparation protocol by performing a miRNA expression analysis of archived breast ductal carcinoma in situ (DCIS) specimens, selected for their relation to the risk of subsequent breast cancer development and obtained from six different institutions. Our analyses identified six miRNAs (miR-29a, miR-221, miR-375, miR-184, miR-363, miR-455-5p) differentially expressed between DCIS lesions from women who subsequently developed an invasive breast cancer (cases) and women who did not develop invasive breast cancer within the same time interval (control). Our thorough evaluation and application of this laboratory-based miRNA sequencing analysis indicates that the preparation of small RNA cDNA libraries can reliably be performed on older, archived, clinically-classified specimens.

  5. MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

    Directory of Open Access Journals (Sweden)

    Roch Philippe

    2009-02-01

    Full Text Available Abstract Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database http://mussel.cribi.unipd.it. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels.
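
    The assembly figures in this record follow a simple bookkeeping rule: clusters plus singletons give the non-redundant set. The sketch below recomputes the totals from the numbers in the abstract. The function name is illustrative, and "redundancy" is defined here, as an assumption, as the share of high-quality ESTs that collapse into an already counted sequence.

```python
def est_summary(clones_sequenced, high_quality_ests, clusters, singletons):
    """Summarize an EST assembly from its headline counts."""
    non_redundant = clusters + singletons
    return {
        "non_redundant": non_redundant,
        # Share of sequenced clones that yielded a high-quality EST.
        "high_quality_pct": round(100 * high_quality_ests / clones_sequenced, 1),
        # Assumed definition: share of high-quality ESTs that are redundant.
        "redundancy_pct": round(100 * (1 - non_redundant / high_quality_ests), 1),
    }


# Counts taken from the MytiBase abstract.
mytibase = est_summary(
    clones_sequenced=24939,
    high_quality_ests=18788,
    clusters=2446,
    singletons=4666,
)
print(mytibase)
```

    The same helper applies to other EST records in this listing that report clones, high-quality ESTs, contigs and singletons.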

  6. Construction of a cDNA library and preliminary analysis of expressed sequence tags in Piper hainanense.

    Science.gov (United States)

    Fan, R; Ling, P; Hao, C Y; Li, F P; Huang, L F; Wu, B D; Wu, H S

    2015-10-19

    Black pepper is a perennial climbing vine. It is widely cultivated because its berries can be used not only as a spice in food but also for medicinal purposes. This study aimed to construct a standardized, high-quality cDNA library to facilitate identification of new Piper hainanense transcripts. For this, 262 unigenes were used to generate raw reads. The average length of these 262 unigenes was 774.8 bp. Of these, 94 genes (35.9%) were newly identified, according to the NCBI protein database. Thus, identification of new genes may broaden the molecular knowledge of P. hainanense on the basis of Clusters of Orthologous Groups and Gene Ontology categories. In addition, certain basic genes linked to physiological processes were identified, which can contribute to disease resistance and thereby to the breeding of black pepper. A total of 26 unigenes were found to contain SSR markers. Dinucleotide SSR was the main repeat motif, accounting for 61.54%, followed by trinucleotide SSR (23.07%). Eight primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among twenty-one Piper germplasm accessions. These results present novel sequence information for P. hainanense, which can serve as the foundation for further genetic research on this species.
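
    Screening unigenes for SSR (microsatellite) motifs amounts to searching for short units repeated in tandem. The abstract does not name the tool or thresholds used, so the sketch below is a generic regular-expression approach; the function name, the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides) and the toy sequence are all assumptions.

```python
import re


def find_ssrs(seq, unit_len, min_repeats):
    """Find perfect tandem repeats; returns (motif, repeat_count, start) tuples."""
    # One unit captured, then the same unit at least (min_repeats - 1) more times.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


# Toy sequence: (AG)8 and (CAT)5 embedded in short flanking bases.
toy_seq = "GGC" + "AG" * 8 + "TTC" + "CAT" * 5 + "GG"

di_hits = find_ssrs(toy_seq, unit_len=2, min_repeats=6)
tri_hits = find_ssrs(toy_seq, unit_len=3, min_repeats=5)
print(di_hits, tri_hits)
```

    Tallying hits by unit length across all unigenes would give motif-class proportions of the kind reported above.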

  7. Evaluation of library preparation methods for Illumina next generation sequencing of small amounts of DNA from foodborne parasites.

    Science.gov (United States)

    Nascimento, Fernanda S; Wei-Pridgeon, Yuping; Arrowood, Michael J; Moss, Delynn; da Silva, Alexandre J; Talundzic, Eldin; Qvarnstrom, Yvonne

    2016-11-01

    Illumina library preparation methods for ultra-low input amounts were compared using genomic DNA from two foodborne parasites (Angiostrongylus cantonensis and Cyclospora cayetanensis) as examples. The Ovation Ultralow method resulted in libraries with the highest concentration and produced quality sequencing data, even when the input DNA was in the picogram range. Published by Elsevier B.V.

  8. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    Science.gov (United States)

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  9. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    KAUST Repository

    Domina, Maria; Lanza Cariccio, Veronica; Benfatto, Salvatore; D'Aliberti, Deborah; Venza, Mario; Borgogni, Erica; Castellino, Flora; Biondo, Carmelo; D'Andrea, Daniel; Grassi, Luigi; Tramontano, Anna; Teti, Giuseppe; Felici, Franco; Beninati, Concetta

    2014-01-01

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also likely to find important applications in other fields covered by traditional quantitative serology.

  10. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    Directory of Open Access Journals (Sweden)

    Maria Domina

    Full Text Available There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also likely to find important applications in other fields covered by traditional quantitative serology.

  11. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    KAUST Repository

    Domina, Maria

    2014-12-04

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also likely to find important applications in other fields covered by traditional quantitative serology.

  12. A transposase strategy for creating libraries of circularly permuted proteins.

    Science.gov (United States)

    Mehta, Manan M; Liu, Shirley; Silberg, Jonathan J

    2012-05-01

    A simple approach for creating libraries of circularly permuted proteins is described that is called PERMutation Using Transposase Engineering (PERMUTE). In PERMUTE, the transposase MuA is used to randomly insert a minitransposon that can function as a protein expression vector into a plasmid that contains the open reading frame (ORF) being permuted. A library of vectors that express different permuted variants of the ORF-encoded protein is created by: (i) using bacteria to select for target vectors that acquire an integrated minitransposon; (ii) excising the ensemble of ORFs that contain an integrated minitransposon from the selected vectors; and (iii) circularizing the ensemble of ORFs containing integrated minitransposons using intramolecular ligation. Construction of a Thermotoga neapolitana adenylate kinase (AK) library using PERMUTE revealed that this approach produces vectors that express circularly permuted proteins with distinct sequence diversity from existing methods. In addition, selection of this library for variants that complement the growth of Escherichia coli with a temperature-sensitive AK identified functional proteins with novel architectures, suggesting that PERMUTE will be useful for the directed evolution of proteins with new functions.

  13. LOX: Inferring level of expression from diverse methods of census sequencing

    KAUST Repository

    Zhang, Zhang

    2010-06-10

    Summary: We present LOX (Level Of eXpression) that estimates the Level Of gene eXpression from high-throughput-expressed sequence datasets with multiple treatments or samples. Unlike most analyses, LOX incorporates a gene bias model that facilitates integration of diverse transcriptomic sequencing data that arises when transcriptomic data have been produced using diverse experimental methodologies. LOX integrates overall sequence count tallies normalized by total expressed sequence count to provide expression levels for each gene relative to all treatments as well as Bayesian credible intervals. © The Author 2010. Published by Oxford University Press. All rights reserved.
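
    The core normalization mentioned in the summary, sequence counts divided by the total expressed sequence count of each treatment, can be sketched as below. This is a deliberately simplified illustration and not the LOX model: it omits the gene bias model and the Bayesian credible intervals, and the function name, the scaling to the highest treatment and the toy counts are assumptions.

```python
def relative_expression(gene_counts, library_totals):
    """Scale per-treatment frequencies (count / library total) so the maximum is 1.0."""
    freqs = {t: gene_counts[t] / library_totals[t] for t in gene_counts}
    top = max(freqs.values())
    if top == 0:
        return {t: 0.0 for t in freqs}  # gene not observed in any treatment
    return {t: f / top for t, f in freqs.items()}


# Toy counts for one gene in two treatments with different sequencing depth.
counts = {"control": 20, "treated": 90}
totals = {"control": 10_000, "treated": 30_000}

rel = relative_expression(counts, totals)
print(rel)
```

    Dividing by the library total first keeps a deeper library from appearing more highly expressed simply because more sequences were sampled from it.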

  14. LOX: Inferring level of expression from diverse methods of census sequencing

    KAUST Repository

    Zhang, Zhang; López-Giráldez, Francesc Francisco; Townsend, Jeffrey P.

    2010-01-01

    Summary: We present LOX (Level Of eXpression) that estimates the Level Of gene eXpression from high-throughput-expressed sequence datasets with multiple treatments or samples. Unlike most analyses, LOX incorporates a gene bias model that facilitates integration of diverse transcriptomic sequencing data that arises when transcriptomic data have been produced using diverse experimental methodologies. LOX integrates overall sequence count tallies normalized by total expressed sequence count to provide expression levels for each gene relative to all treatments as well as Bayesian credible intervals. © The Author 2010. Published by Oxford University Press. All rights reserved.

  15. Lactation transcriptomics in the Australian marsupial, Macropus eugenii: transcript sequencing and quantification

    Directory of Open Access Journals (Sweden)

    Whitley Jane C

    2007-11-01

    Full Text Available Abstract Background Lactation is an important aspect of mammalian biology and, amongst mammals, marsupials show one of the most complex lactation cycles. Marsupials, such as the tammar wallaby (Macropus eugenii), give birth to a relatively immature newborn, and progressive changes in milk composition and milk production regulate early stage development of the young. Results In order to investigate gene expression in the marsupial mammary gland during lactation, a comprehensive set of cDNA libraries was derived from lactating tissues throughout the lactation cycle of the tammar wallaby. A total of 14,837 expressed sequence tags were produced by cDNA sequencing. Sequence analysis and sequence assembly were used to construct a comprehensive catalogue of mammary transcripts. Sequence data from pregnant and early or late lactating specific cDNA libraries and data from early or late lactation massively parallel sequencing strategies were combined to analyse the variation of milk protein gene expression during the lactation cycle. Conclusion Results show a steady increase in expression of genes coding for secreted protein during the lactation cycle that is associated with a high proportion of transcripts coding for milk proteins. In addition, genes involved in immune function, translation and energy or anabolic metabolism are expressed across the lactation cycle. A number of potential new milk proteins or mammary gland remodelling markers, including noncoding RNAs, have been identified.

  16. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  17. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung; Ryu, Tae Woo; Heo, Hyoungsam; Seo, Seungwon; Lee, Doheon; Hur, Cheolgoo

    2011-01-01

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  18. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung

    2011-04-30

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  19. Expressed sequence tag (EST) analysis of two subspecies of Metarhizium anisopliae reveals a plethora of secreted proteins with potential activity in insect hosts.

    Science.gov (United States)

    Freimoser, Florian M; Screen, Steven; Bagga, Savita; Hu, Gang; St Leger, Raymond J

    2003-01-01

    Expressed sequence tag (EST) libraries for Metarhizium anisopliae, the causative agent of green muscardine disease, were developed from the broad host-range pathogen Metarhizium anisopliae sf. anisopliae and the specific grasshopper pathogen, M. anisopliae sf. acridum. Approximately 1,700 5' end sequences from each subspecies were generated from cDNA libraries representing fungi grown under conditions that maximize secretion of cuticle-degrading enzymes. Both subspecies had ESTs for virtually all pathogenicity-related genes cloned to date from M. anisopliae, but many novel genes encoding potential virulence factors were also tagged. Enzymes with potential targets in the insect host included proteases, chitinases, phospholipases, lipases, esterases, phosphatases and enzymes producing toxic secondary metabolites. A diverse array of proteases composed 36 % of all M. anisopliae sf. anisopliae ESTs. Eighty percent of the ESTs that could be clustered into functional groups had significant matches (Ehistory of this clade.

  20. Directional gene expression and antisense transcripts in sexual and asexual stages of Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    López-Barragán María J

    2011-11-01

    Full Text Available Abstract Background It has been shown that nearly a quarter of the initial predicted gene models in the Plasmodium falciparum genome contain errors. Although there have been efforts to obtain complete cDNA sequences to correct the errors, the coverage of cDNA sequences on the predicted genes is still incomplete, and many gene models for those expressed in sexual or mosquito stages have not been validated. Antisense transcripts have widely been reported in P. falciparum; however, the extent and pattern of antisense transcripts in different developmental stages remain largely unknown. Results We have sequenced seven bidirectional libraries from ring, early and late trophozoite, schizont, gametocyte II, gametocyte V, and ookinete, and four strand-specific libraries from late trophozoite, schizont, gametocyte II, and gametocyte V of the 3D7 parasites. Alignment of the cDNA sequences to the 3D7 reference genome revealed stage-specific antisense transcripts and novel intron-exon splicing junctions. Sequencing of strand-specific cDNA libraries suggested that more genes are expressed in one direction in gametocyte than in schizont. Alternatively spliced genes, antisense transcripts, and stage-specific expressed genes were also characterized. Conclusions It is necessary to continue to sequence cDNA from different developmental stages, particularly those of non-erythrocytic stages. The presence of antisense transcripts in some gametocyte and ookinete genes suggests that these antisense RNA may play an important role in gene expression regulation and parasite development. Future gene expression studies should make use of directional cDNA libraries. Antisense transcripts may partly explain the observed discrepancy between levels of mRNA and protein expression.
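
    With strand-specific libraries, a read can be called sense or antisense by comparing its strand with that of the annotated gene it overlaps. The sketch below illustrates that comparison only. It assumes read strands have already been converted to the strand of the original transcript (the conversion depends on the library protocol), and the function names and toy reads are hypothetical.

```python
from collections import Counter


def classify_read(read_strand, gene_strand):
    """Label a read 'sense' or 'antisense' relative to the gene it overlaps."""
    for strand in (read_strand, gene_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand: {strand!r}")
    return "sense" if read_strand == gene_strand else "antisense"


def antisense_fraction(read_strands, gene_strand):
    """Fraction of reads over a gene that come from the opposite strand."""
    counts = Counter(classify_read(s, gene_strand) for s in read_strands)
    total = sum(counts.values())
    return counts["antisense"] / total if total else 0.0


# Toy example: four reads overlapping a gene annotated on the + strand.
toy_reads = ["+", "+", "-", "+"]
fraction = antisense_fraction(toy_reads, "+")
print(fraction)
```

    Computing this fraction per gene and per developmental stage is one way to tabulate stage-specific antisense signal.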

  1. Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection

    Directory of Open Access Journals (Sweden)

    Liang Xuanqiang

    2008-02-01

    Full Text Available Abstract Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotype, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were

  2. Molecular cloning of chicken metallothionein. Deduction of the complete amino acid sequence and analysis of expression using cloned cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Wei, D; Andrews, G K

    1988-01-25

    A cDNA library was constructed using RNA isolated from the livers of chickens which had been treated with zinc. This library was screened with an RNA probe complementary to mouse metallothionein-I (MT), and eight chicken MT cDNA clones were obtained. All of the cDNA clones contained nucleotide sequences homologous to regions of the longest (375 bp) cDNA clone. The latter contained an open reading frame of 189 bp, and the deduced amino acid sequence indicates a protein of 63 amino acids of which 20 are cysteine residues. Amino acid composition and partial amino acid sequence analyses of purified chicken MT protein agreed with the amino acid composition and sequence deduced from the cloned cDNA. Amino acid sequence comparisons establish that chicken MT shares extensive homology with mammalian MTs. Southern blot analysis of chicken DNA indicates that the chicken MT gene is not a part of a large family of related sequences, but rather is likely to be a unique gene sequence. In the chicken liver, levels of chicken MT mRNA were rapidly induced by metals (Cd²⁺, Zn²⁺, Cu²⁺), glucocorticoids and lipopolysaccharide. MT mRNA was present in low levels in embryonic liver and increased to high levels during the first week after hatching before decreasing again to the basal levels found in adult liver. The results of this study establish that MT is highly conserved between birds and mammals and is regulated in the chicken by agents which also regulate expression of mammalian MT genes. However, in contrast to the mammals, the results suggest the existence of a single isoform of MT in the chicken.

  3. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    Directory of Open Access Journals (Sweden)

    Scoté-Blachon Céline

    2008-09-01

    Full Text Available Abstract Background "Open" transcriptome analysis methods allow gene expression to be studied without a priori knowledge of the transcript sequences. At present, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high-throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results To analyze the mouse hypothalamus transcriptome in depth while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as a LongSAGE library sequenced by the Sanger method.

  4. Single-Cell RNA Sequencing of Glioblastoma Cells.

    Science.gov (United States)

    Sen, Rajeev; Dolgalev, Igor; Bayin, N Sumru; Heguy, Adriana; Tsirigos, Aris; Placantonakis, Dimitris G

    2018-01-01

    Single-cell RNA sequencing (sc-RNASeq) is a recently developed technique used to evaluate the transcriptome of individual cells. As opposed to conventional RNASeq in which entire populations are sequenced in bulk, sc-RNASeq can be beneficial when trying to better understand gene expression patterns in markedly heterogeneous populations of cells or when trying to identify transcriptional signatures of rare cells that may be underrepresented when using conventional bulk RNASeq. In this method, we describe the generation and analysis of cDNA libraries from single patient-derived glioblastoma cells using the C1 Fluidigm system. The protocol details the use of the C1 integrated fluidics circuit (IFC) for capturing, imaging and lysing cells; performing reverse transcription; and generating cDNA libraries that are ready for sequencing and analysis.

  5. Identification of reproduction-related genes and SSR-markers through expressed sequence tags analysis of a monsoon breeding carp rohu, Labeo rohita (Hamilton).

    Science.gov (United States)

    Sahu, Dinesh K; Panda, Soumya P; Panda, Sujata; Das, Paramananda; Meher, Prem K; Hazra, Rupenangshu K; Peatman, Eric; Liu, Zhanjiang J; Eknath, Ambekar E; Nandi, Samiran

    2013-07-15

    Labeo rohita (Ham.) also called rohu is the most important freshwater aquaculture species on the Indian sub continent. Monsoon dependent breeding restricts its seed production beyond season indicating a strong genetic control about which very limited information is available. Additionally, few genomic resources are publicly available for this species. Here we sought to identify reproduction-relevant genes from normalized cDNA libraries of the brain-pituitary-gonad-liver (BPGL-axis) tissues of adult L. rohita collected during post preparatory phase. 6161 random clones sequenced (Sanger-based) from these libraries produced 4642 (75.34%) high-quality sequences. They were assembled into 3631 (78.22%) unique sequences composed of 709 contigs and 2922 singletons. A total of 182 unique sequences were found to be associated with reproduction-related genes, mainly under the GO term categories of reproduction, neuro-peptide hormone activity, hormone and receptor binding, receptor activity, signal transduction, embryonic development, cell-cell signaling, cell death and anti-apoptosis process. Several important reproduction-related genes reported here for the first time in L. rohita are zona pellucida sperm-binding protein 3, aquaporin-12, spermine oxidase, sperm associated antigen 7, testis expressed 261, progesterone receptor membrane component, Neuropeptide Y and Pro-opiomelanocortin. Quantitative RT-PCR-based analyses of 8 known and 8 unknown transcripts during preparatory and post-spawning phase showed increased expression level of most of the transcripts during preparatory phase (except Neuropeptide Y) in comparison to post-spawning phase indicating possible roles in initiation of gonad maturation. Expression of unknown transcripts was also found in prolific breeder common carp and tilapia, but levels of expression were much higher in seasonal breeder rohu. 3631 unique sequences contained 236 (6.49%) putative microsatellites with the AG (28.16%) repeat as the most
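The counts quoted in the rohu abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable and function names are illustrative and do not come from the study.

```python
# Counts quoted in the abstract above (rohu EST project).
clones_sequenced = 6161
high_quality = 4642
contigs = 709
singletons = 2922
microsatellites = 236

# Unique sequences are the contigs plus the singletons.
unique_sequences = contigs + singletons


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


hq_rate = percent(high_quality, clones_sequenced)      # high-quality reads per clone
unique_rate = percent(unique_sequences, high_quality)  # unique sequences per high-quality read
ssr_rate = percent(microsatellites, unique_sequences)  # microsatellites per unique sequence

print(f"unique sequences: {unique_sequences}")
print(f"high-quality rate: {hq_rate:.2f}%")
print(f"unique rate: {unique_rate:.2f}%")
print(f"microsatellite rate: {ssr_rate:.2f}%")
```

The first two rates reproduce the 75.34% and 78.22% given in the abstract, and the contig and singleton counts sum to the reported 3631 unique sequences. The microsatellite share computes to just under 6.5%, which matches the reported 6.49% if that figure was truncated rather than rounded.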

  6. Codon-Precise, Synthetic, Antibody Fragment Libraries Built Using Automated Hexamer Codon Additions and Validated through Next Generation Sequencing

    Directory of Open Access Journals (Sweden)

    Laura Frigotto

    2015-05-01

    Full Text Available We have previously described ProxiMAX, a technology that enables the fabrication of precise, combinatorial gene libraries via codon-by-codon saturation mutagenesis. ProxiMAX was originally performed using manual, enzymatic transfer of codons via blunt-end ligation. Here we present Colibra™: an automated, proprietary version of ProxiMAX used specifically for antibody library generation, in which double-codon hexamers are transferred during the saturation cycling process. The reduction in process complexity, resulting library quality and an unprecedented saturation of up to 24 contiguous codons are described. Utility of the method is demonstrated via fabrication of complementarity determining regions (CDR) in antibody fragment libraries and next generation sequencing (NGS) analysis of their quality and diversity.

  7. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    Directory of Open Access Journals (Sweden)

    Koushirou Suga

    Full Text Available BACKGROUND: Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. METHODOLOGY/PRINCIPAL FINDINGS: We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. CONCLUSIONS/SIGNIFICANCE: Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript

  8. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    Science.gov (United States)

    Suga, Koushirou; Welch, David Mark; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-08-01

    Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis, and more generally of rotifers and other gnathiferan phyla, and

  9. Cloning, sequence and expression of the pel gene from an Amycolata sp.

    Science.gov (United States)

    Brühlmann, F; Keen, N T

    1997-11-20

    The pel gene from an Amycolata sp. encoding a pectate lyase (EC 4.2.2.2) was isolated by activity screening a genomic DNA library in Streptomyces lividans TK24. Subsequent subcloning and sequencing of a 2.3 kb BamHI BglII fragment revealed an open reading frame of 930 nt corresponding to a protein of 29,660 Da. The overall G + C content for the coding region was 65%, with a strong G + C preference in the third (wobble) codon position (93%). A putative ribosome-binding site 5'-GGGAG-3' preceded the translational start codon by 7 base pairs. The Amycolata pectate lyase contains a signal peptide of 26 amino acids, that is cleaved after the sequence Ala-Thr-Ala. The size of the deduced protein as well as its N-terminal amino-acid sequence match the wild-type pectate lyase from the Amycolata sp. Expression of the pel gene in S. lividans TK24 resulted in high pectate lyase activity in the culture supernatant, concomitant with the appearance of a dominant protein band on a sodium dodecyl polyacrylamide gel at 30 kDa. No pectate lyase activity was detected in E. coli BL21 with the pel gene under the strong T7 promotor. The deduced amino-acid sequence showed 40% identity with PelE from Erwinia chrysanthemi and the pectate lyase from Glomerella cingulata. The Amycolata pectate lyase clearly belongs to the pectate lyase superfamily, sharing all functional amino acids and likely has a similar structural topology as Pels from Erwinia chrysanthemi and Bacillus subtilis.
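The pel abstract above reports an overall G + C content of 65% and a G + C content of 93% at the third (wobble) codon position. A minimal sketch of how those two statistics are typically computed follows. The sequence `toy_orf` is a hypothetical 21-nt example invented for illustration, not the Amycolata pel gene, and the function names are assumptions.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(orf):
    """G + C fraction at third codon positions of an in-frame ORF."""
    third_positions = orf.upper()[2::3]  # every third base, starting at index 2
    return gc_fraction(third_positions)


# Hypothetical toy ORF (ATG GCC GAC CGG TCG ACC TGA), for illustration only.
toy_orf = "ATGGCCGACCGGTCGACCTGA"

overall_gc = gc_fraction(toy_orf)
wobble_gc = gc3_fraction(toy_orf)

# A 930-nt open reading frame, as reported in the abstract, spans 310 codons.
codons_in_pel_orf = 930 // 3

print(f"overall G+C: {overall_gc:.2%}")
print(f"third-position G+C: {wobble_gc:.2%}")
print(f"codons in a 930-nt ORF: {codons_in_pel_orf}")
```

A third-position G + C value well above the overall value, as in the abstract, is the pattern this calculation is meant to expose; the toy sequence merely mimics it.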

  10. Isolation and characterization of gene sequences expressed in cotton fiber

    Directory of Open Access Journals (Sweden)

    Taciana de Carvalho Coutinho

    2016-06-01

    Full Text Available ABSTRACT Cotton fibers are tubular cells which develop from the differentiation of the ovule epidermis. In addition to being one of the most important natural fibers of the textile group, cotton fibers afford an excellent experimental system for studying the cell wall. The aim of this work was to isolate and characterise the genes expressed in cotton fiber (Gossypium hirsutum L.) to be used in future work in cotton breeding. Fibers of the cotton cultivar CNPA ITA 90 II were used to extract RNA for the subsequent generation of a cDNA library. Seventeen sequences were obtained, of which 14 were already described in the NCBI database (National Centre for Biotechnology Information), such as those encoding the lipid transfer proteins (LTPs) and arabinogalactans (AGP). However, other cDNAs such as the B05 clone, which displays homology with the glycosyltransferases, have still not been described for this crop. Nevertheless, results showed that several clones obtained in this study are associated with cell wall proteins, wall-modifying enzymes and lipid transfer proteins directly involved in fiber development.

  11. Inhibition of expression in Escherichia coli of a virulence regulator MglB of Francisella tularensis using external guide sequence technology.

    Directory of Open Access Journals (Sweden)

    Gaoping Xiao

    Full Text Available External guide sequences (EGSs) have successfully been used to inhibit expression of target genes at the post-transcriptional level in both prokaryotes and eukaryotes. We previously reported that EGS accessible and cleavable sites in the target RNAs can rapidly be identified by screening random EGS (rEGS) libraries. Here the method of screening rEGS libraries and a partial RNase T1 digestion assay were used to identify sites accessible to EGSs in the mRNA of a global virulence regulator MglB from Francisella tularensis, a Gram-negative pathogenic bacterium. Specific EGSs were subsequently designed and their activities in terms of the cleavage of mglB mRNA by RNase P were tested in vitro and in vivo. EGS73, EGS148, and EGS155 in both stem and M1 EGS constructs induced mglB mRNA cleavage in vitro. Expression of stem EGS73 and EGS155 in Escherichia coli resulted in a significant reduction in the level of the mglB mRNA encoded by the F. tularensis mglB gene inserted in those cells.

  12. CitEST libraries

    Directory of Open Access Journals (Sweden)

    Maria Luísa P. Natividade Targon

    2007-01-01

    Full Text Available To gain a better understanding of citrus, 33 cDNA libraries were constructed from different citrus species and genera. Total RNA was extracted from fruits, leaves, flowers, bark, seeds and roots, subjected or not to different biotic and abiotic stresses (pathogens and drought) and at several developmental stages. To identify putative promoter sequences, as well as molecular markers that could be useful for breeding programs, one shotgun library was prepared from sweet orange (Citrus sinensis var. Olimpia). In addition, EST libraries were also constructed for a citrus pathogen, the oomycete Phytophthora parasitica, in either virulent or avirulent form. A total of 286,559 cDNA clones from citrus were sequenced from their 5’ end, generating 242,790 valid reads of citrus. A total of 9,504 sequences were produced in the shotgun library and the valid reads were assembled using CAP3. In this procedure, we obtained 1,131 contigs and 4,083 singletons. A total of 19,200 cDNA clones from P. parasitica were sequenced, resulting in 16,400 valid reads. To our knowledge, the ESTs generated in this project constitute the largest citrus sequence database in the world.

  13. Serial analysis of gene expression (SAGE) in normal human trabecular meshwork.

    Science.gov (United States)

    Liu, Yutao; Munro, Drew; Layfield, David; Dellinger, Andrew; Walter, Jeffrey; Peterson, Katherine; Rickman, Catherine Bowes; Allingham, R Rand; Hauser, Michael A

    2011-04-08

    To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map. A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified. This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
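The SAGE abstract above describes extracting 17 base pair tags, tallying them per library, and keeping tags seen at least twice. The sketch below illustrates that bookkeeping in silico. It assumes the tag lies immediately 3' of the last `CATG` site in a transcript (the usual NlaIII anchoring site, which the abstract does not state), and it works on hypothetical toy transcripts rather than on sequenced tag concatemers as SAGE 2000 does. All names are illustrative.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: NlaIII anchoring site, not stated in the abstract
TAG_LENGTH = 17       # tag length quoted in the abstract


def extract_tag(transcript):
    """Return the 17-base tag that follows the 3'-most anchor site, or None."""
    seq = transcript.upper()
    site = seq.rfind(ANCHOR_SITE)
    if site == -1:
        return None
    start = site + len(ANCHOR_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(transcripts):
    """Tally tags across a collection of transcript sequences."""
    tags = (extract_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


def tags_with_min_count(counts, minimum=2):
    """Keep only tags observed at least `minimum` times."""
    return {tag: n for tag, n in counts.items() if n >= minimum}


# Hypothetical toy transcripts, invented for illustration only.
toy_transcripts = [
    "GGCATGTTTTCATG" + "A" * 17 + "CC",
    "TTCATG" + "A" * 17,
    "CCCATG" + "ACGTACGTACGTACGTA" + "GG",
    "GGGGCATGAC",  # anchor site too close to the 3' end: no full tag
]
toy_counts = count_tags(toy_transcripts)
reliable_tags = tags_with_min_count(toy_counts)

# Per-library tag totals quoted in the abstract.
library_totals = [96842, 88126, 113866]
total_tags = sum(library_totals)
print(total_tags)
```

The three per-library totals sum to the 298,834 tags reported in the abstract. Filtering on a minimum count of 2 mirrors the abstract's threshold, which is commonly applied to reduce the influence of sequencing errors on singleton tags.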

  14. Development of polymorphic genic-SSR markers by cDNA library sequencing in boxwood, Buxus spp. (Buxaceae)

    Science.gov (United States)

    Genic microsatellites or simple sequence repeat (genic-SSR) markers were developed in boxwood (Buxus taxa) for genetic diversity analysis, identification of taxa, and to facilitate breeding. cDNA libraries were developed from mRNA extracted from leaves of Buxus sempervirens ‘Vardar Valley’ and seque...

  15. Analyses of expressed sequence tags from the maize foliar pathogen Cercospora zeae-maydis identify novel genes expressed during vegetative, infectious, and reproductive growth.

    Science.gov (United States)

    Bluhm, Burton H; Dhillon, Braham; Lindquist, Erika A; Kema, Gert Hj; Goodwin, Stephen B; Dunkle, Larry D

    2008-11-04

    The ascomycete fungus Cercospora zeae-maydis is an aggressive foliar pathogen of maize that causes substantial losses annually throughout the Western Hemisphere. Despite its impact on maize production, little is known about the regulation of pathogenesis in C. zeae-maydis at the molecular level. The objectives of this study were to generate a collection of expressed sequence tags (ESTs) from C. zeae-maydis and evaluate their expression during vegetative, infectious, and reproductive growth. A total of 27,551 ESTs was obtained from five cDNA libraries constructed from vegetative and sporulating cultures of C. zeae-maydis. The ESTs, grouped into 4088 clusters and 531 singlets, represented 4619 putative unique genes. Of these, 36% encoded proteins similar (E value zeae-maydis, providing specific targets for characterization by molecular genetics and functional genomics. The EST data establish a foundation for future studies in evolutionary and comparative genomics among species of Cercospora and other groups of plant pathogenic fungi.

  16. A Microfluidic Device for Preparing Next Generation DNA Sequencing Libraries and for Automating Other Laboratory Protocols That Require One or More Column Chromatography Steps

    Science.gov (United States)

    Tan, Swee Jin; Phan, Huan; Gerry, Benjamin Michael; Kuhn, Alexandre; Hong, Lewis Zuocheng; Min Ong, Yao; Poon, Polly Suk Yean; Unger, Marc Alexander; Jones, Robert C.; Quake, Stephen R.; Burkholder, William F.

    2013-01-01

    Library preparation for next-generation DNA sequencing (NGS) remains a key bottleneck in the sequencing process which can be relieved through improved automation and miniaturization. We describe a microfluidic device for automating laboratory protocols that require one or more column chromatography steps and demonstrate its utility for preparing Next Generation sequencing libraries for the Illumina and Ion Torrent platforms. Sixteen different libraries can be generated simultaneously with significantly reduced reagent cost and hands-on time compared to manual library preparation. Using an appropriate column matrix and buffers, size selection can be performed on-chip following end-repair, dA tailing, and linker ligation, so that the libraries eluted from the chip are ready for sequencing. The core architecture of the device ensures uniform, reproducible column packing without user supervision and accommodates multiple routine protocol steps in any sequence, such as reagent mixing and incubation; column packing, loading, washing, elution, and regeneration; capture of eluted material for use as a substrate in a later step of the protocol; and removal of one column matrix so that two or more column matrices with different functional properties can be used in the same protocol. The microfluidic device is mounted on a plastic carrier so that reagents and products can be aliquoted and recovered using standard pipettors and liquid handling robots. The carrier-mounted device is operated using a benchtop controller that seals and operates the device with programmable temperature control, eliminating any requirement for the user to manually attach tubing or connectors. In addition to NGS library preparation, the device and controller are suitable for automating other time-consuming and error-prone laboratory protocols requiring column chromatography steps, such as chromatin immunoprecipitation. PMID:23894273

  17. A microfluidic device for preparing next generation DNA sequencing libraries and for automating other laboratory protocols that require one or more column chromatography steps.

    Directory of Open Access Journals (Sweden)

    Swee Jin Tan

    Full Text Available Library preparation for next-generation DNA sequencing (NGS) remains a key bottleneck in the sequencing process which can be relieved through improved automation and miniaturization. We describe a microfluidic device for automating laboratory protocols that require one or more column chromatography steps and demonstrate its utility for preparing Next Generation sequencing libraries for the Illumina and Ion Torrent platforms. Sixteen different libraries can be generated simultaneously with significantly reduced reagent cost and hands-on time compared to manual library preparation. Using an appropriate column matrix and buffers, size selection can be performed on-chip following end-repair, dA tailing, and linker ligation, so that the libraries eluted from the chip are ready for sequencing. The core architecture of the device ensures uniform, reproducible column packing without user supervision and accommodates multiple routine protocol steps in any sequence, such as reagent mixing and incubation; column packing, loading, washing, elution, and regeneration; capture of eluted material for use as a substrate in a later step of the protocol; and removal of one column matrix so that two or more column matrices with different functional properties can be used in the same protocol. The microfluidic device is mounted on a plastic carrier so that reagents and products can be aliquoted and recovered using standard pipettors and liquid handling robots. The carrier-mounted device is operated using a benchtop controller that seals and operates the device with programmable temperature control, eliminating any requirement for the user to manually attach tubing or connectors. In addition to NGS library preparation, the device and controller are suitable for automating other time-consuming and error-prone laboratory protocols requiring column chromatography steps, such as chromatin immunoprecipitation.

  18. Isolation, sequence identification and tissue expression profile of a ...

    African Journals Online (AJOL)

    The complete coding sequence (CDS) of the Banna mini-pig inbred line (BMI) ribokinase gene (RBKS) was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) based on the conserved sequence information of the cattle or other mammals and known highly homologous swine ESTs.

  19. Automated design of degenerate codon libraries.

    Science.gov (United States)

    Mena, Marco A; Daugherty, Patrick S

    2005-12-01

    Degenerate codon libraries are frequently used in protein engineering and evolution studies but are often limited to targeting a small number of positions to adequately limit the search space. To mitigate this, codon degeneracy can be limited using heuristics or previous knowledge of the targeted positions. To automate design of libraries given a set of amino acid sequences, an algorithm (LibDesign) was developed that generates a set of possible degenerate codon libraries, their resulting size, and their score relative to a user-defined scoring function. A gene library of a specified size can then be constructed that is representative of the given amino acid distribution or that includes specific sequences or combinations thereof. LibDesign provides a new tool for automated design of high-quality protein libraries that more effectively harness existing sequence-structure information derived from multiple sequence alignment or computational protein design data.
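To make the notion of a degenerate codon library concrete, the sketch below expands a degenerate codon written in IUPAC nucleotide codes into its concrete codons, translates them with the standard genetic code, and multiplies out the DNA-level library size across positions. This is not the LibDesign algorithm described in the abstract above; it only illustrates the quantities such a tool has to reason about, and every function name here is an assumption.

```python
from collections import Counter
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa
    for bases, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as 'NNK'."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon):
    """Count how many concrete codons encode each amino acid (or stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand(degenerate_codon))


def library_size(degenerate_codons):
    """Number of distinct DNA sequences when the positions vary independently."""
    return prod(len(expand(codon)) for codon in degenerate_codons)


nnk = encoded_residues("NNK")
print(len(expand("NNK")), "codons,", len(nnk) - ("*" in nnk), "amino acids")
print("three NNK positions:", library_size(["NNK"] * 3), "DNA variants")
```

The DNA-level size grows multiplicatively with each randomized position, which is why the abstract notes that such libraries are often limited to a small number of positions; restricting the degeneracy at each position shrinks the search space.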

  20. Analysis and functional annotation of expressed sequence tags (ESTs from multiple tissues of oil palm (Elaeis guineensis Jacq.

    Directory of Open Access Journals (Sweden)

    Lee Weng-Wah

    2007-10-01

    Full Text Available Abstract Background Oil palm is the second largest source of edible oil which contributes to approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL2, AGL20), LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP), etc. The majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP, were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals, were also uncovered among the oil palm ESTs. Conclusion The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map

  1. Metagenomic analysis of lysogeny in Tampa Bay: implications for prophage gene expression.

    Directory of Open Access Journals (Sweden)

    Lauren McDaniel

    Full Text Available Phage integrase genes often play a role in the establishment of lysogeny in temperate phage by catalyzing the integration of the phage into one of the host's replicons. To investigate temperate phage gene expression, an induced viral metagenome from Tampa Bay was sequenced by 454/Pyrosequencing. The sequencing yielded 294,068 reads with 6.6% identifiable. One hundred three sequences had significant similarity to integrases by BLASTX analysis (e ≤ 0.001). Four sequences with strongest amino-acid level similarity to integrases were selected and real-time PCR primers and probes were designed. Initial testing with microbial fraction DNA from Tampa Bay revealed 1.9 x 10^7 and 1300 gene copies L^-1 of Vibrio-like integrase and Oceanicola-like integrase, respectively. The other two integrases were not detected. The integrase assay was then tested on microbial fraction RNA extracted from 200 ml of Tampa Bay water sampled biweekly over a 12-month time series. Vibrio-like integrase gene expression was detected in three samples, with estimated copy numbers of 2.4 to 1280 L^-1. Clostridium-like integrase gene expression was detected in 6 samples, with estimated copy numbers of 37 to 265 L^-1. In all cases, detection of integrase gene expression corresponded to the occurrence of lysogeny as detected by prophage induction. Investigation of the environmental distribution of the two expressed integrases in the Global Ocean Survey Database found the Vibrio-like integrase was present in genome equivalents of 3.14% of microbial libraries and all four viral metagenomes. There were two similar genes in the library from British Columbia and one similar gene was detected in both the Gulf of Mexico and Sargasso Sea libraries. In contrast, in the Arctic library eleven similar genes were observed. The Clostridium-like integrase was less prevalent, being found in 0.58% of the microbial and none of the viral libraries. These results underscore the value of metagenomic data

  2. Secretory Overexpression of Bacillus thermocatenulatus Lipase in Saccharomyces cerevisiae Using Combinatorial Library Strategy.

    Science.gov (United States)

    Kajiwara, Shota; Yamada, Ryosuke; Ogino, Hiroyasu

    2018-04-10

    Simple and cost-effective lipase expression host microorganisms are highly desirable. A combinatorial library strategy is used to improve the secretory expression of lipase from Bacillus thermocatenulatus (BTL2) in the culture supernatant of Saccharomyces cerevisiae. A plasmid library is constructed that includes expression cassettes composed of sequences encoding one each of 15 promoters, 15 secretion signals, and 15 terminators derived from the yeast species S. cerevisiae, Pichia pastoris, and Hansenula polymorpha. The S. cerevisiae transformant YPH499/D4, comprising H. polymorpha GAP promoter, S. cerevisiae SAG1 secretion signal, and P. pastoris AOX1 terminator, is selected by high-throughput screening. After cultivation for 72 h, this transformant expresses BTL2 extracellularly at a level 130-fold higher than the control strain, which comprises S. cerevisiae PGK1 promoter, S. cerevisiae α-factor secretion signal, and S. cerevisiae PGK1 terminator. This combinatorial library strategy holds promising potential for application in the optimization of the secretory expression of proteins in yeast. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Optimal protein library design using recombination or point mutations based on sequence-based scoring functions.

    Science.gov (United States)

    Pantazes, Robert J; Saraf, Manish C; Maranas, Costas D

    2007-08-01

    In this paper, we introduce and test two new sequence-based protein scoring systems (i.e. S1, S2) for assessing the likelihood that a given protein hybrid will be functional. By binning together amino acids with similar properties (i.e. volume, hydrophobicity and charge) the scoring systems S1 and S2 allow for the quantification of the severity of mismatched interactions in the hybrids. The S2 scoring system is found to be able to significantly functionally enrich a cytochrome P450 library over other scoring methods. Given this scoring base, we subsequently constructed two separate optimization formulations (i.e. OPTCOMB and OPTOLIGO) for optimally designing protein combinatorial libraries involving recombination or mutations, respectively. Notably, two separate versions of OPTCOMB are generated (i.e. model M1, M2) with the latter allowing for position-dependent parental fragment skipping. Computational benchmarking results demonstrate the efficacy of models OPTCOMB and OPTOLIGO to generate high scoring libraries of a prespecified size.
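
    As a rough illustration of the binning idea described above (not the published S1 or S2 definitions), the Python sketch below groups amino acids into hypothetical property bins and counts aligned positions whose residues fall in different bins. The bin assignments and the function name `binned_mismatch_count` are assumptions made for this example only.

```python
# Hypothetical property bins (illustrative only; not the bins used by S1/S2).
BINS = {
    "small": "AGSCTP",
    "hydrophobic": "VLIMFWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "NQ",
}

# Reverse lookup: one-letter residue code -> bin name.
RESIDUE_TO_BIN = {aa: name for name, residues in BINS.items() for aa in residues}


def binned_mismatch_count(seq_a: str, seq_b: str) -> int:
    """Count aligned positions whose residues belong to different bins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        RESIDUE_TO_BIN[a] != RESIDUE_TO_BIN[b] for a, b in zip(seq_a, seq_b)
    )


# Only the third position (D vs Q) crosses a bin boundary here.
mismatches = binned_mismatch_count("AKDV", "GRQL")
```

    A score built this way treats substitutions within a bin as harmless; how mismatches are weighted is a further design choice that the abstract does not specify.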

  4. ESPRIT: an automated, library-based method for mapping and soluble expression of protein domains from challenging targets.

    Science.gov (United States)

    Yumerefendi, Hayretin; Tarendeau, Franck; Mas, Philippe J; Hart, Darren J

    2010-10-01

    Expression of sufficient quantities of soluble protein for structural biology and other applications is often a very difficult task, especially when multimilligram quantities are required. In order to improve yield, solubility or crystallisability of a protein, it is common to subclone shorter genetic constructs corresponding to single- or multi-domain fragments. However, it is not always clear where domain boundaries are located, especially when working on novel targets with little or no sequence similarity to other proteins. Several methods have been described employing aspects of directed evolution to the recombinant expression of challenging proteins. These combine the construction of a random library of genetic constructs of a target with a screening or selection process to identify solubly expressing protein fragments. Here we review several datasets from the ESPRIT (Expression of Soluble Proteins by Random Incremental Truncation) technology to provide a view on its capabilities. Firstly, we demonstrate how it functions using the well-characterised NF-kappaB p50 transcription factor as a model system. Secondly, application of ESPRIT to the challenging PB2 subunit of influenza polymerase has led to several novel atomic resolution structures; here we present an overview of the screening phase of that project. Thirdly, analysis of the human kinase TBK1 is presented to show how the ESPRIT technology rapidly addresses the compatibility of challenging targets with the Escherichia coli expression system.

  5. Citrus plastid-related gene profiling based on expressed sequence tag analyses

    Directory of Open Access Journals (Sweden)

    Tercilio Calsa Jr.

    2007-01-01

    Full Text Available Plastid-related sequences, derived from putative nuclear or plastome genes, were searched in a large collection of expressed sequence tags (ESTs) and genomic sequences from the Citrus Biotechnology initiative in Brazil. The identified putative Citrus chloroplast gene sequences were compared to those from Arabidopsis, Eucalyptus and Pinus. Differential expression profiling for plastid-directed nuclear-encoded proteins and photosynthesis-related gene expression variation between Citrus sinensis and Citrus reticulata, when inoculated or not with Xylella fastidiosa, were also analyzed. Presumed Citrus plastome regions were more similar to Eucalyptus. Some putative genes appeared to be preferentially expressed in vegetative tissues (leaves and bark) or in reproductive organs (flowers and fruits). Genes preferentially expressed in fruit and flower may be associated with hypothetical physiological functions. Expression pattern clustering analysis suggested that photosynthesis- and carbon fixation-related genes appeared to be up- or down-regulated in a resistant or susceptible Citrus species after Xylella inoculation in comparison to non-infected controls, generating novel information which may be helpful to develop novel genetic manipulation strategies to control Citrus variegated chlorosis (CVC).

  6. Digital gene expression analysis of gene expression differences within Brassica diploids and allopolyploids.

    Science.gov (United States)

    Jiang, Jinjin; Wang, Yue; Zhu, Bao; Fang, Tingting; Fang, Yujie; Wang, Youping

    2015-01-27

    Brassica includes many successfully cultivated crop species of polyploid origin, either by ancestral genome triplication or by hybridization between two diploid progenitors, displaying complex repetitive sequences and transposons. U's triangle, which consists of three diploids and three amphidiploids, is optimal for the analysis of complicated genomes after polyploidization. Next-generation sequencing enables the transcriptome profiling of polyploids on a global scale. We examined the gene expression patterns of three diploids (Brassica rapa, B. nigra, and B. oleracea) and three amphidiploids (B. napus, B. juncea, and B. carinata) via digital gene expression analysis. In total, the libraries generated between 5.7 and 6.1 million raw reads, and the clean tags of each library were mapped to 18547-21995 genes of the B. rapa genome. The unambiguous tag-mapped genes in the libraries were compared. Moreover, the majority of differentially expressed genes (DEGs) were explored among diploids as well as between diploids and amphidiploids. Gene ontological analysis was performed to functionally categorize these DEGs into different classes. The Kyoto Encyclopedia of Genes and Genomes analysis was performed to assign these DEGs into approximately 120 pathways, among which the metabolic pathway, biosynthesis of secondary metabolites, and peroxisomal pathway were enriched. The non-additive genes in Brassica amphidiploids were analyzed, and the results indicated that orthologous genes in polyploids are frequently expressed in a non-additive pattern. Methyltransferase genes showed differential expression patterns in Brassica species. Our results provide an understanding of the transcriptome complexity of natural Brassica species. The gene expression changes in diploids and allopolyploids may help elucidate the morphological and physiological differences among Brassica species.

  7. GBParsy: A GenBank flatfile parser library with high speed

    Directory of Open Access Journals (Sweden)

    Kim Yeon-Ki

    2008-07-01

    Full Text Available Abstract Background: GenBank flatfile (GBF) format is one of the most popular sequence file formats because of its detailed sequence features and ease of readability. For a computer to use the data in the file, a parsing process is required and is performed according to a given grammar for the sequence and the description in a GBF. Currently, several parser libraries for the GBF have been developed. However, with the accumulation of DNA sequence information from eukaryotic chromosomes, parsing a eukaryotic genome sequence with these libraries inevitably takes a long time, due to the large GBF file and its correspondingly large genomic nucleotide sequence and related feature information. Thus, there is a significant need to develop a parsing program with high speed and efficient use of system memory. Results: We developed a library, GBParsy, which is written in C and parses GBF files. The parsing speed was maximized by using content-specified functions in place of regular expressions, which are flexible but slow. In addition, we optimized an algorithm related to memory usage so that it also increased parsing performance and efficiency of memory usage. GBParsy is at least 5 to 100× faster than current parsers in benchmark tests. Conclusion: GBParsy is estimated to extract annotated information from almost 100 Mb of a GenBank flatfile for chromosomal sequence information within a second. Thus, it should be used for a variety of applications such as on-time visualization of a genome at a web site.
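
    The design choice described above, replacing regular expressions with fixed string operations, can be sketched in Python. GBParsy itself is a C library and this is not its API; the function name `parse_gbf_minimal` and the tiny sample record are assumptions for illustration, and only the LOCUS name and the ORIGIN sequence are extracted.

```python
def parse_gbf_minimal(text: str) -> dict:
    """Extract the locus name and sequence from one GenBank flatfile record
    using only str methods (no regular expressions)."""
    locus = None
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The locus name is the second whitespace-separated field.
            locus = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # "//" terminates a record.
            break
        elif in_origin:
            # Sequence lines look like "        1 atgc atgc"; drop the coordinate.
            seq_parts.extend(line.split()[1:])
    return {"locus": locus, "sequence": "".join(seq_parts)}


# Hypothetical, hand-written sample record (not real GenBank data).
SAMPLE = """LOCUS       DEMO0001    20 bp    DNA     linear   SYN 01-JAN-2000
ORIGIN
        1 atgcatgcat gcatgcatgc
//
"""

record = parse_gbf_minimal(SAMPLE)
```

    Keyword checks with `startswith` and field splitting with `split` avoid regex compilation and backtracking, which is the general trade-off the abstract describes; a full parser would also need to handle the FEATURES table and multi-record files.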

  8. Mining microsatellite markers from public expressed sequence tag

    Indian Academy of Sciences (India)

    Mining microsatellite markers from public expressed sequence tag sequences for genetic diversity analysis in pomegranate. Zai-Hai Jian, Xin-She Liu, Jian-Bin Hu, Yan-Hui Chen, Jian-Can Feng. Research Note. Journal of Genetics, Volume 91, Issue 3, December 2012, pp 353-358 ...

  9. Immunoresponse of Coho salmon immunized with a gene expression library from Piscirickettsia salmonis

    Directory of Open Access Journals (Sweden)

    ALVARO MIQUEL

    2003-01-01

    Full Text Available We have used the expression library immunization technology to study the protection of Coho salmon (Oncorhynchus kisutch) against infection with Piscirickettsia salmonis. Purified DNA from this bacterium was sonicated and the fragments were cloned in the expression vector pCMV-Bios. Two libraries were obtained containing 22,000 and 28,000 colonies and corresponding to approximately 8 and 10 times the genome of the pathogen, respectively. On average, the size of the inserts ranged between 300 and 1,000 bp. The plasmid DNA isolated from one of these libraries was purified and 20 µg were injected intramuscularly into 60 fish, followed by a second dose of 10 µg applied 40 days later. As a control, fish were injected with the same amount of DNA of the vector pCMV-Bios without insert. The titer of IgM anti-P. salmonis of vaccinated fish, evaluated 60 days post-injection, was significantly higher than that of the control group injected with the vector alone. Moreover, this response was specific against P. salmonis antigens, since no cross reaction was detected with Renibacterium salmoninarum and Yersinia ruckeri. The vaccinated and control fish were challenged 60 days after the second dose of DNA with 2.5 x 10^7 P. salmonis, corresponding to 7.5 times the LD50. At 30 days post-challenge, 100% mortality was obtained with the control fish while 20% of the vaccinated animals survived. All surviving fish exhibited a lower bacterial load in the kidney than control fish. The expression library was also tested in Balb/c mice and it was found that the humoral immune response was specific to P. salmonis and was dependent on the amount of DNA injected.

  10. Libraries of Synthetic TALE-Activated Promoters: Methods and Applications.

    Science.gov (United States)

    Schreiber, T; Tissier, A

    2016-01-01

    The discovery of proteins with programmable DNA-binding specificities triggered a whole array of applications in synthetic biology, including genome editing, regulation of transcription, and epigenetic modifications. Among those, transcription activator-like effectors (TALEs), due to their natural function as transcription regulators, are especially well-suited for the development of orthogonal systems for the control of gene expression. We describe here the construction and testing of libraries of synthetic TALE-activated promoters which are under the control of a single TALE with a given DNA-binding specificity. These libraries consist of a fixed DNA-binding element for the TALE, a TATA box, and variable sequences of 19 bases upstream and 43 bases downstream of the DNA-binding element. These libraries were cloned using a Golden Gate cloning strategy, making them usable as standard parts in a modular cloning system. The broad range of promoter activities detected and the versatility of these promoter libraries make them valuable tools for applications in the fine-tuning of expression in metabolic engineering projects or in the design and implementation of regulatory circuits. © 2016 Elsevier Inc. All rights reserved.

  11. PERMutation Using Transposase Engineering (PERMUTE): A Simple Approach for Constructing Circularly Permuted Protein Libraries.

    Science.gov (United States)

    Jones, Alicia M; Atkinson, Joshua T; Silberg, Jonathan J

    2017-01-01

    Rearrangements that alter the order of a protein's sequence are used in the lab to study protein folding, improve activity, and build molecular switches. One of the simplest ways to rearrange a protein sequence is through random circular permutation, where native protein termini are linked together and new termini are created elsewhere through random backbone fission. Transposase mutagenesis has emerged as a simple way to generate libraries encoding different circularly permuted variants of proteins. With this approach, a synthetic transposon (called a permuteposon) is randomly inserted throughout a circularized gene to generate vectors that express different permuted variants of a protein. In this chapter, we outline the protocol for constructing combinatorial libraries of circularly permuted proteins using transposase mutagenesis, and we describe the different permuteposons that have been developed to facilitate library construction.
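
    The sequence rearrangement described above can be sketched in Python: each circularly permuted variant starts at a new position, with the native C-terminus joined to the native N-terminus. This enumerates every variant deterministically, whereas the transposase approach samples insertion sites at random; the function name `circular_permutations` and the optional `linker` argument are assumptions for illustration.

```python
def circular_permutations(seq: str, linker: str = "") -> list:
    """Return every circularly permuted variant of seq.

    Variant i begins at original position i; the original C-terminus is
    joined to the original N-terminus through the (optional) linker.
    The unpermuted sequence (i = 0) is excluded.
    """
    return [seq[i:] + linker + seq[:i] for i in range(1, len(seq))]


# Toy four-residue "protein" with no linker.
variants = circular_permutations("ABCD")
# Same idea with a hypothetical two-residue linker shown in lower case.
linked = circular_permutations("ABC", linker="gg")
```

    A sequence of length n yields n - 1 permuted variants, which gives a sense of the library size that random backbone fission can sample for a given protein.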

  12. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    Directory of Open Access Journals (Sweden)

    Devier Benjamin

    2007-08-01

    Full Text Available Abstract Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics.

  13. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background: Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results: A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may represent unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions: The high-quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
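
    The counts reported above are mutually consistent, which a short Python check makes explicit. The numbers are taken from the abstract; the variable names and the derived `redundancy` figure (the fraction of ESTs that did not add a new unigene) are assumptions introduced for this example.

```python
# Counts as reported in the abstract.
total_ests = 12084
contigs = 2258
singletons = 4751
ests_in_contigs = 7333
unigenes = 7009
known, unknown_like, no_hit = 3982, 2836, 191

# Unigenes are contigs plus singletons.
assert contigs + singletons == unigenes
# Every EST is either assembled into a contig or left as a singleton.
assert ests_in_contigs + singletons == total_ests
# The three annotation classes partition the unigene set.
assert known + unknown_like + no_hit == unigenes

# Derived quantity (not stated in the abstract): share of redundant ESTs.
redundancy = 1 - unigenes / total_ests
```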

  14. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Jian [Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute of Sichuan Agricultural University, Wenjiang, Sichuan (China); Luo, Mao [Drug Discovery Research Center of Luzhou Medical College, Luzhou, Sichuan (China); Zhu, Ye; He, Ying; Wang, Qin [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China); Zhang, Chun, E-mail: zc83good@126.com [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China)

    2015-03-27

    Viola yedoensis Makino is an important traditional Chinese medicinal plant adapted to cadmium (Cd)-polluted regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly consistent with the Solexa analysis. This study generated the first global analysis of the V. yedoensis transcriptome and provides a basis for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 54,479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.

  15. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    International Nuclear Information System (INIS)

    Gao, Jian; Luo, Mao; Zhu, Ye; He, Ying; Wang, Qin; Zhang, Chun

    2015-01-01

    Viola yedoensis Makino is an important traditional Chinese medicinal plant adapted to cadmium (Cd)-polluted regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly consistent with the Solexa analysis. This study generated the first global analysis of the V. yedoensis transcriptome and provides a basis for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 54,479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.

  16. Profiling of wheat class III peroxidase genes derived from powdery mildew-attacked epidermis reveals distinct sequence-associated expression patterns.

    Science.gov (United States)

    Liu, Guosheng; Sheng, Xiaoyan; Greenshields, David L; Ogieglo, Adam; Kaminskyj, Susan; Selvaraj, Gopalan; Wei, Yangdou

    2005-07-01

    A cDNA library was constructed from leaf epidermis of diploid wheat (Triticum monococcum) infected with the powdery mildew fungus (Blumeria graminis f. sp. tritici) and was screened for genes encoding peroxidases. From 2,500 expressed sequence tags (ESTs), 36 cDNAs representing 10 peroxidase genes (designated TmPRX1 to TmPRX10) were isolated and further characterized. Alignment of the deduced amino acid sequences and phylogenetic clustering with peroxidases from other plant species demonstrated that these peroxidases fall into four distinct groups. Differential expression and tissue-specific localization among the members were observed during the B. graminis f. sp. tritici attack using Northern blots and reverse-transcriptase polymerase chain reaction analyses. Consistent with its abundance in the EST collection, TmPRX1 expression showed the highest induction during pathogen attack and fluctuated in response to the fungal parasitic stages. TmPRX1 to TmPRX6 were expressed predominantly in mesophyll cells, whereas TmPRX7 to TmPRX10, which feature a putative C-terminal propeptide, were detectable mainly in epidermal cells. Using TmPRX8 as a representative, we demonstrated that its C-terminal propeptide was sufficient to target a green fluorescent protein fusion protein to the vacuoles in onion cells. Finally, differential expression profiles of the TmPRXs after abiotic stresses and signal molecule treatments were used to dissect the potential role of these peroxidases in multiple stress and defense pathways.

  17. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  18. PASSIOMA: Exploring Expressed Sequence Tags during Flower Development in Passiflora spp.

    Directory of Open Access Journals (Sweden)

    Lucas Cutri

    2012-01-01

    Full Text Available The genus Passiflora provides a remarkable example of floral complexity and diversity. The extreme variation of Passiflora flower morphologies allowed a wide range of interactions with pollinators to evolve. We used the analysis of expressed sequence tags (ESTs) as an approach for the characterization of genes expressed during Passiflora reproductive development. Analyzing the Passiflora floral EST database (named PASSIOMA), we found sequences showing significant sequence similarity to genes known to be involved in reproductive development such as MADS-box genes. Some of these sequences were studied using RT-PCR and in situ hybridization, confirming their expression during Passiflora flower development. The detection of these novel sequences can contribute to the development of EST-based markers for important agronomic traits as well as to the establishment of genomic tools to study the naturally occurring floral diversity among Passiflora species.

  19. Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Salomon, Jesper; Søkilde, Rolf

    2009-01-01

    Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two...... technologies have only been compared based on biological data, leading to the conclusion that, although they are somewhat correlated, expression values differ significantly. Here, we use synthetic RNA samples, resembling human microRNA samples, to find that microarray expression measures actually correlate...... better with sample RNA content than expression measures obtained from sequencing data. In addition, microarrays appear highly sensitive and perform equivalently to next-generation sequencing in terms of reproducibility and relative ratio quantification....

  20. The SBASE protein domain library, release 8.0: a collection of annotated protein sequence segments.

    Science.gov (United States)

    Murvai, J; Vlahovicek, K; Barta, E; Pongor, S

    2001-01-01

    SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.

  1. Preparation of a differentially expressed, full-length cDNA expression library by RecA-mediated triple-strand formation with subtractively enriched cDNA fragments

    NARCIS (Netherlands)

    Hakvoort, T. B.; Spijkers, J. A.; Vermeulen, J. L.; Lamers, W. H.

    1996-01-01

    We have developed a fast and general method to obtain an enriched, full-length cDNA expression library with subtractively enriched cDNA fragments. The procedure relies on RecA-mediated triple-helix formation of single-stranded cDNA fragments with a double-stranded cDNA plasmid library. The complexes

  2. Tuning protein expression using synonymous codon libraries targeted to the 5' mRNA coding region

    DEFF Research Database (Denmark)

    Goltermann, Lise; Borch Jensen, Martin; Bentin, Thomas

    2011-01-01

    intermediate expression levels of green fluorescent protein in Escherichia coli. At least in one case, no apparent effect on protein stability was observed, pointing to RNA level effects as the principal reason for the observed expression differences. Targeting a synonymous codon library to the 5' coding...

  3. Efficient construction of an inverted minimal H1 promoter driven siRNA expression cassette: facilitation of promoter and siRNA sequence exchange.

    Directory of Open Access Journals (Sweden)

    Hoorig Nassanian

    2007-08-01

    Full Text Available RNA interference (RNAi), mediated by small interfering RNA (siRNA), is an effective method used to silence gene expression at the post-transcriptional level. Upon introduction into target cells, siRNAs incorporate into the RNA-induced silencing complex (RISC). The antisense strand of the siRNA duplex then "guides" the RISC to the homologous mRNA, leading to target degradation and gene silencing. In recent years, various vector-based siRNA expression systems have been developed which utilize opposing polymerase III promoters to independently drive expression of the sense and antisense strands of the siRNA duplex from the same template. We show here the use of a ligase chain reaction (LCR) to develop a new vector system called pInv-H1 in which a DNA sequence encoding a specific siRNA is placed between two inverted minimal human H1 promoters (approximately 100 bp each). Expression of functional siRNAs from this construct has led to efficient silencing of both reporter and endogenous genes. Furthermore, the inverted H1 promoter-siRNA expression cassette was used to generate a retrovirus vector capable of transducing and silencing expression of the targeted protein by >80% in target cells. The unique design of this construct allows for the efficient exchange of siRNA sequences by the directional cloning of short oligonucleotides via asymmetric restriction sites. This provides a convenient way to test the functionality of different siRNA sequences. Delivery of the siRNA cassette by retroviral transduction suggests that a single copy of the siRNA expression cassette efficiently knocks down gene expression at the protein level. We note that this vector system can potentially be used to generate a random siRNA library. The flexibility of the ligase chain reaction suggests that additional control elements can easily be introduced into this siRNA expression cassette.

  4. Gene mining a marama bean expressed sequence tags (ESTs ...

    African Journals Online (AJOL)

    The authors reported the identification of genes associated with embryonic development and microsatellite sequences. The future direction will entail characterization of these genes using gene over-expression and mutant assays. Key words: Namibia, simple sequence repeats (SSR), data mining, homology searches, ...

  5. Expressing Intellectual Freedom: A Content Analysis of Catholic Library World from 1980 to 2015

    Directory of Open Access Journals (Sweden)

    Megan E. Welsh

    2016-12-01

    Full Text Available Objective – Professional librarians have varying values relating to the topic of intellectual freedom that may or may not align with the American Library Association’s (ALA) policies defining professional expectations on the topic. The personally held values and beliefs of Roman Catholic librarians and those working in libraries affiliated with Roman Catholicism are worthy of study to determine how personal religious values may translate into professional practice. The objective of this paper is to ascertain how frequently and in what context the topics of intellectual freedom and censorship were expressed in articles published in Catholic Library World (CLW), the professional journal of the Catholic Library Association (CLA), from 1980 to 2015. Published content on these topics can be used as evidence to determine how this population discusses the concept of intellectual freedom. Methods – Articles relevant to these topics were retrieved from the American Theological Library Association Catholic Periodical and Literature Index (ATLA CPLI) and Library, Information Science & Technology Abstracts (LISTA) databases by conducting keyword searches using the terms “intellectual freedom” and censorship. Each retrieved publication was analyzed by counting the number of times the phrase “intellectual freedom” and the root censor* occurred. Through a deep reading of each publication, statements containing these search terms were then coded as positive, negative, or neutral, establishing a context for each occurrence. Results – The majority of published content supported intellectual freedom and opposed censorship. Negative content typically occurred in publications about children or school libraries. Additionally, CLW contributors did express a certain level of conflict between personally held religious values and professional values. Conclusions – This study adds to the limited research available on the intersection of personally held
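
    The counting step described in the Methods of this record can be illustrated with a minimal Python sketch. The function name `count_terms` and the sample sentence are hypothetical; the abstract does not say what tooling the study used, so this only shows one plausible way to count the phrase "intellectual freedom" and words built on the root censor*.

```python
import re

# Hypothetical patterns: the exact phrase, and any word starting with "censor"
# (censor, censors, censored, censorship, ...). Matching is case-insensitive.
PHRASE = re.compile(r"\bintellectual\s+freedom\b", re.IGNORECASE)
ROOT = re.compile(r"\bcensor\w*", re.IGNORECASE)

def count_terms(text: str) -> dict:
    """Count occurrences of the phrase and of the censor* root in one text."""
    return {
        "intellectual freedom": len(PHRASE.findall(text)),
        "censor*": len(ROOT.findall(text)),
    }

# Invented sample text, not taken from Catholic Library World.
sample = ("Intellectual freedom matters to librarians. Censorship of books "
          "is opposed, and no censor should limit intellectual freedom.")
print(count_terms(sample))
```

    Coding each occurrence as positive, negative, or neutral is a reading task that the abstract attributes to the researcher, so it is deliberately left out of the sketch.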

  6. Isolation and expression of the Pneumocystis carinii thymidylate synthase gene

    DEFF Research Database (Denmark)

    Edman, U; Edman, J C; Lundgren, B

    1989-01-01

    The thymidylate synthase (TS) gene from Pneumocystis carinii has been isolated from complementary and genomic DNA libraries and expressed in Escherichia coli. The coding sequence of TS is 891 nucleotides, encoding a 297-amino acid protein of Mr 34,269. The deduced amino acid sequence is similar...
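
    The relation between the two figures quoted in this record (891 nucleotides, 297 amino acids) is the usual three-nucleotides-per-codon arithmetic. A minimal Python sketch, assuming the 891-nucleotide figure excludes the stop codon; the function name `codons_in_cds` is hypothetical.

```python
def codons_in_cds(n_nucleotides: int) -> int:
    """Number of complete codons in a coding sequence of the given length."""
    if n_nucleotides % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return n_nucleotides // 3

n_residues = codons_in_cds(891)   # 891 / 3 = 297 codons
mean_residue_mass = 34269 / n_residues  # roughly 115 Da per residue
print(n_residues, round(mean_residue_mass, 1))
```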

  7. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.

  8. SeqAn An efficient, generic C++ library for sequence analysis

    Directory of Open Access Journals (Sweden)

    Rausch Tobias

    2008-01-01

    Full Text Available Abstract Background The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
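
    To give a flavour of what comparing exact string matching algorithms involves, here is a minimal Python sketch that checks a naive scan against the Horspool algorithm. This is not SeqAn code and does not use the SeqAn API (SeqAn is a C++ library); the abstract does not name the algorithms it compared, so the choice of Horspool, the function names, and the test sequence are all assumptions made for illustration.

```python
def naive_search(text: str, pattern: str) -> list:
    """Report every start position of pattern by checking each alignment."""
    m = len(pattern)
    if m == 0:
        return []
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

def horspool_search(text: str, pattern: str) -> list:
    """Horspool matching: skip ahead using the text character under the
    last pattern position."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its last occurrence
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from the table allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits

text, pattern = "GATTACAGATTACA", "ATTA"
print(naive_search(text, pattern), horspool_search(text, pattern))
```

    Both functions should report the same positions on any input; a benchmark would then time them on long sequences, which is omitted here.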

  9. Expression of Kirsten murine sarcoma virus sequences in Beagle dog tissues

    International Nuclear Information System (INIS)

    Kerkof, P.R.; Kelly, G.

    1988-01-01

    Labeled cDNA synthesized from RNA extracted from ²³⁸PuO₂-, ²³⁹PuO₂-, and ⁹⁰Sr-induced lung tumors in Beagle dogs, from nontumor tissue from ²³⁹PuO₂-exposed dogs, and from unexposed dog lung and liver tissue produces strong hybridization signals with a plasmid (pKSma) that contains Kirsten murine sarcoma virus (KMSV) sequences. At least 90 percent of the KMSV sequences are expressed in these dog tissues, including sequences corresponding to p21 K-ras, gp70 envelope glycoprotein, and at least one other proviral sequence. The expression of Kirsten ras and other sarcoma virus sequences may have important implications for the interpretation of carcinogenesis studies in these dogs. (author)

  10. Molecular cloning, nucleotide sequence, and expression of the gene encoding human eosinophil differentiation factor (interleukin 5)

    International Nuclear Information System (INIS)

    Campbell, H.D.; Tucker, W.Q.J.; Hort, Y.; Martinson, M.E.; Mayo, G.; Clutterbuck, E.J.; Sanderson, C.J.; Young, I.G.

    1987-01-01

    The human eosinophil differentiation factor (EDF) gene was cloned from a genomic library in λ phage EMBL3A by using a murine EDF cDNA clone as a probe. The DNA sequence of a 3.2-kilobase BamHI fragment spanning the gene was determined. The gene contains three introns. The predicted amino acid sequence of 134 amino acids is identical with that recently reported for human interleukin 5 but shows no significant homology with other known hemopoietic growth regulators. The amino acid sequence shows strong homology (~70% identity) with that of murine EDF. Recombinant human EDF, expressed from the human EDF gene after transfection into monkey COS cells, stimulated the production of eosinophils and eosinophil colonies from normal human bone marrow but had no effect on the production of neutrophils or mononuclear cells (monocytes and lymphoid cells). The apparent specificity of human EDF for the eosinophil lineage in myeloid hemopoiesis contrasts with the properties of human interleukin 3 and granulocyte/macrophage and granulocyte colony-stimulating factors but is directly analogous to the biological properties of murine EDF. Human EDF therefore represents a distinct hemopoietic growth factor that could play a central role in the regulation of eosinophilia.

  11. Improving RNA-Seq expression estimates by correcting for fragment bias

    Science.gov (United States)

    2011-01-01

    The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973

  12. Methods for small RNA preparation for digital gene expression profiling by next-generation sequencing

    NARCIS (Netherlands)

    Linsen, S.E.V.; Cuppen, E.

    2012-01-01

    Digital gene expression (DGE) profiling techniques are playing an eminent role in the detection, localization, and differential expression quantification of many small RNA species, including microRNAs (1-3). Procedures in small RNA library preparation techniques typically include adapter ligation by

  13. Expression profiling and comparative sequence derived insights into lipid metabolism

    Energy Technology Data Exchange (ETDEWEB)

    Callow, Matthew J.; Rubin, Edward M.

    2001-12-19

    Expression profiling and genomic DNA sequence comparisons are increasingly being applied to the identification and analysis of the genes involved in lipid metabolism. Not only has genome-wide expression profiling aided in the identification of novel genes involved in important processes in lipid metabolism such as sterol efflux, but the utilization of information from these studies has added to our understanding of the regulation of pathways participating in the process. Coupled with these gene expression studies, cross species comparison, searching for sequences conserved through evolution, has proven to be a powerful tool to identify important non-coding regulatory sequences as well as the discovery of novel genes relevant to lipid biology. An example of the value of this approach was the recent chance discovery of a new apolipoprotein gene (apo AV) that has dramatic effects upon triglyceride metabolism in mice and humans.

  14. Analysis of expressed sequence tags from Citrus sinensis L. Osbeck infected with Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Alessandra A. de Souza

    2007-01-01

    Full Text Available In order to understand the genetic responses resulting from physiological changes that occur in plants displaying citrus variegated chlorosis (CVC) symptoms, we adopted a strategy of comparing two EST libraries from sweet orange [Citrus sinensis (L.) Osbeck]. One of them was prepared with plants showing typical CVC symptoms caused by Xylella fastidiosa and the other with non-inoculated plants. We obtained 15,944 ESTs by sequencing the two cDNA libraries. Using an in silico hybridization strategy, 37 genes were found to have significant variation at the transcriptional level. Within this subset, 21 were up-regulated and 16 were down-regulated in plants with CVC. The main functional categories of the down-regulated transcripts in plants with CVC were associated with metabolism, protein modification, energy and transport facilitation. The majority of the up-regulated transcripts were associated with metabolism and defense response. Some transcripts associated with adaptation to stress conditions were up-regulated in plants with CVC and could explain why plants remain alive even under severe water and nutritional stress. Others of the up-regulated transcripts are related to defense response, suggesting that sweet orange plants activate their defense machinery. The genes associated with stress response might be expressed as part of a secondary response related to physiological alterations caused by the infection.

  15. Stability-Diversity Tradeoffs Impose Fundamental Constraints on Selection of Synthetic Human VH/VL Single-Domain Antibodies from In Vitro Display Libraries.

    Science.gov (United States)

    Henry, Kevin A; Kim, Dae Young; Kandalaft, Hiba; Lowden, Michael J; Yang, Qingling; Schrag, Joseph D; Hussack, Greg; MacKenzie, C Roger; Tanha, Jamshid

    2017-01-01

    Human autonomous VH/VL single-domain antibodies (sdAbs) are attractive therapeutic molecules, but often suffer from suboptimal stability, solubility and affinity for cognate antigens. Most commonly, human sdAbs have been isolated from in vitro display libraries constructed via synthetic randomization of rearranged VH/VL domains. Here, we describe the design and characterization of three novel human VH/VL sdAb libraries through a process of: (i) exhaustive biophysical characterization of 20 potential VH/VL sdAb library scaffolds, including assessment of expression yield, aggregation resistance, thermostability and tolerance to complementarity-determining region (CDR) substitutions; (ii) in vitro randomization of the CDRs of three VH/VL sdAb scaffolds, with tailored amino acid representation designed to promote solubility and expressibility; and (iii) systematic benchmarking of the three VH/VL libraries by panning against five model antigens. We isolated ≥1 antigen-specific human sdAb against four of five targets (13 VHs and 7 VLs in total); these were predominantly monomeric, had antigen-binding affinities ranging from 5 nM to 12 µM (average: 2-3 µM), but had highly variable expression yields (range: 0.1-19 mg/L). Despite our efforts to identify the most stable VH/VL scaffolds, selection of antigen-specific binders from these libraries was unpredictable (overall success rate for all library-target screens: ~53%) with a high attrition rate of sdAbs exhibiting false positive binding by ELISA. By analyzing VH/VL sdAb library sequence composition following selection for monomeric antibody expression (binding to protein A/L followed by amplification in bacterial cells), we found that some VH/VL sdAbs had marked growth advantages over others, and that the amino acid composition of the CDRs of this set of sdAbs was dramatically restricted (bias toward Asp and His and away from aromatic and hydrophobic residues). Thus, CDR sequence

  16. Expression of Kirsten murine sarcoma virus sequences in Beagle dog tissues

    Energy Technology Data Exchange (ETDEWEB)

    Kerkof, P R; Kelly, G

    1988-12-01

    Labeled cDNA synthesized from RNA extracted from ²³⁸PuO₂-, ²³⁹PuO₂-, and ⁹⁰Sr-induced lung tumors in Beagle dogs, from nontumor tissue from ²³⁹PuO₂-exposed dogs, and from unexposed dog lung and liver tissue produces strong hybridization signals with a plasmid (pKSma) that contains Kirsten murine sarcoma virus (KMSV) sequences. At least 90 percent of the KMSV sequences are expressed in these dog tissues, including sequences corresponding to p21 K-ras, gp70 envelope glycoprotein, and at least one other proviral sequence. The expression of Kirsten ras and other sarcoma virus sequences may have important implications for the interpretation of carcinogenesis studies in these dogs. (author)

  17. SSH analysis of endosperm transcripts and characterization of heat stress regulated expressed sequence tags in bread wheat

    Directory of Open Access Journals (Sweden)

    Suneha Goswami

    2016-08-01

    Full Text Available Heat stress is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger’s sequencing was performed with 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were searched by BLAST against the Ensemble, TriFLD and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the Pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs: 147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helixes (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative Real-Time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was observed more in wheat cv. HD2985 (thermotolerant), as compared to HD2329 (thermosusceptible), during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat – a novel step towards the development of

  18. Promoter Boundaries for the luxCDABE and betIBA-proXWV Operons in Vibrio harveyi Defined by the Method Rapid Arbitrary PCR Insertion Libraries (RAIL).

    Science.gov (United States)

    Hustmyer, Christine M; Simpson, Chelsea A; Olney, Stephen G; Rusch, Douglas B; Bochman, Matthew L; van Kessel, Julia C

    2018-06-01

    Experimental studies of transcriptional regulation in bacteria require the ability to precisely measure changes in gene expression, often accomplished through the use of reporter genes. However, the boundaries of promoter sequences required for transcription are often unknown, thus complicating the construction of reporters and genetic analysis of transcriptional regulation. Here, we analyze reporter libraries to define the promoter boundaries of the luxCDABE bioluminescence operon and the betIBA-proXWV osmotic stress operon in Vibrio harveyi. We describe a new method called rapid arbitrary PCR insertion libraries (RAIL) that combines the power of arbitrary PCR and isothermal DNA assembly to rapidly clone promoter fragments of various lengths upstream of reporter genes to generate large libraries. To demonstrate the versatility and efficiency of RAIL, we analyzed the promoters driving expression of the luxCDABE and betIBA-proXWV operons and created libraries of DNA fragments from these loci fused to fluorescent reporters. Using flow cytometry sorting and deep sequencing, we identified the DNA regions necessary and sufficient for maximum gene expression for each promoter. These analyses uncovered previously unknown regulatory sequences and validated known transcription factor binding sites. We applied this high-throughput method to gfp, mCherry, and lacZ reporters and multiple promoters in V. harveyi. We anticipate that the RAIL method will be easily applicable to other model systems for genetic, molecular, and cell biological applications. IMPORTANCE Gene reporter constructs have long been essential tools for studying gene regulation in bacteria, particularly following the recent advent of fluorescent gene reporters. We developed a new method that enables efficient construction of promoter fusions to reporter genes to study gene regulation. We demonstrate the versatility of this technique in the model bacterium Vibrio harveyi by constructing promoter libraries

  19. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.

    Science.gov (United States)

    de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M

    2017-07-06

    Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length
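
    One way to see why labelling each sample with three barcodes can reduce the number of oligos needed is to count combinations. The Python sketch below assumes that barcodes at the three index positions can be combined freely, which the abstract does not state explicitly; the barcode sequences, set sizes, and the function name `barcode_combinations` are hypothetical.

```python
from itertools import product

# Hypothetical barcode sets for the three index positions (illustrative only).
index_1 = ["ACGT", "TGCA"]
index_2 = ["AACC", "GGTT", "CCAA"]
index_3 = ["ATAT", "CGCG", "TATA", "GCGC"]

def barcode_combinations(*index_sets):
    """Return every combination of one barcode per index position."""
    return list(product(*index_sets))

combos = barcode_combinations(index_1, index_2, index_3)
n_oligos = len(index_1) + len(index_2) + len(index_3)
print(len(combos), "sample labels from", n_oligos, "barcode sequences")
```

    Under this assumption the number of distinguishable samples grows with the product of the set sizes, while the number of barcode sequences to synthesize grows only with their sum.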

  20. Generation and annotation of lodgepole pine and oleoresin-induced expressed sequences from the blue-stain fungus Ophiostoma clavigerum, a Mountain Pine Beetle-associated pathogen.

    Science.gov (United States)

    DiGuistini, Scott; Ralph, Steven G; Lim, Young W; Holt, Robert; Jones, Steven; Bohlmann, Jörg; Breuil, Colette

    2007-02-01

    Ophiostoma clavigerum is a destructive pathogen of lodgepole pine (Pinus contorta) forests in western North America. It is therefore a relevant system for a genomics analysis of fungi vectored by bark beetles. To begin characterizing molecular interactions between the pathogen and its conifer host, we created an expressed sequence tag (EST) collection for O. clavigerum. Lodgepole pine sawdust and oleoresin media were selected to stimulate gene expression that would be specific to this host interaction. Over 6500 cDNA clones, derived from four normalized cDNA libraries, were single-pass sequenced from the 3' end. After quality screening, we identified 5975 high-quality reads with an average PHRED 20 of greater than 750 bp. Clustering and assembly of this high-quality EST set resulted in the identification of 2620 unique putative transcripts. BLASTX analysis revealed that only 67% of these unique transcripts could be matched to known or predicted protein sequences in public databases. Functional classification of these sequences provided initial insights into the transcriptome of O. clavigerum. Of particular interest, our ESTs represent an extensive collection of cytochrome P450s, ATP-binding-cassette-type transporters and genes involved in 1,8-dihydroxynaphthalene-melanin biosynthesis. These results are discussed in the context of detoxification of conifer oleoresins and fungal pathogenesis.

  1. Gene expression analysis of flax seed development

    Science.gov (United States)

    2011-01-01

    Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages), seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (ESTs) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. Conclusions We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise

  2. Gene expression analysis of flax seed development

    Directory of Open Access Journals (Sweden)

    Sharpe Andrew

    2011-04-01

    Full Text Available Abstract Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages), seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (ESTs) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. Conclusions We have developed a foundational database of expressed sequences and collection of plasmid

  3. Novel expressed sequences identified in a model of androgen independent prostate cancer

    Directory of Open Access Journals (Sweden)

    Jones Steven JM

    2007-01-01

    Full Text Available Abstract Background Prostate cancer is the most frequently diagnosed cancer in American men, and few effective treatment options are available to patients who develop hormone-refractory prostate cancer. The molecular changes that occur to allow prostate cells to proliferate in the absence of androgens are not fully understood. Results Subtractive hybridization experiments performed with samples from an in vivo model of hormonal progression identified 25 expressed sequences representing novel human transcripts. Intriguingly, these 25 sequences have small open-reading frames and are not highly conserved through evolution, suggesting many of these novel expressed sequences may be derived from untranslated regions of novel transcripts or from non-coding transcripts. Examination of a large metalibrary of human Serial Analysis of Gene Expression (SAGE) tags demonstrated that only three of these novel sequences had been previously detected. RT-PCR experiments confirmed that the 6 sequences tested were expressed in specific human tissues, as well as in clinical samples of prostate cancer. Further RT-PCR experiments for five of these fragments indicated they originated from large untranslated regions of unannotated transcripts. Conclusion This study underlines the value of using complementary techniques in the annotation of the human genome. The tissue-specific expression of 4 of the 6 clones tested indicates the expression of these novel transcripts is tightly regulated, and future work will determine the possible role(s) these novel transcripts may play in the progression of prostate cancer.

  4. Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

    Directory of Open Access Journals (Sweden)

    Shan Wei

    2018-05-01

    Full Text Available Real-time sequencing of short DNA reads has a wide variety of clinical and research applications including screening for mutations, target sequences and aneuploidy. We recently demonstrated that MinION, a nanopore-based DNA sequencing device the size of a USB drive, could be used for short-read DNA sequencing. In this study, an ultra-rapid multiplex library preparation and sequencing method for the MinION is presented and applied to accurately test normal diploid and aneuploidy samples’ genomic DNA in under three hours, including library preparation and sequencing. This novel method shows great promise as a clinical diagnostic test for applications requiring rapid short-read DNA sequencing.

  5. Improvement of methods for large scale sequencing; application to human Xq28

    Energy Technology Data Exchange (ETDEWEB)

    Gibbs, R.A.; Andersson, B.; Wentland, M.A. [Baylor College of Medicine, Houston, TX (United States)] [and others]

    1994-09-01

    Sequencing of a one-megabase region of Xq28, spanning the FRAXA and IDS loci, has been undertaken in order to investigate the practicality of the shotgun approach for large scale sequencing and as a platform to develop improved methods. The efficiency of several steps in the shotgun sequencing strategy has been increased using PCR-based approaches. An improved method for preparation of M13 libraries has been developed. This protocol combines a previously described adaptor-based protocol with the uracil DNA glycosylase (UDG)-cloning procedure. The efficiency of this procedure has been found to be up to 100-fold higher than that of previously used protocols. In addition the novel protocol is more reliable and thus easy to establish in a laboratory. The method has also been adapted for the simultaneous shotgun sequencing of multiple short fragments by concentrating them before library construction. This protocol is suitable for rapid characterization of cDNA clones. A library was constructed from 15 PCR-amplified and concentrated human cDNA inserts; the insert sequences could easily be identified as separate contigs during the assembly process, and the sequence coverage was even along each fragment. Using this strategy, the fine structures of the FRAXA and IDS loci have been revealed and several EST homologies indicating novel expressed sequences have been identified. Use of PCR to close repetitive regions that are difficult to clone was tested by determination of the sequence of a cosmid mapping DXS455 in Xq28, containing a polymorphic VNTR. The region containing the VNTR was not represented in the shotgun library, but by designing PCR primers in the sequences flanking the gap and by cloning and sequencing the PCR product, the fine structure of the VNTR has been determined. It was found to be an AT-rich VNTR with a repeated 25-mer at the center.

  6. Identification of genes differentially expressed in association with acquired cisplatin resistance

    Science.gov (United States)

    Johnsson, A; Zeelenberg, I; Min, Y; Hilinski, J; Berry, C; Howell, S B; Los, G

    2000-01-01

    The goal of this study was to identify genes whose mRNA levels are differentially expressed in human cells with acquired cisplatin (cDDP) resistance. Using the parental UMSCC10b head and neck carcinoma cell line and the 5.9-fold cDDP-resistant subline, UMSCC10b/Pt-S15, two suppressive subtraction hybridization (SSH) cDNA libraries were prepared. One library represented mRNAs whose levels were increased in the cDDP resistant variant (the UP library), the other one represented mRNAs whose levels were decreased in the resistant cells (the DOWN library). Arrays constructed with inserts recovered from these libraries were hybridized with SSH products to identify truly differentially expressed elements. A total of 51 cDNA fragments present in the UP library and 16 in the DOWN library met the criteria established for differential expression. The sequences of 87% of these cDNA fragments were identified in Genbank. Among the mRNAs in the UP library that were frequently isolated and that showed high levels of differential expression were cytochrome oxidase I, ribosomal protein 28S, elongation factor 1α, α-enolase, stathmin, and HSP70. The approach taken in this study permitted identification of many genes never before linked to the cDDP-resistant phenotype. © 2000 Cancer Research Campaign PMID:10993653

  7. Development of library preparation method able to correct gene expression levels in rice anther and isolate a trace expression gene mediated in cold-resistance

    Energy Technology Data Exchange (ETDEWEB)

    Yamaguchi, Tomoya; Koike, Setsuo [Tohoku National Agricultural Experiment Station, Morioka (Japan)

    2000-02-01

    When cDNA library is prepared by a previously developed method, genes of which expression level is high are apt to be cloned at a high frequency, whereas genes of which expression level are low, are difficult to be cloned. A low-expression gene has been cloned at very low frequency. Therefore, the gene encoding the key enzyme that is involved in growth disturbance of rice pollen has not been identified. In this study, development of a library preparing method able to correct the expression level was attempted using highly sensitive detection method with radioisotope and some genes related to cold-resistance of rice were isolated. Double strand DNAs were synthesized using mRNA extract from rice anthers and annealed following heat-denaturation. It has been known that single strand DNA molecules abundantly existing in DNA solution can easily aggregate to form double strand DNA, but single stranded DNA molecules poor in the solution are apt to still remain as single strand after annealing. Thus, the amount of single strand DNA would be balanced in the solution between abundant DNA and poor DNA species. The authors succeeded to prepare a gene library including low and high expression genes at similar proportions. Moreover, spin trap method that allows RI labeling of DNA bound to latex particle, was developed to detect with high sensitivity, especially for genes that are expressed at low level. The present method could be used for recovery, detection and quantitative analysis of radiolabeled single strand DNA. Thus, it was demonstrated that the stage from tetrad sperm to small sperm might be easily affected by cold stress. The present results suggest that the expressions of β-1 and β-3 glucanase, which are involved in the release of small sperms following meiosis in the pollen formation, might be easily affected by cold stress. (M.N.)

  8. Development of library preparation method able to correct gene expression levels in rice anther and isolate a trace expression gene mediated in cold-resistance

    International Nuclear Information System (INIS)

    Yamaguchi, Tomoya; Koike, Setsuo

    2000-01-01

    When cDNA library is prepared by a previously developed method, genes of which expression level is high are apt to be cloned at a high frequency, whereas genes of which expression level are low, are difficult to be cloned. A low-expression gene has been cloned at very low frequency. Therefore, the gene encoding the key enzyme that is involved in growth disturbance of rice pollen has not been identified. In this study, development of a library preparing method able to correct the expression level was attempted using highly sensitive detection method with radioisotope and some genes related to cold-resistance of rice were isolated. Double strand DNAs were synthesized using mRNA extract from rice anthers and annealed following heat-denaturation. It has been known that single strand DNA molecules abundantly existing in DNA solution can easily aggregate to form double strand DNA, but single stranded DNA molecules poor in the solution are apt to still remain as single strand after annealing. Thus, the amount of single strand DNA would be balanced in the solution between abundant DNA and poor DNA species. The authors succeeded to prepare a gene library including low and high expression genes at similar proportions. Moreover, spin trap method that allows RI labeling of DNA bound to latex particle, was developed to detect with high sensitivity, especially for genes that are expressed at low level. The present method could be used for recovery, detection and quantitative analysis of radiolabeled single strand DNA. Thus, it was demonstrated that the stage from tetrad sperm to small sperm might be easily affected by cold stress. The present results suggest that the expressions of β-1 and β-3 glucanase, which are involved in the release of small sperms following meiosis in the pollen formation, might be easily affected by cold stress. (M.N.)

  9. Comparative analysis of differentially expressed sequence tags of sweet orange and mandarin infected with Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Alessandra A. de Souza

    2007-01-01

    Full Text Available The Citrus ESTs Sequencing Project (CitEST) conducted at Centro APTA Citros Sylvio Moreira/IAC has identified and catalogued ESTs representing a set of citrus genes expressed under relevant stress responses, including diseases such as citrus variegated chlorosis (CVC), caused by Xylella fastidiosa. All sweet orange (Citrus sinensis L. Osb.) varieties are susceptible to X. fastidiosa. On the other hand, mandarins (C. reticulata Blanco) are considered tolerant or resistant to the disease, although the bacterium can be sporadically detected within the trees, but no disease symptoms or economic losses are observed. To study their genetic responses to the presence of X. fastidiosa, we have compared EST libraries of leaf tissue of sweet orange Pêra IAC (highly susceptible cultivar to X. fastidiosa) and mandarin ‘Ponkan’ (tolerant) artificially infected with the bacterium. Using an in silico differential display, 172 genes were found to be significantly differentially expressed in such conditions. Sweet orange presented an increase in expression of photosynthesis related genes that could reveal a strategy to counterbalance a possible lower photosynthetic activity resulting from early effects of the bacterial colonization in affected plants. On the other hand, mandarin showed an active multi-component defense response against the bacterium similar to the non-host resistance pattern.
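
    The abstract does not say which statistic underlies the in silico differential display. As a minimal sketch, assuming each gene is compared by its EST counts in the two libraries, a two-sided Fisher's exact test on a 2x2 table is one common choice. The function name and the example counts are hypothetical and are not taken from CitEST.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not from the abstract):
      a = ESTs of the gene in library 1, b = all other ESTs in library 1
      c = ESTs of the gene in library 2, d = all other ESTs in library 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x gene ESTs in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Made-up counts: 30 of 1000 ESTs in one library versus 2 of 1000 in the other.
p_value = fisher_exact_two_sided(30, 970, 2, 998)
print(f"p = {p_value:.3g}")
```

    With many genes tested at once, a multiple-testing correction would normally be applied to the resulting p-values; that step is omitted here.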

  10. RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

    Science.gov (United States)

    Brule, C E; Dean, K M; Grayhack, E J

    2016-01-01

    The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.

  11. A review of recommendations for sequencing receptive and expressive language instruction.

    Science.gov (United States)

    Petursdottir, Anna Ingeborg; Carr, James E

    2011-01-01

    We review recommendations for sequencing instruction in receptive and expressive language objectives in early and intensive behavioral intervention (EIBI) programs. Several books recommend completing receptive protocols before introducing corresponding expressive protocols. However, this recommendation has little empirical support, and some evidence exists that the reverse sequence may be more efficient. Alternative recommendations include teaching receptive and expressive skills simultaneously (M. L. Sundberg & Partington, 1998) and building learning histories that lead to acquisition of receptive and expressive skills without direct instruction (Greer & Ross, 2008). Empirical support for these recommendations also is limited. Future research should assess the relative efficiency of receptive-before-expressive, expressive-before-receptive, and simultaneous training with children who have diagnoses of autism spectrum disorders. In addition, further evaluation is needed of the potential benefits of multiple-exemplar training and other variables that may influence the efficiency of receptive and expressive instruction.

  12. Experimental study of gene expression in lung and bronchus of radon-exposed mice

    International Nuclear Information System (INIS)

    Guo Zhiying; Tian Mei; Liu Jianxiang; Ruan Jianlei; Piao Chunnan; Su Xu

    2008-01-01

    Objective: To construct and identify a differentially expressed cDNA library in lung and bronchus of mice exposed to radon. Methods: Two-week-old male BALB/c mice weighing 18-22 g were placed in a SR-NIM02 radon chamber. One group of mice was exposed to radon at an accumulative dose equivalent to 30 WLM. The control group received about 0.02 WLM. To construct a subtracted cDNA library enriched with differentially expressed genes, the Super SMART technique and suppression subtractive hybridization (SSH) were performed. The obtained forward and reverse cDNA fragments were directly inserted into pGEM-T-easy vector and transformed into E. coli DH5α. The inserts in the plasmids were amplified by nested polymerase chain reaction (PCR), and some of them were sequenced. Finally, these sequences were searched against GenBank with BLAST. Results: 146 of 460 randomly obtained clones were positive clones containing 1000-1500 bp inserted cDNA fragments. The forward and reverse subtracted cDNA libraries in lung and bronchus of mice exposed to radon were constructed, and the 48 up-regulated and 61 down-regulated cDNA sequences selected showed varying degrees of homology to GenBank entries. Conclusions: The subtracted cDNA library in lung and bronchus of mice exposed to radon is successfully constructed, and genes that are differentially expressed are identified. Some genes might be related to immunity, cell cycle and apoptosis. (authors)

  13. Students lead the library the importance of student contributions to the academic library

    CERN Document Server

    Arnold-Garza, Sara

    2017-01-01

    In six parts-Students as Employees, Students as Curators, Students as Ambassadors, the Library as Client, Student Groups as Library Leaders, and Students as Library Designers-Students Lead the Library provides case studies of programs and initiatives that seek student input, assistance, and leadership in the academic library. Through the library, students can develop leadership skills, cultivate high levels of engagement, and offer peer learning opportunities. Through the students, libraries can create participatory design processes, enhancement and transformation of the library's core functions, and expressed library value for stakeholders.

  14. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    Science.gov (United States)

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, comprehensive assessment of library construction and analysis protocols for Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads with various read length. Whether the performance of the commonly used software for handling such kind of data is satisfactory is unknown. By using universal human reference RNA as the initial material, RNaseIII and chemical fragmentation methods in library construction showed similar result in gene and junction discovery number and expression level estimated accuracy. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. Unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier attained highly consistent results with that of TaqMan array and highest sensitivity. We provided for the first time, the reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed equally well with the enzyme-based one. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  15. Methods for transforming and expression screening of filamentous fungal cells with a DNA library

    Science.gov (United States)

    Teter, Sarah; Lamsa, Michael; Cherry, Joel; Ward, Connie

    2015-06-02

    The present invention relates to methods for expression screening of filamentous fungal transformants, comprising: (a) isolating single colony transformants of a DNA library introduced into E. coli; (b) preparing DNA from each of the single colony E. coli transformants; (c) introducing a sample of each of the DNA preparations of step (b) into separate suspensions of protoplasts of a filamentous fungus to obtain transformants thereof, wherein each transformant contains one or more copies of an individual polynucleotide from the DNA library; (d) growing the individual filamentous fungal transformants of step (c) on selective growth medium, thereby permitting growth of the filamentous fungal transformants, while suppressing growth of untransformed filamentous fungi; and (e) measuring activity or a property of each polypeptide encoded by the individual polynucleotides. The present invention also relates to isolated polynucleotides encoding polypeptides of interest obtained by such methods, to nucleic acid constructs, expression vectors, and recombinant host cells comprising the isolated polynucleotides, and to methods of producing the polypeptides encoded by the isolated polynucleotides.

  16. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    Directory of Open Access Journals (Sweden)

    Souche Erika L

    2011-06-01

    Full Text Available Abstract Background Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology is intensively studied, little is known on the functional genomics underpinning of phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in presence of environmental stressors, and target such genes for single nucleotide polymorphic (SNP) marker development. Results We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non synonymous genetic variation. We validate the developed SNPs in six natural populations of D. magna distributed at regional scale. Conclusions A large proportion (47%) of the produced ESTs are Daphnia lineage specific genes, which are potentially involved in responses to environmental stress rather than to general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna.
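
    The distinction between synonymous and non-synonymous SNPs mentioned in this record can be illustrated with a short sketch. It assumes the standard nuclear genetic code and a SNP whose codon and reading frame are already known; the function name and the example codons are hypothetical and do not come from the study, whose actual SNP-mining pipeline is not described in the abstract.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(codon, position, alt_base):
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    codon    -- reference codon on the coding strand, e.g. "GAA"
    position -- 0-based index of the SNP within the codon (0, 1 or 2)
    alt_base -- alternative base observed at that position
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError("codon must be three bases from T, C, A, G")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of T, C, A, G")
    if codon[position] == alt_base:
        raise ValueError("alt_base is identical to the reference base")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    # A change to or from a stop codon ('*') is counted as non-synonymous here.
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAA", 0, "A"))  # GAA (Glu) -> AAA (Lys)
```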

  17. Synthesis of hexahydropyrrolo[2,1-a]isoquinoline compound libraries through a Pictet–Spengler cyclization/metal-catalyzed cross coupling/amidation sequence

    DEFF Research Database (Denmark)

    Petersen, Rico; Cohrt, A. Emil; Petersen, Michael Åxman

    2015-01-01

    incorporating two handles for diversification, were synthesized through an oxidative cleavage/Pictet–Spengler reaction sequence in high overall yields. A subsequent metal-catalyzed cross coupling/amidation protocol was developed and its utility in library synthesis was validated by construction of a 20-membered...

  18. Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant

    Directory of Open Access Journals (Sweden)

    Retzel Ernest

    2004-09-01

    Full Text Available Abstract Background Microarrays are an important tool with which to examine coordinated gene expression. Soybean (Glycine max) is one of the most economically valuable crop species in the world food supply. In order to accelerate both gene discovery as well as hypothesis-driven research in soybean, global expression resources needed to be developed. The applications of microarray for determining patterns of expression in different tissues or during conditional treatments by dual labeling of the mRNAs are unlimited. In addition, discovery of the molecular basis of traits through examination of naturally occurring variation in hundreds of mutant lines could be enhanced by the construction and use of soybean cDNA microarrays. Results We report the construction and analysis of a low redundancy 'unigene' set of 27,513 clones that represent a variety of soybean cDNA libraries made from a wide array of source tissue and organ systems, developmental stages, and stress or pathogen-challenged plants. The set was assembled from the 5' sequence data of the cDNA clones using cluster analysis programs. The selected clones were then physically reracked and sequenced at the 3' end. In order to increase gene discovery from immature cotyledon libraries that contain abundant mRNAs representing storage protein gene families, we utilized a high density filter normalization approach to preferentially select more weakly expressed cDNAs. All 27,513 cDNA inserts were amplified by polymerase chain reaction. The amplified products, along with some repetitively spotted control or 'choice' clones, were used to produce three 9,728-element microarrays that have been used to examine tissue specific gene expression and global expression in mutant isolines. Conclusions Global expression studies will be greatly aided by the availability of the sequence-validated and low redundancy cDNA sets described in this report. These cDNAs and ESTs represent a wide array of developmental

  19. In silico differential display of defense-related expressed sequence tags from sugarcane tissues infected with diazotrophic endophytes

    Directory of Open Access Journals (Sweden)

    Lambais Marcio R.

    2001-01-01

    Full Text Available The expression patterns of 277 sugarcane expressed sequence tags (EST)-contigs encoding putative defense-related (DR) proteins were evaluated using the Sugarcane EST database. The DR proteins evaluated included chitinases, beta-1,3-glucanases, phenylalanine ammonia-lyases, chalcone synthases, chalcone isomerases, isoflavone reductases, hydroxyproline-rich glycoproteins, proline-rich glycoproteins, peroxidases, catalases, superoxide dismutases, WRKY-like transcription factors and proteins involved in cell death control. Putative sugarcane WRKY proteins were compared and their phylogenetic relationships determined. A hierarchical clustering approach was used to identify DR ESTs with similar expression profiles in representative cDNA libraries. To identify DR ESTs differentially expressed in sugarcane tissues infected with Gluconacetobacter diazotrophicus or Herbaspirillum rubrisubalbicans, 179 putative DR EST-contigs expressed in non-infected tissues (leaves and roots) and/or infected tissues were selected and arrayed by similarity of their expression profiles. Changes in the expression levels of 124 putative DR EST-contigs, expressed in non-infected tissues, were evaluated in infected tissues. Approximately 42% of these EST-contigs showed no expression in infected tissues, whereas 15% and 3% showed more than 2-fold suppression in tissues infected with G. diazotrophicus or H. rubrisubalbicans, respectively. Approximately 14 and 8% of the DR EST-contigs evaluated showed more than 2-fold induction in tissues infected with G. diazotrophicus or H. rubrisubalbicans, respectively. The differential expression of clusters of DR genes may be important in the establishment of a compatible interaction between sugarcane and diazotrophic endophytes. It is suggested that the hierarchical clustering approach can be used on a genome-wide scale to identify genes likely involved in controlling plant-microorganism interactions.
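
    The categories reported in this record (no expression in infected tissue, more than 2-fold suppression, more than 2-fold induction) can be sketched as a simple rule on normalized EST abundances. This is an illustrative reading only: the abstract does not describe how counts were normalized, and the function name, the default threshold argument and the example values are assumptions.

```python
def classify_expression_change(control, infected, fold=2.0):
    """Label an EST-contig by its change between control and infected tissue.

    control, infected -- normalized EST abundances (hypothetical units, for
                         example ESTs per 10,000 reads in each library)
    fold              -- fold-change threshold; 2.0 mirrors the "more than
                         2-fold" wording of the abstract
    """
    if control <= 0:
        raise ValueError("contig must be expressed in non-infected tissue")
    if infected < 0:
        raise ValueError("abundances cannot be negative")
    if infected == 0:
        return "not detected in infected tissue"
    ratio = infected / control
    if ratio > fold:
        return "induced"
    if ratio < 1.0 / fold:
        return "suppressed"
    return "unchanged"


# Made-up abundances for four hypothetical contigs.
for name, ctrl, inf in [("c1", 10, 0), ("c2", 10, 25), ("c3", 10, 4), ("c4", 10, 12)]:
    print(name, classify_expression_change(ctrl, inf))
```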

  20. Phage display peptide libraries: deviations from randomness and correctives

    Science.gov (United States)

    Ryvkin, Arie; Ashkenazy, Haim; Weiss-Ottolenghi, Yael; Piller, Chen; Pupko, Tal; Gershoni, Jonathan M

    2018-01-01

    Abstract Peptide-expressing phage display libraries are widely used for the interrogation of antibodies. Affinity selected peptides are then analyzed to discover epitope mimetics, or are subjected to computational algorithms for epitope prediction. A critical assumption for these applications is the random representation of amino acids in the initial naïve peptide library. In a previous study, we implemented next generation sequencing to evaluate a naïve library and discovered severe deviations from randomness in UAG codon over-representation as well as in high G phosphoramidite abundance causing amino acid distribution biases. In this study, we demonstrate that the UAG over-representation can be attributed to the burden imposed on the phage upon the assembly of the recombinant Protein 8 subunits. This was corrected by constructing the libraries using supE44-containing bacteria which suppress the UAG driven abortive termination. We also demonstrate that the overabundance of G stems from variant synthesis-efficiency and can be corrected using compensating oligonucleotide-mixtures calibrated by mass spectroscopy. Construction of libraries implementing these correctives results in markedly improved libraries that display random distribution of amino acids, thus ensuring that enriched peptides obtained in biopanning represent a genuine selection event, a fundamental assumption for phage display applications. PMID:29420788

  1. Microcystin-LR nanobody screening from an alpaca phage display nanobody library and its expression and application.

    Science.gov (United States)

    Xu, Chongxin; Yang, Ying; Liu, Liwen; Li, Jianhong; Liu, Xiaoqin; Zhang, Xiao; Liu, Yuan; Zhang, Cunzheng; Liu, Xianjin

    2018-04-30

    Microcystin-LR (MC-LR) is a type of biotoxin that pollutes the ecological environment and food. The study aimed to obtain new nanobodies from a phage nanobody library for determination of MC-LR. The toxin was conjugated to keyhole limpet haemocyanin (KLH) and bovine serum albumin (BSA), respectively, then the conjugates were used as coated antigens for enrichment (coated MC-LR-KLH) and screening (coated MC-LR-BSA) of MC-LR phage nanobodies from an alpaca phage display nanobody library. The antigen-specific phage particles were enriched effectively with four rounds of biopanning. At the last round of enrichment, a total of 20 positive monoclonal phage nanobodies were obtained from the library, which were analyzed by monoclonal phage enzyme linked immunosorbent assay (ELISA), colony PCR and DNA sequencing. The three most positive nanobody genes, ANAb12, ANAb9 and ANAb7, were cloned into pET26b vector, then the nanobodies were expressed in Escherichia coli BL21 respectively. After purification, the molecular weight (M.W.) of all nanobodies was approximately 15 kDa by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The purified nanobodies, ANAb12, ANAb9 and ANAb7, were used to establish the indirect competitive ELISA (IC-ELISA) for MC-LR; their half-maximum inhibition concentrations (IC50) were 0.87, 1.17 and 1.47 μg/L, and their detection limits (IC10) were 0.06, 0.08 and 0.12 μg/L, respectively. All of them showed strong cross-reactivity (CRs) of 82.7-116.9% for MC-RR, MC-YR and MC-WR, and weak CRs of less than 4.56% for MC-LW, less than 0.1% for MC-LY and MC-LF. It was found that all the IC-ELISAs for MC-LR spiked in tap water samples detection were with good accuracy, stability and repeatability, their recoveries were 84.0-106.5%, coefficient of variations (CVs) were 3.4-10.6%. These results showed that IC-ELISA based on the nanobodies from the alpaca phage display antibody library were promising for high sensitive determination of multiple

  2. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10,072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3,519 nonredundant gene groups, including 1,446 clusters and 2,073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1,418 ESTs participated in photosynthesis and 1,359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence indicated that U. prolifera has the substance and energy foundation for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  3. Normalized cDNA libraries

    Science.gov (United States)

    Soares, Marcelo B.; Efstratiadis, Argiris

    1997-01-01

    This invention provides a method to normalize a directional cDNA library constructed in a vector that allows propagation in single-stranded circle form comprising: (a) propagating the directional cDNA library in single-stranded circles; (b) generating fragments complementary to the 3' noncoding sequence of the single-stranded circles in the library to produce partial duplexes; (c) purifying the partial duplexes; (d) melting and reassociating the purified partial duplexes to moderate Cot; and (e) purifying the unassociated single-stranded circles, thereby generating a normalized cDNA library.

  4. A novel method of providing a library of n-mers or biopolymers

    DEFF Research Database (Denmark)

    2012-01-01

    The present invention relates to a method of providing a library of n-mer sequences, wherein the library is composed of an n-mer sequence. Also the invention concerns a method of providing a library of biopolymer sequences having one or more n-mers in common. Further provided are specific primers...

  5. Sequence genomic organization and expression of two channel catfish Ictalurus punctatus Ghrelin receptors

    Science.gov (United States)

    Two ghrelin receptor (GHS-R) genes were isolated from channel catfish tissue and a bacterial artificial chromosome (BAC) library. The two receptors were characterized by determining tissue distribution, ontogeny of receptor mRNA expression, and effects of exogenous homologous ghrelin administration ...

  6. Allele Workbench: transcriptome pipeline and interactive graphics for allele-specific expression.

    Directory of Open Access Journals (Sweden)

    Carol A Soderlund

    Full Text Available Sequencing the transcriptome can answer various questions such as determining the transcripts expressed in a given species for a specific tissue or condition, evaluating differential expression, discovering variants, and evaluating allele-specific expression. Differential expression evaluates the expression differences between different strains, tissues, and conditions. Allele-specific expression evaluates expression differences between parental alleles. Both differential expression and allele-specific expression have been studied for heterosis (hybrid vigor), where the hybrid has improved performance over the parents for one or more traits. The Allele Workbench software was developed for a heterosis study that evaluated allele-specific expression for a mouse F1 hybrid using libraries from multiple tissues with biological replicates. This software has been made into a distributable package, which includes a pipeline, a Java interface to build the database, and a Java interface for query and display of the results. The required input is a reference genome, annotation file, and one or more RNA-Seq libraries with optional replicates. It evaluates allelic imbalance at the SNP and transcript level and flags transcripts with significant opposite directional allele-specific expression. The Java interface allows the user to view data from libraries, replicates, genes, transcripts, exons, and variants, including queries on allele imbalance for selected libraries. To determine the impact of allele-specific SNPs on protein folding, variants are annotated with their effect (e.g., missense), and the parental protein sequences may be exported for protein folding analysis. The Allele Workbench processing results in transcript files and read counts that can be used as input to the previously published Transcriptome Computational Workbench, which has a new algorithm for determining a trimmed set of gene ontology terms. The software with demo files is available

  7. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
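    The first two DSAP steps described above (cleanup and clustering) can be sketched in a few lines. This is only an illustration of the idea, not DSAP's actual implementation: the adaptor sequence, the minimum-overlap threshold, the function names, and the interpretation of "poly-A/T/C/G/N removal" as dropping tags made of a single repeated nucleotide are all assumptions that do not come from the abstract.

```python
from collections import Counter

# Hypothetical 3' adaptor; the abstract does not state which adaptor DSAP trims.
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"


def trim_adaptor(seq, adaptor=ADAPTOR, min_overlap=6):
    """Cut the read at the adaptor, or at a partial adaptor on its 3' end."""
    idx = seq.find(adaptor)
    if idx != -1:
        return seq[:idx]
    # Look for the longest adaptor prefix that ends the read (assumed threshold).
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


def is_homopolymer(seq):
    """True for tags consisting of one repeated symbol (poly-A/T/C/G/N)."""
    return len(set(seq)) <= 1


def cleanup_and_cluster(lines):
    """Read 'sequence<TAB>count' lines and merge counts of identical cleaned tags."""
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tag, count = line.split("\t")
        cleaned = trim_adaptor(tag.upper())
        if not cleaned or is_homopolymer(cleaned):
            continue  # drop empty and low-complexity tags
        clusters[cleaned] += int(count)
    return clusters
```

    Tags that differ only in how much adaptor they carry collapse into one cluster whose count is the sum of the input counts; the resulting unique sequences would then be the input for the Rfam and miRBase matching steps.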

  8. Identification of miRNAs and their target genes in developing soybean seeds by deep sequencing

    Directory of Open Access Journals (Sweden)

    Chen Shou-Yi

    2011-01-01

    Full Text Available Abstract Background MicroRNAs (miRNAs) regulate gene expression by mediating gene silencing at transcriptional and post-transcriptional levels in higher plants. miRNAs and related target genes have been widely studied in model plants such as Arabidopsis and rice; however, the number of identified miRNAs in soybean (Glycine max) is limited, and global identification of the related miRNA targets has not been reported in previous research. Results In our study, a small RNA library and a degradome library were constructed from developing soybean seeds for deep sequencing. We identified 26 new miRNAs in soybean by bioinformatic analysis and further confirmed their expression by stem-loop RT-PCR. The miRNA star sequences of 38 known miRNAs and 8 new miRNAs were also discovered, providing additional evidence for the existence of miRNAs. Through degradome sequencing, 145 and 25 genes were identified as targets of annotated miRNAs and new miRNAs, respectively. GO analysis indicated that many of the identified miRNA targets may function in soybean seed development. Additionally, a soybean homolog of Arabidopsis SUPPRESSOR OF GENE SILENCING 3 (AtSGS3) was detected as a target of the newly identified miRNA Soy_25, suggesting the presence of feedback control of miRNA biogenesis. Conclusions We have identified large numbers of miRNAs and their related target genes through deep sequencing of a small RNA library and a degradome library. Our study provides more information about the regulatory network of miRNAs in soybean and advances our understanding of miRNA functions during seed development.

  9. Modulation of gene expression made easy

    DEFF Research Database (Denmark)

    Solem, Christian; Jensen, Peter Ruhdal

    2002-01-01

    A new approach for modulating gene expression, based on randomization of promoter (spacer) sequences, was developed. The method was applied to chromosomal genes in Lactococcus lactis and shown to generate libraries of clones with broad ranges of expression levels of target genes. In one example...... that the method can be applied to modulating the expression of native genes on the chromosome. We constructed a series of strains in which the expression of the las operon, containing the genes pfk, pyk, and ldh, was modulated by integrating a truncated copy of the pfk gene. Importantly, the modulation affected...

  10. Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries.

    Science.gov (United States)

    Scanlon, Thomas C; Gray, Elizabeth C; Griswold, Karl E

    2009-11-20

    In addition to providing the molecular machinery for transcription and translation, recombinant microbial expression hosts maintain the critical genotype-phenotype link that is essential for high throughput screening and recovery of proteins encoded by plasmid libraries. It is known that Escherichia coli cells can be simultaneously transformed with multiple unique plasmids and thus complicate recombinant library screening experiments. As a result of their potential to yield misleading results, bacterial multiple vector transformants have been thoroughly characterized in previous model studies. In contrast to bacterial systems, there is little quantitative information available regarding multiple vector transformants in yeast. Saccharomyces cerevisiae is the most widely used eukaryotic platform for cell surface display, combinatorial protein engineering, and other recombinant library screens. In order to characterize the extent and nature of multiple vector transformants in this important host, plasmid-borne gene libraries constructed by yeast homologous recombination were analyzed by DNA sequencing. It was found that up to 90% of clones in yeast homologous recombination libraries may be multiple vector transformants, that on average these clones bear four or more unique mutant genes, and that these multiple vector cells persist as a significant proportion of library populations for greater than 24 hours during liquid outgrowth. Both vector concentration and vector to insert ratio influenced the library proportion of multiple vector transformants, but their population frequency was independent of transformation efficiency. Interestingly, the average number of plasmids borne by multiple vector transformants did not vary with their library population proportion. These results highlight the potential for multiple vector transformants to dominate yeast libraries constructed by homologous recombination. The previously unrecognized prevalence and persistence of multiply

  11. Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries

    Directory of Open Access Journals (Sweden)

    Gray Elizabeth C

    2009-11-01

    Full Text Available Abstract Background In addition to providing the molecular machinery for transcription and translation, recombinant microbial expression hosts maintain the critical genotype-phenotype link that is essential for high throughput screening and recovery of proteins encoded by plasmid libraries. It is known that Escherichia coli cells can be simultaneously transformed with multiple unique plasmids and thus complicate recombinant library screening experiments. As a result of their potential to yield misleading results, bacterial multiple vector transformants have been thoroughly characterized in previous model studies. In contrast to bacterial systems, there is little quantitative information available regarding multiple vector transformants in yeast. Saccharomyces cerevisiae is the most widely used eukaryotic platform for cell surface display, combinatorial protein engineering, and other recombinant library screens. In order to characterize the extent and nature of multiple vector transformants in this important host, plasmid-borne gene libraries constructed by yeast homologous recombination were analyzed by DNA sequencing. Results It was found that up to 90% of clones in yeast homologous recombination libraries may be multiple vector transformants, that on average these clones bear four or more unique mutant genes, and that these multiple vector cells persist as a significant proportion of library populations for greater than 24 hours during liquid outgrowth. Both vector concentration and vector to insert ratio influenced the library proportion of multiple vector transformants, but their population frequency was independent of transformation efficiency. Interestingly, the average number of plasmids borne by multiple vector transformants did not vary with their library population proportion. Conclusion These results highlight the potential for multiple vector transformants to dominate yeast libraries constructed by homologous recombination. The

  12. Integrated massively parallel sequencing of 15 autosomal STRs and Amelogenin using a simplified library preparation approach.

    Science.gov (United States)

    Xue, Jian; Wu, Riga; Pan, Yajiao; Wang, Shunxia; Qu, Baowang; Qin, Ying; Shi, Yuequn; Zhang, Chuchu; Li, Ran; Zhang, Liyan; Zhou, Cheng; Sun, Hongyu

    2018-04-02

    Massively parallel sequencing (MPS) technologies, also termed as next-generation sequencing (NGS), are becoming increasingly popular in study of short tandem repeats (STR). However, current library preparation methods are usually based on ligation or two-round PCR that requires more steps, making it time-consuming (about 2 days), laborious and expensive. In this study, a 16-plex STR typing system was designed with fusion primer strategy based on the Ion Torrent S5 XL platform which could effectively resolve the above challenges for forensic DNA database-type samples (bloodstains, saliva stains, etc.). The efficiency of this system was tested in 253 Han Chinese participants. The libraries were prepared without DNA isolation and adapter ligation, and the whole process only required approximately 5 h. The proportion of thoroughly genotyped samples in which all the 16 loci were successfully genotyped was 86% (220/256). Of the samples, 99.7% showed 100% concordance between NGS-based STR typing and capillary electrophoresis (CE)-based STR typing. The inconsistency might have been caused by off-ladder alleles and mutations in primer binding sites. Overall, this panel enabled the large-scale genotyping of the DNA samples with controlled quality and quantity because it is a simple, operation-friendly process flow that saves labor, time and costs. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits.

    Science.gov (United States)

    Freed, Emily F; Winkler, James D; Weiss, Sophie J; Garst, Andrew D; Mutalik, Vivek K; Arkin, Adam P; Knight, Rob; Gill, Ryan T

    2015-11-20

    The reliable engineering of biological systems requires quantitative mapping of predictable and context-independent expression over a broad range of protein expression levels. However, current techniques for modifying expression levels are cumbersome and are not amenable to high-throughput approaches. Here we present major improvements to current techniques through the design and construction of E. coli genome-wide libraries using synthetic DNA cassettes that can tune expression over a ∼10^4 range. The cassettes also contain molecular barcodes that are optimized for next-generation sequencing, enabling rapid and quantitative tracking of alleles that have the highest fitness advantage. We show these libraries can be used to determine which genes and expression levels confer greater fitness to E. coli under different growth conditions.

  14. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Full Text Available Abstract Background Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. Results As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. Conclusion We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  15. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 × 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
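    The figures reported for clone pUROS-2 can be cross-checked with simple arithmetic. The sketch below assumes that the 798 bp open reading frame includes the stop codon and that the insert length is the sum of the two untranslated regions and the ORF (ignoring any poly(A) tail); neither assumption is stated in the abstract, and the function name is illustrative.

```python
def orf_summary(orf_bp, utr5_bp, utr3_bp):
    """Derive codon, residue, and minimum transcript counts from segment lengths."""
    if orf_bp % 3 != 0:
        raise ValueError("an open reading frame must be a multiple of 3 bp")
    codons = orf_bp // 3
    return {
        "codons": codons,
        # Assumption: the last codon of the ORF is the stop codon.
        "amino_acids": codons - 1,
        "min_transcript_bp": utr5_bp + orf_bp + utr3_bp,
    }


summary = orf_summary(orf_bp=798, utr5_bp=196, utr3_bp=284)
mean_residue_mass = 28607 / summary["amino_acids"]  # Da per residue
```

    Under these assumptions the numbers are mutually consistent: 798 bp gives 266 codons, i.e. 265 residues plus a stop codon; 196 + 798 + 284 = 1,278 bp, which matches the reported 1.3 kilobase insert; and the predicted mass corresponds to about 108 Da per residue.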

  16. The sequence of spacers between the consensus sequences modulates the strength of procaryotic promoters

    DEFF Research Database (Denmark)

    Jensen, Peter Ruhdal; Hammer, Karin

    1998-01-01

    A library of synthetic promoters for Lactococcus lactis was constructed, in which the known consensus sequences were kept constant while the sequences of the separating spacers were randomized. The library consists of 38 promoters which differ in strength from 0.3 relative units, and up to more t......-reactors and cell factories....

  17. CDNA cloning, characterization and expression of an endosperm-specific barley peroxidase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Welinder, K.G.; Hejgaard, J.

    1991-01-01

    A barley peroxidase (BP 1) of pI ca. 8.5 and M(r) 37000 has been purified from mature barley grains. Using antibodies towards peroxidase BP 1, a cDNA clone (pcR7) was isolated from a cDNA expression library. The nucleotide sequence of pcR7 gave a derived amino acid sequence identical to the 158 C...

  18. Construction and identification of differential expression genes of peripheral blood cells in radon-exposed mice

    International Nuclear Information System (INIS)

    Chen Rui; Shi Minhua; Hu Huacheng; Li Jianxiang; Nie Jihua; Tong Jian

    2009-01-01

    Objective: To screen and identify differentially expressed genes in peripheral blood cells of mice based on an experimental animal model of radon exposure. Methods: BALB/c mice were exposed in a type HD-3 multifunctional radon room, with cumulative doses of 105 WLM for the radon-exposure group and 1 WLM for the control group. Total RNA was extracted from peripheral blood cells, and SMART (for dscDNA synthesis) and SSH (for gene screening) were applied. For construction of the cDNA library enriched with differentially expressed genes, the pMD 18-T plasmid containing the LacZ operator at the multiple cloning site was used to allow blue-white screening. The TA clones were amplified by nested PCR, and reverse Northern blot was used to identify up- and down-regulation of the clones. The differentially expressed cDNA was then sequenced and analyzed. Results: The subtracted cDNA libraries were successfully constructed. A total of 390 recombinant white colonies were randomly selected. Among the 312 cDNA monoclones selected from both forward- and reverse-subtracted libraries, 41 clones were chosen for sequencing on the basis of their differential expression in reverse Northern blot. Among the 41 sequenced clones, 10 clones with known function/annotation and 3 new ESTs with GenBank accession numbers were obtained. Most of the genes with known function/annotation were related to cell proliferation, metabolism, cellular apoptosis and carcinogenesis. Conclusions: The animal model of radon exposure was established and the cDNA library of peripheral blood cells was successfully constructed. Radon exposure could up- and down-regulate a series of genes. Differentially expressed genes could be identified by using the SSH technique, and the results may help in exploring mechanisms of radon exposure. (authors)

  19. Cardiomyocyte expression and cell-specific processing of procholecystokinin

    DEFF Research Database (Denmark)

    Gøtze, Jens P.; Johnsen, Anders H.; Kistorp, Caroline

    2015-01-01

    has only been suggested using transcriptional measures or methods, with the post-translational phase of gene expression unaddressed. In this study, we examined the cardiac expression of the CCK gene in adult mammals and its expression at the protein level. Using quantitative PCR, a library of sequence......-specific pro-CCK assays, peptide purification, and mass spectrometry, we demonstrate that the mammalian heart expresses pro-CCK in amounts comparable to natriuretic prohormones and processes it to a unique, triple-sulfated, and N-terminally truncated product distinct from intestinal and cerebral CCK peptides...

  20. Library construction and evaluation for site saturation mutagenesis.

    Science.gov (United States)

    Sullivan, Bradford; Walton, Adam Z; Stewart, Jon D

    2013-06-10

    We developed a method for creating and evaluating site-saturation libraries that consistently yields an average of 27.4±3.0 codons of the 32 possible within a pool of 95 transformants. This was verified by sequencing 95 members from 11 independent libraries within the gene encoding alkene reductase OYE 2.6 from Pichia stipitis. Correct PCR primer design as well as a variety of factors that increase transformation efficiency were critical contributors to the method's overall success. We also developed a quantitative analysis of library quality (Q-values) that defines library degeneracy. Q-values can be calculated from standard fluorescence sequencing data (capillary electropherograms) and the degeneracy predicted from an early stage of library construction (pooled plasmids from the initial transformation) closely matched that observed after ca. 1000 library members were sequenced. Based on this experience, we suggest that this analysis can be a useful guide when applying our optimized protocol to new systems, allowing one to focus only on good-quality libraries and reject substandard libraries at an early stage. This advantage is particularly important when lower-throughput screening techniques such as chiral-phase GC must be employed to identify protein variants with desirable properties, e.g., altered stereoselectivities or when multiple codons are targeted for simultaneous randomization. Copyright © 2013 Elsevier Inc. All rights reserved.
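    As a rough point of comparison for the 27.4 of 32 codons reported above, the expected number of distinct codons in an idealized library has a simple closed form. The sketch below assumes that all 32 codons are equally likely and that each transformant is an independent draw; the abstract does not state this model, and the function name is illustrative.

```python
def expected_distinct_codons(n_codons, n_transformants):
    """Expected number of distinct codons seen among independent, uniform draws.

    Each codon is missed by one draw with probability (1 - 1/n_codons), so it
    is missed by all draws with probability (1 - 1/n_codons) ** n_transformants.
    """
    p_missed = (1.0 - 1.0 / n_codons) ** n_transformants
    return n_codons * (1.0 - p_missed)


ideal = expected_distinct_codons(n_codons=32, n_transformants=95)
```

    Under these assumptions, 95 transformants would be expected to cover roughly 30.4 of the 32 codons, so the reported average of 27.4±3.0 sits somewhat below the unbiased ideal, which may reflect residual codon bias in real libraries.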

  1. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    Science.gov (United States)

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190,000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2,800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F_IS values were less than the F_ST values, the clustering based on Bayesian inference implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  2. Parallel metatranscriptome analyses of host and symbiont gene expression in the gut of the termite Reticulitermes flavipes

    Directory of Open Access Journals (Sweden)

    Zhou Xuguo

    2009-10-01

    Full Text Available Abstract Background Termite lignocellulose digestion is achieved through a collaboration of host plus prokaryotic and eukaryotic symbionts. In the present work, we took a combined host and symbiont metatranscriptomic approach for investigating the digestive contributions of host and symbiont in the lower termite Reticulitermes flavipes. Our approach consisted of parallel high-throughput sequencing from (i) a host gut cDNA library and (ii) a hindgut symbiont cDNA library. Subsequently, we undertook functional analyses of newly identified phenoloxidases with potential importance as pretreatment enzymes in industrial lignocellulose processing. Results Over 10,000 expressed sequence tags (ESTs) were sequenced from the 2 libraries that aligned into 6,555 putative transcripts, including 171 putative lignocellulase genes. Sequence analyses provided insights in two areas. First, a non-overlapping complement of host and symbiont (prokaryotic plus protist) glycohydrolase gene families known to participate in cellulose, hemicellulose, alpha carbohydrate, and chitin degradation were identified. Of these, cellulases are contributed by host plus symbiont genomes, whereas hemicellulases are contributed exclusively by symbiont genomes. Second, a diverse complement of previously unknown genes that encode proteins with homology to lignase, antioxidant, and detoxification enzymes were identified exclusively from the host library (laccase, catalase, peroxidase, superoxide dismutase, carboxylesterase, cytochrome P450). Subsequently, functional analyses of phenoloxidase activity provided results that were strongly consistent with patterns of laccase gene expression. In particular, phenoloxidase activity and laccase gene expression are mostly restricted to symbiont-free foregut plus salivary gland tissues, and phenoloxidase activity is inducible by lignin feeding. Conclusion To our knowledge, this is the first time that a dual host-symbiont transcriptome sequencing effort

  3. Capturing the 'ome': the expanding molecular toolbox for RNA and DNA library construction.

    Science.gov (United States)

    Boone, Morgane; De Koker, Andries; Callewaert, Nico

    2018-04-06

    All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.

  4. Construction of 12 EST libraries and characterization of a 12,226 EST dataset for chicory (Cichorium intybus) root, leaves and nodules in the context of carbohydrate metabolism investigation

    Directory of Open Access Journals (Sweden)

    Boutry Marc

    2009-01-01

    Full Text Available Abstract Background The industrial chicory, Cichorium intybus, is a member of the Asteraceae family that accumulates fructan of the inulin type in its root. Inulin is a low-calorie sweetener, a texture agent and a health-promoting ingredient due to its prebiotic properties. Average inulin chain length is a critical parameter that is genotype and temperature dependent. In the context of the study of carbohydrate metabolism, and to gain insight into the transcriptome of chicory root and visualize temporal changes of gene expression during the growing season, we obtained and characterized 10 cDNA libraries from chicory roots regularly sampled in the field during a growing season. A leaf library and a nodule library were also obtained for comparison. Results Approximately 1,000 Expressed Sequence Tags (ESTs) were obtained from each of twelve cDNA libraries, resulting in a 12,226 EST dataset. Clustering of these ESTs returned 1,922 contigs and 4,869 singlets for a total of 6,791 putative unigenes. All ESTs were compared to public sequence databases and functionally classified. Data were specifically searched for sequences related to carbohydrate metabolism. Season-wide evolution of functional classes was evaluated by comparing libraries at the level of functional categories and unigene distribution. Conclusion This chicory EST dataset provides a season-wide outlook of the genes expressed in the root and, to a minor extent, in leaves and nodules. The dataset contains more than 200 sequences related to carbohydrate metabolism and 3,500 new ESTs when compared to other recently released chicory EST datasets, probably because of the season-wide coverage of the root samples. We believe that these sequences will contribute to accelerating research and breeding of the industrial chicory as well as of closely related species.
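    The clustering figures in this record can be tallied with a short calculation. The sketch below assumes that every one of the 12,226 ESTs ended up either in a contig or as a singlet, which the abstract implies but does not state outright; the function and key names are illustrative.

```python
def clustering_stats(total_ests, contigs, singlets):
    """Summarize an EST assembly from its contig and singlet counts."""
    unigenes = contigs + singlets
    # Assumption: every EST that is not a singlet belongs to some contig.
    ests_in_contigs = total_ests - singlets
    return {
        "unigenes": unigenes,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / contigs,
        "redundancy": 1.0 - unigenes / total_ests,
    }


chicory = clustering_stats(total_ests=12226, contigs=1922, singlets=4869)
```

    The unigene count reproduces the reported 6,791 (1,922 + 4,869). Under the stated assumption, 7,357 ESTs fall into contigs, an average of a little under four ESTs per contig.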

  5. Comparative gene expression in toxic versus non-toxic strains of the marine dinoflagellate Alexandrium minutum

    Directory of Open Access Journals (Sweden)

    Glöckner Gernot

    2010-04-01

    Full Text Available Abstract Background The dinoflagellate Alexandrium minutum typically produces paralytic shellfish poisoning (PSP) toxins, which are known only from cyanobacteria and dinoflagellates. While a PSP toxin gene cluster has recently been characterized in cyanobacteria, the genetic background of PSP toxin production in dinoflagellates remains elusive. Results We constructed and analysed an expressed sequence tag (EST) library of A. minutum, which contained 15,703 read sequences yielding a total of 4,320 unique expressed clusters. Of these clusters, 72% combined the forward and reverse reads of at least one bacterial clone. This sequence resource was then used to construct an oligonucleotide microarray. We analysed the expression of all clusters in three different strains. While the cyanobacterial PSP toxin genes were not found among the A. minutum sequences, 192 genes were differentially expressed between toxic and non-toxic strains. Conclusions Based on this study and on the lack of identified PSP synthesis genes in the two existent Alexandrium tamarense EST libraries, we propose that the PSP toxin genes in dinoflagellates might be more different from their cyanobacterial counterparts than would be expected in the case of a recent gene transfer. As a starting point to identify possible PSP toxin-associated genes in dinoflagellates without relying on a priori sequence information, the sequences only present in mRNA pools of the toxic strain can be seen as putative candidates involved in toxin synthesis and regulation, or acclimation to intracellular PSP toxins.

  6. Purpose-Oriented Antibody Libraries Incorporating Tailored CDR3 Sequences

    OpenAIRE

    Bonvin, Pauline; Venet, Sophie; Kosco-Vilbois, Marie; Fischer, Nicolas

    2015-01-01

    The development of in vitro antibody selection technologies has allowed overcoming some limitations inherent to the hybridoma technology. In most cases, large repertoires of antibody genes have been assembled to create highly diversified libraries allowing the isolation of antibodies recognizing virtually any antigen. However, these universal libraries might not allow the isolation of antibodies with specific structural properties or particular amino acid contents that are rarely found in nat...

  7. Analysis of a diverse assemblage of diazotrophic bacteria from Spartina alterniflora using DGGE and clone library screening.

    Science.gov (United States)

    Lovell, Charles R; Decker, Peter V; Bagwell, Christopher E; Thompson, Shelly; Matsui, George Y

    2008-05-01

    Methods to assess the diversity of the diazotroph assemblage in the rhizosphere of the salt marsh cordgrass, Spartina alterniflora were examined. The effectiveness of nifH PCR-denaturing gradient gel electrophoresis (DGGE) was compared to that of nifH clone library analysis. Seventeen DGGE gel bands were sequenced and yielded 58 nonidentical nifH sequences from a total of 67 sequences determined. A clone library constructed using the GC-clamp nifH primers that were employed in the PCR-DGGE (designated the GC-Library) yielded 83 nonidentical sequences from a total of 257 nifH sequences. A second library constructed using an alternate set of nifH primers (N-Library) yielded 83 nonidentical sequences from a total of 138 nifH sequences. Rarefaction curves for the libraries did not reach saturation, although the GC-Library curve was substantially dampened and appeared to be closer to saturation than the N-Library curve. Phylogenetic analyses showed that DGGE gel band sequencing recovered nifH sequences that were frequently sampled in the GC-Library, as well as sequences that were infrequently sampled, and provided a species composition assessment that was robust, efficient, and relatively inexpensive to obtain. Further, the DGGE method permits a large number of samples to be examined for differences in banding patterns, after which bands of interest can be sampled for sequence determination.
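    The rarefaction comparison mentioned in this abstract can be illustrated with the standard analytical rarefaction formula, which gives the expected number of distinct sequence types in a random subsample of n clones. This is only a minimal sketch: the abstract reports library totals (for example 83 nonidentical sequences out of 257) but not per-sequence frequencies, so the abundance profile below is hypothetical, and the function name is an illustrative choice rather than anything used by the authors.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of distinct sequence types in a random subsample of n clones.

    counts: number of clones observed for each distinct sequence type.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    # For each type, 1 - P(type is absent from the subsample).
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)

# Hypothetical abundance profile (clones per distinct sequence), NOT data from the study.
library = [40, 25, 10, 5, 3, 2, 1, 1, 1]

# A curve that is still rising at the full library size has not reached saturation.
for n in (1, 10, 40, sum(library)):
    print(n, round(rarefied_richness(library, n), 2))
```

    A curve that flattens well before the full library size suggests that further sampling would recover few new sequence types, which is the sense in which the abstract describes the GC-Library curve as closer to saturation.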

  8. Expression capable library for studies of Neisseria gonorrhoeae, version 1.0

    Directory of Open Access Journals (Sweden)

    Wachocki Susi

    2005-09-01

    Full Text Available. Background: The sexually transmitted disease gonorrhea is a serious health problem in developed as well as in developing countries, and its treatment continues to be a challenge. The recent completion of the genome sequence of the causative agent, Neisseria gonorrhoeae, opens up an entirely new set of approaches for studying this organism and the diseases it causes. Here, we describe the initial phases of the construction of an expression-capable clone set representing the protein-coding ORFs of the gonococcal genome using a recombination-based cloning system. Results: The clone set thus far includes 1672 of the 2250 predicted ORFs of the N. gonorrhoeae genome, of which 1393 (83%) are sequence-validated. Included in this set are 48 of the 61 ORFs of the gonococcal genetic island of strain MS11, not present in the sequenced genome of strain FA1090. L-arabinose-inducible glutathione-S-transferase (GST) fusions were constructed from random clones and each was shown to express a fusion protein of the predicted size following induction, demonstrating the use of the recombination cloning system. PCR amplicons of each ORF used in the cloning reactions were spotted onto glass slides to produce DNA microarrays representing 2035 genes of the gonococcal genome. Pilot experiments indicate that these arrays are suitable for the analysis of global gene expression in gonococci. Conclusion: This archived set of Gateway® entry clones will facilitate high-throughput genomic and proteomic studies of gonococcal genes using a variety of expression and analysis systems. In addition, the DNA arrays produced will allow us to generate gene expression profiles of gonococci grown in a wide variety of conditions. Together, the resources produced in this work will facilitate experiments to dissect the molecular mechanisms of gonococcal pathogenesis on a global scale, and ultimately lead to the determination of the functions of unknown genes in the genome.

  9. Molecular cloning and expression of bovine kappa-casein in Escherichia coli

    International Nuclear Information System (INIS)

    Kang, Y.C.; Richardson, T.

    1988-01-01

    A cDNA library was constructed using poly(A)+ RNA from bovine mammary gland. This cDNA library of 6000 clones was screened employing colony hybridization using 32P-labelled oligonucleotide probes and restriction endonuclease mapping. The cDNA from the selected plasmid, pKR76, was sequenced using the dideoxy-chain termination method. The cDNA insert of pKR76 carries the full-length sequence, which codes for mature kappa-casein protein. The amino acid sequence deduced from the cDNA sequence fits the published amino acid sequence with three exceptions: the reported pyroglutamic acid at position 1, tyrosine at position 35, and aspartic acid at position 81 are, respectively, a glutamine, a histidine, and an asparagine in the clone containing pKR76. The MspI-, NlaIV-cleaved fragment (630 base pairs) from the kappa-casein cDNA insert has been subcloned into expression vectors pUC18 and pKK233-2, which contain a lac promoter and a trc promoter, respectively. Escherichia coli cells carrying the recombinant expression plasmids were shown to produce kappa-casein protein having the expected mobility on sodium dodecyl sulfate-polyacrylamide gel electrophoresis and being recognized by specific antibodies raised against natural bovine kappa-casein.

  10. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available. Background: We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results: The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion: NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  11. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  12. Differential screening of phage-ab libraries by oligonucleotide microarray technology.

    Directory of Open Access Journals (Sweden)

    Paolo Monaci

    Full Text Available. A novel and efficient tagArray technology was developed that allows rapid identification of antibodies which bind to receptors with a specific expression profile, in the absence of biological information. This method is based on the cloning of a specific, short nucleotide sequence (tag) in the phagemid coding for each phage-displayed antibody fragment (phage-Ab) present in a library. In order to set up and validate the method we identified about 10,000 different phage-Abs binding to receptors expressed in their native form on the cell surface (10 k Membranome collection) and tagged each individual phage-Ab. The frequency of each phage-Ab in a given population can at this point be inferred by measuring the frequency of its associated tag sequence through standard DNA hybridization methods. Using tiny amounts of biological samples we identified phage-Abs binding to receptors preferentially expressed on primary tumor cells rather than on cells obtained from matched normal tissues. These antibodies inhibited cell proliferation in vitro and tumor development in vivo, thus representing therapeutic lead candidates.

  13. Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

    Science.gov (United States)

    Lam, Kathy N; Charles, Trevor C

    2015-01-01

    Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite

  14. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
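    The closing cost estimate in this abstract can be checked with simple arithmetic. The sketch below uses only the three figures stated in the abstract (1373 organisms, a per-species ceiling of $400, and a total of $0.5M); the variable names are illustrative. It suggests that the total stays under $0.5M only if the actual per-species cost is somewhat below the $400 ceiling, which is compatible with the abstract's wording "less than $400".

```python
SPECIES = 1373        # listed organisms, from the abstract
COST_CEILING = 400    # USD per species, the abstract's stated upper bound
BUDGET = 500_000      # USD, the abstract's "$0.5M"

# Total if every species cost exactly the ceiling.
worst_case_total = SPECIES * COST_CEILING

# Per-species cost at which the total equals the budget.
break_even_cost = BUDGET / SPECIES

print(worst_case_total)            # 549200
print(round(break_even_cost, 2))   # 364.17
```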

  15. Identification of differentially expressed genes in cucumber (Cucumis sativus L.) root under waterlogging stress by digital gene expression profile.

    Science.gov (United States)

    Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao

    2012-03-01

    High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profile of cucumber plants at 5 time points over a 24 h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained, with 143,013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and that the differentially expressed genes were mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. The present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Deep sequencing of small RNA libraries from human prostate epithelial and stromal cells reveal distinct pattern of microRNAs primarily predicted to target growth factors.

    Science.gov (United States)

    Singh, Savita; Zheng, Yun; Jagadeeswaran, Guru; Ebron, Jey Sabith; Sikand, Kavleen; Gupta, Sanjay; Sunker, Ramanjulu; Shukla, Girish C

    2016-02-28

    Complex epithelial and stromal cell interactions are required during the development and progression of prostate cancer. Regulatory small non-coding microRNAs (miRNAs) participate in the spatiotemporal regulation of messenger RNA (mRNA) and regulation of translation affecting a large number of genes involved in prostate carcinogenesis. In this study, through deep sequencing of size-fractionated small RNA libraries, we profiled the miRNAs of prostate epithelial (PrEC) and stromal (PrSC) cells. Over 50 million reads were obtained for PrEC, of which 860,468 were unique sequences. Similarly, nearly 76 million reads were obtained for PrSC, of which over 1 million were unique reads. Expression of many miRNAs from broadly conserved and poorly conserved miRNA families was identified. Sixteen highly expressed miRNAs with a significant change in expression in PrSC relative to PrEC were further analyzed in silico. ConsensusPathDB showed that the target genes of these miRNAs were significantly involved in adherens junction, cell adhesion, EGFR, TGF-β and androgen signaling. Expression of the let-7 family of tumor-suppressor miRNAs was highly pervasive in both PrEC and PrSC cells. In addition, we identified several miRNAs that are unique to PrEC or PrSC cells, and their predicted putative targets are a group of transcription factors. This study provides perspective on miRNA expression in PrEC and PrSC, and reveals a global trend in the miRNA interactome. We conclude that the most abundant miRNAs are potential regulators of development and differentiation of the prostate gland by targeting a set of growth factors. Additionally, high-level expression of most members of the let-7 family suggests their role in fine-tuning the growth and proliferation of prostate epithelial and stromal cells. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  17. Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: Spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea).

    Science.gov (United States)

    Parton, Angela; Bayne, Christopher J; Barnes, David W

    2010-09-01

    Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. Copyright 2010 Elsevier Inc. All rights reserved.

  18. Comparison of techniques for quantification of next-generation sequencing libraries

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2015-01-01

    by quantifying NGS libraries for the Ion Torrent™ and Illumina® platforms as well as dsDNA oligos with known DNA concentrations. Rather large variations in library concentration estimates were observed. The differences between the highest and lowest concentration estimates varied with a factor of 5...

  19. Whole transcriptome analysis of Acinetobacter baumannii assessed by RNA-sequencing reveals different mRNA expression profiles in biofilm compared to planktonic cells.

    Directory of Open Access Journals (Sweden)

    Soraya Rumbo-Feal

    Full Text Available. Acinetobacter baumannii has emerged as a dangerous opportunistic pathogen, with many strains able to form biofilms and thus cause persistent infections. The aim of the present study was to use high-throughput sequencing techniques to establish complete transcriptome profiles of planktonic (free-living) and sessile (biofilm) forms of A. baumannii ATCC 17978 and thereby identify differences in their gene expression patterns. Collections of mRNA from planktonic (both exponential and stationary phase cultures) and sessile (biofilm) cells were sequenced. Six mRNA libraries were prepared following the mRNA-Seq protocols from Illumina. Reads were obtained in a HiScanSQ platform and mapped against the complete genome to describe the complete mRNA transcriptomes of planktonic and sessile cells. The results showed that the gene expression pattern of A. baumannii biofilm cells was distinct from that of planktonic cells, including 1621 genes over-expressed in biofilms relative to stationary phase cells and 55 genes expressed only in biofilms. These differences suggested important changes in amino acid and fatty acid metabolism, motility, active transport, DNA-methylation, iron acquisition, transcriptional regulation, and quorum sensing, among other processes. Disruption or deletion of five of these genes caused a significant decrease in biofilm formation ability in the corresponding mutant strains. Among the genes over-expressed in biofilm cells were those in an operon involved in quorum sensing. One of them, encoding an acyl carrier protein, was shown to be involved in biofilm formation as demonstrated by the significant decrease in biofilm formation by the corresponding knockout strain. The present work serves as a basis for future studies examining the complex network systems that regulate bacterial biofilm formation and maintenance.

  20. Gene-expression analysis of cold-stress response in the sexually transmitted protist Trichomonas vaginalis.

    Science.gov (United States)

    Fang, Yi-Kai; Huang, Kuo-Yang; Huang, Po-Jung; Lin, Rose; Chao, Mei; Tang, Petrus

    2015-12-01

    Trichomonas vaginalis is the etiologic agent of trichomoniasis, the most common nonviral sexually transmitted disease in the world. This infection affects millions of individuals worldwide annually. Although direct sexual contact is the most common mode of transmission, increasing evidence indicates that T. vaginalis can survive in the external environment and can be transmitted by contaminated utensils. We found that the growth of T. vaginalis under cold conditions is greatly inhibited, but recovers after placing these stressed cells at the normal cultivation temperature of 37 °C. However, the mechanisms by which T. vaginalis regulates this adaptive process are unclear. An expressed sequence tag (EST) database generated from a complementary DNA library of T. vaginalis messenger RNAs expressed under cold-culture conditions (4 °C, TvC) was compared with a previously published normal-cultured EST library (37 °C, TvE) to assess the cold-stress responses of T. vaginalis. A total of 9780 clones were sequenced from the TvC library and were mapped to 2934 genes in the T. vaginalis genome. A total of 1254 genes were expressed in both the TvE and TvC libraries, and 1680 genes were only found in the TvC library. A functional analysis showed that cold temperature has effects on many cellular mechanisms, including increased H2O2 tolerance, activation of the ubiquitin-proteasome system, induction of iron-sulfur cluster assembly, and reduced energy metabolism and enzyme expression. The current study is the first large-scale transcriptomic analysis in cold-stressed T. vaginalis and the results enhance our understanding of this important protist. Copyright © 2014. Published by Elsevier B.V.
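    The library comparison in this abstract is, in effect, a set comparison of the genes detected in each library. The sketch below is a minimal illustration under stated assumptions: the function name and the gene identifiers are hypothetical placeholders, and only the three counts checked at the end (2934, 1254, 1680) come from the abstract.

```python
def compare_libraries(cold_genes, normal_genes):
    """Split the genes detected in two EST libraries into shared and library-specific sets."""
    cold, normal = set(cold_genes), set(normal_genes)
    return {
        "shared": cold & normal,       # detected in both libraries
        "cold_only": cold - normal,    # detected only in the cold-culture library
        "normal_only": normal - cold,  # detected only in the normal-culture library
    }

# Hypothetical gene identifiers, NOT real T. vaginalis loci.
tvc = ["gene_a", "gene_b", "gene_c", "gene_d"]
tve = ["gene_b", "gene_c", "gene_e"]
result = compare_libraries(tvc, tve)
print({name: sorted(genes) for name, genes in result.items()})

# Consistency check on the counts reported in the abstract:
# genes shared with TvE plus genes found only in TvC should equal all TvC genes.
assert 1254 + 1680 == 2934
```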

  1. Identification of miRNAs and their targets through high-throughput sequencing and degradome analysis in male and female Asparagus officinalis.

    Science.gov (United States)

    Chen, Jingli; Zheng, Yi; Qin, Li; Wang, Yan; Chen, Lifei; He, Yanjun; Fei, Zhangjun; Lu, Gang

    2016-04-12

    MicroRNAs (miRNAs), a class of non-coding small RNAs (sRNAs), regulate various biological processes. Although miRNAs have been identified and characterized in several plant species, miRNAs in Asparagus officinalis have not been reported. As a dioecious plant with homomorphic sex chromosomes, asparagus is regarded as an important model system for studying mechanisms of plant sex determination. Two independent sRNA libraries from male and female asparagus plants were sequenced with Illumina sequencing, thereby generating 4.13 and 5.88 million final clean reads, respectively. Both libraries predominantly contained 24-nt sRNAs, followed by 21-nt sRNAs. Further analysis identified 154 conserved miRNAs, which belong to 26 families, and 39 novel miRNA candidates seemed to be specific to asparagus. Comparative profiling revealed that 63 miRNAs exhibited significant differential expression between male and female plants, which was confirmed by real-time quantitative PCR analysis. Among them, 37 miRNAs were significantly up-regulated in the female library, whereas the others were preferentially expressed in the male library. Furthermore, 40 target mRNAs representing 44 conserved and seven novel miRNAs were identified in asparagus through high-throughput degradome sequencing. Functional annotation showed that these target mRNAs were involved in a wide range of developmental and metabolic processes. We identified a large set of conserved and specific miRNAs and compared their expression levels between male and female asparagus plants. Several asparagus miRNAs, which belong to the miR159, miR167, and miR172 families involved in reproductive organ development, were differentially expressed between male and female plants, as well as during flower development. Consistently, several predicted targets of asparagus miRNAs were associated with floral organ development. 
These findings suggest the potential roles of miRNAs in sex determination and reproductive developmental processes in

  2. Using PATIMDB to create bacterial transposon insertion mutant libraries.

    Science.gov (United States)

    Urbach, Jonathan M; Wei, Tao; Liberati, Nicole; Grenfell-Lee, Daniel; Villanueva, Jacinto; Wu, Gang; Ausubel, Frederick M

    2009-04-01

    PATIMDB is a software package for facilitating the generation of transposon mutant insertion libraries. The software has two main functions: process tracking and automated sequence analysis. The process tracking function specifically includes recording the status and fates of multiwell plates and samples in various stages of library construction. Automated sequence analysis refers specifically to the pipeline of sequence analysis starting with ABI files from a sequencing facility and ending with insertion location identifications. The protocols in this unit describe installation and use of PATIMDB software.

  3. Bioproduction and characterization of extracellular melanin-like pigment from industrially polluted metagenomic library equipped Escherichia coli.

    Science.gov (United States)

    Amin, Shivani; Rastogi, Rajesh P; Sonani, Ravi R; Ray, Arabinda; Sharma, Rakesh; Madamwar, Datta

    2018-04-15

    To explore the potential genes from the industrially polluted Amlakhadi canal, located in Ankleshwar, Gujarat, India, its community genome was extracted and cloned into E. coli EPI300™-T1R using a fosmid vector (pCC2 FOS™), generating a library of 392,000 clones with an average DNA insert size of 40 kb. From this library, the clone DM1, which produces a brown melanin-like pigment, was isolated and characterized. For overexpression of the pigment, the clone DM1 was further sub-cloned. A sub-clone containing 10 kb of the insert was sequenced for gene identification. The amino acid sequence of the protein 4-hydroxyphenylpyruvate dioxygenase (HPPD), which is known to be involved in melanin biosynthesis, was obtained from the gene sequence. A sequence-homology-based 3D structure model of HPPD was constructed and analyzed. The physico-chemical nature of the pigment was further analysed using 1H and 13C NMR, LC-MS, FTIR and UV-visible spectroscopy. The pigment was readily soluble in DMSO, with an absorption maximum around 290 nm. Based on the genetic and chemical characterization, the compound was confirmed as a melanin-like pigment. The present results indicate that the metagenomic library from an industrially polluted environment yielded a microbial tool for the production of melanin-like pigment. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Alternative splicing enriched cDNA libraries identify breast cancer-associated transcripts

    Science.gov (United States)

    2010-01-01

    Background Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers and may contribute to the development of more accurate diagnostic and prognostic methods and also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining the use of trapping heteroduplexes and RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire for identifying AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression. Results The human breast cell line (C5.2) and a pool of 5 ERBB2 over-expressing breast tumor samples were used independently for the construction of two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset with 79 multiple exon ASSETs was compared to public databases and reported 138 different AS events. A high success rate of RT-PCR validation (94.5%) was obtained, and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system. Conclusions In this study, we presented a method for exploring AS from any RNA source in a transcriptome-wide format, which can be directly easily adapted to next generation sequencers. 
We identified AS transcripts

  5. Expressed sequence tags of differential genes in the radioresistant mice and their parental mice

    International Nuclear Information System (INIS)

    Wang Qin; Yue Jingyin; Li Jin; Song Li; Liu Qiang; Mu Chuanjie; Wu Hongying

    2009-01-01

    Objective: To explore radioresistance-related genes in the IRM-2 inbred mouse. Methods: Total RNA was extracted from spleen cells of IRM-2 mice and their parental 615 and ICR/JCL mice. The mRNA differential display technique was used to analyze differences in gene expression. Each differential band was amplified by PCR, cloned and sequenced. Results: Seventy-five differentially expressed bands appeared in IRM-2 mice but not in 615 and ICR/JCL mice. Fifty-two cDNA sequences were obtained by sequencing. By comparison with the GenBank database, twenty-one expressed sequence tags (ESTs) that did not match known mouse genes were found and registered. Conclusion: The twenty-one ESTs indicate that radioresistance-related genes may be present in IRM-2 mice, which lays a foundation for isolating and identifying such genes in further study. (authors)

  6. A compact, in vivo screen of all 6-mers reveals drivers of tissue-specific expression and guides synthetic regulatory element design.

    Science.gov (United States)

    Smith, Robin P; Riesenfeld, Samantha J; Holloway, Alisha K; Li, Qiang; Murphy, Karl K; Feliciano, Natalie M; Orecchia, Lorenzo; Oksenberg, Nir; Pollard, Katherine S; Ahituv, Nadav

    2013-07-18

    Large-scale annotation efforts have improved our ability to coarsely predict regulatory elements throughout vertebrate genomes. However, it is unclear how complex spatiotemporal patterns of gene expression driven by these elements emerge from the activity of short, transcription factor binding sequences. We describe a comprehensive promoter extension assay in which the regulatory potential of all 6 base-pair (bp) sequences was tested in the context of a minimal promoter. To enable this large-scale screen, we developed algorithms that use a reverse-complement aware decomposition of the de Bruijn graph to design a library of DNA oligomers incorporating every 6-bp sequence exactly once. Our library multiplexes all 4,096 unique 6-mers into 184 double-stranded 15-bp oligomers, which is sufficiently compact for in vivo testing. We injected each multiplexed construct into zebrafish embryos and scored GFP expression in 15 tissues at two developmental time points. Twenty-seven constructs produced consistent expression patterns, with the majority doing so in only one tissue. Functional sequences are enriched near biologically relevant genes, match motifs for developmental transcription factors, and are required for enhancer activity. By concatenating tissue-specific functional sequences, we generated completely synthetic enhancers for the notochord, epidermis, spinal cord, forebrain and otic lateral line, and show that short regulatory sequences do not always function modularly. This work introduces a unique in vivo catalog of short, functional regulatory sequences and demonstrates several important principles of regulatory element organization. Furthermore, we provide resources for designing compact, reverse-complement aware k-mer libraries.
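
    To make the phrase "reverse-complement aware" concrete, the following is a minimal sketch of the bookkeeping involved, not the authors' de Bruijn graph design algorithm. It enumerates all 6-mers, collapses each one with its reverse complement, and counts the distinct classes. The function names (`reverse_complement`, `canonical`, `canonical_kmers`, `kmers_covered`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(k: int) -> set[str]:
    """All k-mers over ACGT, collapsed by reverse complement."""
    return {canonical("".join(p)) for p in product("ACGT", repeat=k)}


def kmers_covered(oligo: str, k: int) -> set[str]:
    """Canonical k-mers read from every window of a double-stranded oligo."""
    return {canonical(oligo[i:i + k]) for i in range(len(oligo) - k + 1)}


all_6mers = 4 ** 6
classes = canonical_kmers(6)
palindromes = [m for m in classes if m == reverse_complement(m)]

# 4096 raw 6-mers, 2080 classes up to reverse complement, 64 self-complementary
print(all_6mers, len(classes), len(palindromes))
```

    A double-stranded oligomer presents each window together with its reverse complement, which is why a design that treats the two as equivalent needs fewer windows than a naive enumeration of all 4,096 sequences.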

  7. Characterization and comparative analysis of small RNAs in three small RNA libraries of the brown planthopper (Nilaparvata lugens).

    Science.gov (United States)

    Chen, Qiuhong; Lu, Lin; Hua, Hongxia; Zhou, Fei; Lu, Liaoxun; Lin, Yongjun

    2012-01-01

    The brown planthopper (BPH), Nilaparvata lugens (Stål), which belongs to Homoptera, Delphacidae, is one of the most serious and destructive pests of rice. Feeding BPH with homologous dsRNA in vitro can lead to the death of BPH, which gives a valuable clue to the prevention and control of this pest; however, we know little about its small RNA world. Small RNA libraries for three developmental stages of BPH (CX-male adult, CC-female adult, CY-last instar female nymph) were constructed and sequenced, revealing a prolific small RNA world in BPH. We obtained a final list of 452 (CX), 430 (CC), and 381 (CY) conserved microRNAs (miRNAs), respectively, as well as a total of 71 new miRNAs in the three libraries. All the miRNAs had their own expression profiles in the three libraries. The phylogenetic evolution of the miRNA families in BPH was consistent with that in other species. The new miRNA sequences showed some base biases. Our study discovered a large number of small RNAs through deep sequencing of three small RNA libraries of BPH. Many animal-conserved miRNA families as well as some novel miRNAs were detected in our libraries. This is the first study to explore the small RNA world of BPH. It reveals much new information about BPH small RNAs that will be helpful for studying insect molecular biology and insect resistance.

  8. Computational analysis of the human HSPH/HSPA/DNAJ family and cloning of a human HSPH/HSPA/DNAJ expression library

    NARCIS (Netherlands)

    Hageman, Jurre; Kampinga, Harm H.

    In this manuscript, we describe the generation of a gene library for the expression of HSP110/HSPH, HSP70/HSPA and HSP40/DNAJ members. First, the heat shock protein (HSP) genes were collected from the gene databases and the gene families were analyzed for expression patterns, heat inducibility,

  9. A highly redundant BAC library of Atlantic salmon (Salmo salar): an important tool for salmon projects

    Directory of Open Access Journals (Sweden)

    Koop Ben F

    2005-04-01

    Full Text Available Abstract Background As farming of Atlantic salmon is growing as an aquaculture enterprise, the need to identify the genomic mechanisms for specific traits is becoming more important in breeding and management of the animal. Traits of importance might be related to growth, disease resistance, food conversion efficiency, color or taste. To identify genomic regions responsible for specific traits, genomic large insert libraries have previously proven to be of crucial importance. These large insert libraries can be screened using gene or genetic markers in order to identify and map regions of interest. Furthermore, large-scale mapping can utilize highly redundant libraries in genome projects, and hence provide valuable data on the genome structure. Results Here we report the construction and characterization of a highly redundant bacterial artificial chromosome (BAC) library constructed from a Norwegian aquaculture strain male of Atlantic salmon (Salmo salar). The library consists of a total number of 305 557 clones, of which approximately 299 000 are recombinants. The average insert size of the library is 188 kbp, representing 18-fold genome coverage. High-density filters each consisting of 18 432 clones spotted in duplicates have been produced for hybridization screening, and are publicly available [1]. To characterize the library, 15 expressed sequence tag (EST)-derived overgos and 12 oligo sequences derived from microsatellite markers were used in hybridization screening of the complete BAC library. Secondary hybridizations with individual probes were performed for the clones detected. The BACs positive for the EST probes were fingerprinted and mapped into contigs, yielding an average of 3 contigs for each probe. Clones identified using genomic probes were PCR verified using microsatellite specific primers. Conclusion Identification of genes and genomic regions of interest is greatly aided by the availability of the CHORI-214 Atlantic salmon BAC

  10. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Full Text Available Abstract Background Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample (typically micrograms) are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.
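
    The abstract does not spell out how a digital PCR readout becomes an absolute molecule count. A minimal sketch of the standard Poisson correction used in digital PCR generally is given below: if a fraction p of partitions is positive, the mean number of template molecules per partition is -ln(1 - p). The function names, the partition volume and the dilution factor are hypothetical values chosen for illustration, not parameters reported by the authors.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean template copies per partition from the fraction of positive partitions.

    Uses the Poisson relation P(empty) = exp(-lambda), so lambda = -ln(1 - p).
    Saturated runs (all partitions positive) cannot be quantified.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def library_concentration(positive: int, total: int,
                          partition_volume_nl: float,
                          dilution_factor: float = 1.0) -> float:
    """Estimated molecules per microliter of the undiluted library.

    partition_volume_nl and dilution_factor are illustrative assumptions.
    """
    lam = copies_per_partition(positive, total)
    partition_volume_ul = partition_volume_nl * 1e-3  # 1 nL = 0.001 uL
    return lam / partition_volume_ul * dilution_factor


# Hypothetical run: half of 100 partitions positive, 1 nL partitions,
# library diluted 1000-fold before loading.
print(copies_per_partition(50, 100))
print(library_concentration(50, 100, partition_volume_nl=1.0, dilution_factor=1000.0))
```

    Because the estimate depends only on counting positive and negative partitions, no standard curve is involved, which is the property the abstract highlights.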

  11. Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

    Science.gov (United States)

    Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

    2010-11-01

    SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences, the SBASE domain library, and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.

  12. [cDNA library construction from panicle meristem of finger millet].

    Science.gov (United States)

    Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B

    2014-01-01

    The protocol for producing full-length cDNA using the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of finger millet panicle (Eleusine coracana (L.) Gaertn). The titer of the resulting cDNA library was 3.01 x 10^5 CFU/ml on average. The average cDNA insert length was about 1070 base pairs, and the insertion efficiency of cDNA fragments was 99.5%. Selected cDNA clones from the library were sequenced, and their sequences were identified by BLAST search. The library analysis and selective sequencing demonstrate the good functionality and full-length character of the inserted cDNA clones. The cDNA library from meristematic tissue of finger millet panicle is a valuable source for isolating and identifying key genes regulating metabolism and meristematic development, and for mining new molecular markers for high-quality genetic investigations and molecular breeding.

  13. The ura5 gene of the ascomycete Sordaria macrospora: molecular cloning, characterization and expression in Escherichia coli.

    Science.gov (United States)

    Le Chevanton, L; Leblon, G

    1989-04-15

    We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.

  14. Analyses of expressed sequence tags from the maize foliar pathogen Cercospora zeae-maydis identify novel genes expressed during vegetative, infectious, and reproductive growth

    Directory of Open Access Journals (Sweden)

    Kema Gert HJ

    2008-11-01

    Full Text Available Abstract Background The ascomycete fungus Cercospora zeae-maydis is an aggressive foliar pathogen of maize that causes substantial losses annually throughout the Western Hemisphere. Despite its impact on maize production, little is known about the regulation of pathogenesis in C. zeae-maydis at the molecular level. The objectives of this study were to generate a collection of expressed sequence tags (ESTs) from C. zeae-maydis and evaluate their expression during vegetative, infectious, and reproductive growth. Results A total of 27,551 ESTs was obtained from five cDNA libraries constructed from vegetative and sporulating cultures of C. zeae-maydis. The ESTs, grouped into 4088 clusters and 531 singlets, represented 4619 putative unique genes. Of these, 36% encoded proteins similar (E value ≤ 10^-5) to characterized or annotated proteins from the NCBI non-redundant database representing diverse molecular functions and biological processes based on Gene Ontology (GO) classification. We identified numerous, previously undescribed genes with potential roles in photoreception, pathogenesis, and the regulation of development as well as Zephyr, a novel, actively transcribed transposable element. Differential expression of selected genes was demonstrated by real-time PCR, supporting their proposed roles in vegetative, infectious, and reproductive growth. Conclusion Novel genes that are potentially involved in regulating growth, development, and pathogenesis were identified in C. zeae-maydis, providing specific targets for characterization by molecular genetics and functional genomics. The EST data establish a foundation for future studies in evolutionary and comparative genomics among species of Cercospora and other groups of plant pathogenic fungi.

  15. NHE-1 sequence and expression in toad, snake and fish red blood cells

    DEFF Research Database (Denmark)

    Thomsen, Steffen Nyegaard; Wang, Tobias; Kristensen, Torsten

    Red blood cells (RBC) from reptiles appear not to express regulatory volume increase (RVI) upon shrinkage (Kristensen et al., 2008). In other vertebrates, the RVI response is primarily mediated by activation of the Na+/H+ exchanger (NHE-1) and we, therefore decided to investigate whether red cells...... of reptiles express a different NHE-1 that responds less to volume activation compared to other vertebrates or simply lack the Na+/H+ exchanger. Using various tissues from the ball python (Python regius), Cane toad (Bufo marinus) and European perch (Perca fluviatilis), cDNA libraries were created...

  16. Cloning and heterologous expression of a gene encoding lycopene ...

    African Journals Online (AJOL)

    This report describes the cloning and expression of the lycopene epsilon cyclase gene (LCYE) from Camellia sinensis var. assamica; the encoded enzyme acts in the biosynthesis of the carotenoid lutein in tea. The 1982 bp cDNA sequence, with a 1599 bp open reading frame of LCYE, was identified from an SSH library constructed for quality traits in tea.

  17. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Guzman, Frank; Almerão, Mauricio P; Körbes, Ana P; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  18. Differential effects of simple repeating DNA sequences on gene expression from the SV40 early promoter.

    Science.gov (United States)

    Amirhaeri, S; Wohlrab, F; Wells, R D

    1995-02-17

    The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)29.(C)29 in either orientation into the 5'-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100 fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to posttranscriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)25.(C)25 sequences. The results suggest that a G.C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5'-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to posttranscriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.

  19. Next-Generation DNA Sequencing of VH/VL Repertoires: A Primer and Guide to Applications in Single-Domain Antibody Discovery.

    Science.gov (United States)

    Henry, Kevin A

    2018-01-01

    Immunogenetic analyses of expressed antibody repertoires are becoming increasingly common experimental investigations and are critical to furthering our understanding of autoimmunity, infectious disease, and cancer. Next-generation DNA sequencing (NGS) technologies have now made it possible to interrogate antibody repertoires to unprecedented depths, typically by sequencing of cDNAs encoding immunoglobulin variable domains. In this chapter, we describe simple, fast, and reliable methods for producing and sequencing multiplex PCR amplicons derived from the variable regions (VH, VHH or VL) of rearranged immunoglobulin heavy and light chain genes using the Illumina MiSeq platform. We include complete protocols and primer sets for amplicon sequencing of VH/VHH/VL repertoires directly from human, mouse, and llama lymphocytes as well as from phage-displayed VH/VHH/VL libraries; these can easily be adapted to other types of amplicons with little modification. The resulting amplicons are diverse and representative, even using as few as 10^3 input B cells, and their generation is relatively inexpensive, requiring no special equipment and only a limited set of primers. In the absence of heavy-light chain pairing, single-domain antibodies are uniquely amenable to NGS analyses. We present a number of applications of NGS technology useful in discovery of single-domain antibodies from phage display libraries, including: (i) assessment of library functionality; (ii) confirmation of desired library randomization; (iii) estimation of library diversity; and (iv) monitoring the progress of panning experiments. While the case studies presented here are of phage-displayed single-domain antibody libraries, the principles extend to other types of in vitro display libraries.
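
    The abstract lists estimation of library diversity as an application but does not say which estimator the chapter uses. As one possible illustration, the sketch below applies the Chao1 richness estimator, a widely used lower-bound estimate that is swapped in here as an assumption, to a list of sequence reads. The function name `chao1` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def chao1(reads: Iterable[str]) -> float:
    """Chao1 lower-bound estimate of the number of distinct sequences.

    observed: distinct sequences seen
    f1: sequences seen exactly once, f2: sequences seen exactly twice
    """
    counts = Counter(reads)
    observed = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    # Bias-corrected form for samples with no doubletons.
    return observed + f1 * (f1 - 1) / 2


# Toy example: two clones seen twice, two clones seen once.
toy_reads = ["CDR3_A", "CDR3_A", "CDR3_B", "CDR3_B", "CDR3_C", "CDR3_D"]
print(chao1(toy_reads))
```

    Many singletons relative to doubletons indicate that sequencing has sampled only part of the library, so the estimate rises well above the observed count.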

  20. Survey of transposable elements in sugarcane expressed sequence tags (ESTs)

    Directory of Open Access Journals (Sweden)

    Rossi Magdalena

    2001-01-01

    Full Text Available The sugarcane expressed sequence tag (SUCEST) project has produced a large number of cDNA sequences from several plant tissues, subjected or not to different stress conditions. In this paper we report the result of a search for transposable elements (TEs), revealing a surprising number of expressed TE homologues. Of the 260,781 sequences grouped in 81,223 fragment assembly program (Phrap) clusters, a total of 276 clones showed homology to previously reported TEs using a stringent cut-off value of e-50 or better. Clones homologous to the Copia/Ty1 and Gypsy/Ty3 groups of long terminal repeat (LTR) retrotransposons were found, but no non-LTR retroelements were identified. All major transposon families were represented in sugarcane, including Activator (Ac), Mutator (MuDR), Suppressor-mutator (En/Spm) and Mariner. In order to compare TE diversity in grass genomes, we carried out a search for TEs described in the sugarcane-related species O. sativa, Z. mays and S. bicolor. We also present preliminary results showing the potential use of TE insertion pattern polymorphism as molecular markers for cultivar identification.

  1. Cloning, annotation and expression analysis of mycoparasitism-related genes in Trichoderma harzianum 88.

    Science.gov (United States)

    Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun

    2013-04-01

    Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.

  2. Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations.

    Directory of Open Access Journals (Sweden)

    Brian B Tuch

    Full Text Available Due to growing throughput and shrinking cost, massively parallel sequencing is rapidly becoming an attractive alternative to microarrays for the genome-wide study of gene expression and copy number alterations in primary tumors. The sequencing of transcripts (RNA-Seq) should offer several advantages over microarray-based methods, including the ability to detect somatic mutations and accurately measure allele-specific expression. To investigate these advantages we have applied a novel, strand-specific RNA-Seq method to tumors and matched normal tissue from three patients with oral squamous cell carcinomas. Additionally, to better understand the genomic determinants of the gene expression changes observed, we have sequenced the tumor and normal genomes of one of these patients. We demonstrate here that our RNA-Seq method accurately measures allelic imbalance and that measurement on the genome-wide scale yields novel insights into cancer etiology. As expected, the set of genes differentially expressed in the tumors is enriched for cell adhesion and differentiation functions, but, unexpectedly, the set of allelically imbalanced genes is also enriched for these same cancer-related functions. By comparing the transcriptomic perturbations observed in one patient to his underlying normal and tumor genomes, we find that allelic imbalance in the tumor is associated with copy number mutations and that copy number mutations are, in turn, strongly associated with changes in transcript abundance. These results support a model in which allele-specific deletions and duplications drive allele-specific changes in gene expression in the developing tumor.

  3. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  4. RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

    Science.gov (United States)

    Dean, Kimberly M; Grayhack, Elizabeth J

    2012-12-01

    We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.

  5. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    Full Text Available The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  6. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    /or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing......The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and...... and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  7. Characterization and comparative analysis of small RNAs in three small RNA libraries of the brown planthopper (Nilaparvata lugens).

    Directory of Open Access Journals (Sweden)

    Qiuhong Chen

    Full Text Available BACKGROUND: The brown planthopper (BPH), Nilaparvata lugens (Stål), which belongs to Homoptera, Delphacidae, is one of the most serious and destructive pests of rice. Feeding BPH with homologous dsRNA in vitro can lead to the death of BPH, which gives a valuable clue to the prevention and control of this pest; however, we know little about its small RNA world. METHODOLOGY/PRINCIPAL FINDINGS: Small RNA libraries for three developmental stages of BPH (CX-male adult, CC-female adult, CY-last instar female nymph) were constructed and sequenced, revealing a prolific small RNA world in BPH. We obtained a final list of 452 (CX), 430 (CC), and 381 (CY) conserved microRNAs (miRNAs), respectively, as well as a total of 71 new miRNAs in the three libraries. All the miRNAs had their own expression profiles in the three libraries. The phylogenetic evolution of the miRNA families in BPH was consistent with that in other species. The new miRNA sequences showed some base biases. CONCLUSION: Our study discovered a large number of small RNAs through deep sequencing of three small RNA libraries of BPH. Many animal-conserved miRNA families as well as some novel miRNAs were detected in our libraries. This is the first study to explore the small RNA world of BPH. It reveals much new information about BPH small RNAs that will be helpful for studying insect molecular biology and insect resistance.

  8. Impacts of Neanderthal-Introgressed Sequences on the Landscape of Human Gene Expression.

    Science.gov (United States)

    McCoy, Rajiv C; Wakefield, Jon; Akey, Joshua M

    2017-02-23

    Regulatory variation influencing gene expression is a key contributor to phenotypic diversity, both within and between species. Unfortunately, RNA degrades too rapidly to be recovered from fossil remains, limiting functional genomic insights about our extinct hominin relatives. Many Neanderthal sequences survive in modern humans due to ancient hybridization, providing an opportunity to assess their contributions to transcriptional variation and to test hypotheses about regulatory evolution. We developed a flexible Bayesian statistical approach to quantify allele-specific expression (ASE) in complex RNA-seq datasets. We identified widespread expression differences between Neanderthal and modern human alleles, indicating pervasive cis-regulatory impacts of introgression. Brain regions and testes exhibited significant downregulation of Neanderthal alleles relative to other tissues, consistent with natural selection influencing the tissue-specific regulatory landscape. Our study demonstrates that Neanderthal-inherited sequences are not silent remnants of ancient interbreeding but have measurable impacts on gene expression that contribute to variation in modern human phenotypes. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Long-Term Protective Immune Response Elicited by Vaccination with an Expression Genomic Library of Toxoplasma gondii

    OpenAIRE

    Fachado, Alberto; Rodriguez, Alexandro; Molina, Judith; Silvério, Jaline C.; Marino, Ana P. M. P.; Pinto, Luzia M. O.; Angel, Sergio O.; Infante, Juan F.; Traub-Cseko, Yara; Amendoeira, Regina R.; Lannes-Vieira, Joseli

    2003-01-01

    Immunization of BALB/c mice with an expression genomic library of Toxoplasma gondii induces a Th1-type immune response, with recognition of several T. gondii proteins (21 to 117 kDa) and long-term protective immunity against a lethal challenge. These results support further investigations to achieve a multicomponent anti-T. gondii DNA vaccine.

  10. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Full Text Available Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs was designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker
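
    As an illustration of how di- and trinucleotide SSRs of the kind counted in this record can be located in sequencing reads, the following is a minimal sketch and not the authors' pipeline. The function names (`find_ssrs`, `motif_counts`) and the minimum repeat thresholds (6 for dinucleotides, 5 for trinucleotides) are assumptions chosen for the example; the abstract does not state which thresholds or software were used. The sketch reports only perfect repeats and does not merge complementary motifs such as AC and GT.

```python
import re
from collections import Counter

def find_ssrs(read, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length -> minimum number of tandem copies.
    The defaults below are illustrative assumptions, not values from the study.
    """
    thresholds = min_repeats or {2: 6, 3: 5}
    hits = []
    for unit_len, min_rep in thresholds.items():
        # Group 2 is the motif; \2 requires further tandem copies of it.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(read.upper()):
            motif = m.group(2)
            if len(set(motif)) == 1:
                continue  # skip mononucleotide runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)

def motif_counts(reads):
    """Count how often each motif is found across a collection of reads."""
    return Counter(motif for read in reads for _, motif, _ in find_ssrs(read))
```

    Calling `motif_counts` on a list of read strings gives a per-motif tally, which is the kind of summary from which proportions such as those quoted above would be derived.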

  11. Effect of ATRX and G-Quadruplex Formation by the VNTR Sequence on α-Globin Gene Expression.

    Science.gov (United States)

    Li, Yue; Syed, Junetha; Suzuki, Yuki; Asamitsu, Sefan; Shioda, Norifumi; Wada, Takahito; Sugiyama, Hiroshi

    2016-05-17

    ATR-X (α-thalassemia/mental retardation X-linked) syndrome is caused by mutations in chromatin remodeler ATRX. ATRX can bind the variable number of tandem repeats (VNTR) sequence in the promoter region of the α-globin gene cluster. The VNTR sequence, which contains the potential G-quadruplex-forming sequence CGC(GGGGCGGGG)n , is involved in the downregulation of α-globin expression. We investigated G-quadruplex and i-motif formation in single-stranded DNA and long double-stranded DNA. The promoter region without the VNTR sequence showed approximately twofold higher luciferase activity than the promoter region harboring the VNTR sequence. G-quadruplex stabilizers hemin and TMPyP4 reduced the luciferase activity, whereas expression of ATRX led to a recovery in reporter activity. Our results demonstrate that stable G-quadruplex formation by the VNTR sequence downregulates the expression of α-globin genes and that ATRX might bind to and resolve the G-quadruplex. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Evidence of accelerated evolution and ectodermal-specific expression of presumptive BDS toxin cDNAs from Anemonia viridis.

    Science.gov (United States)

    Nicosia, Aldo; Maggio, Teresa; Mazzola, Salvatore; Cuttitta, Angela

    2013-10-30

    Anemonia viridis is a widespread and extensively studied Mediterranean species of sea anemone from which a large number of polypeptide toxins, such as blood depressing substances (BDS) peptides, have been isolated. The first members of this class, BDS-1 and BDS-2, are polypeptides belonging to the β-defensin fold family and were initially described for their antihypertensive and antiviral activities. BDS-1 and BDS-2 are 43 amino acid peptides characterised by three disulfide bonds that act as neurotoxins affecting Kv3.1, Kv3.2 and Kv3.4 channel gating kinetics. In addition, BDS-1 inactivates the Nav1.7 and Nav1.3 channels. The development of a large dataset of A. viridis expressed sequence tags (ESTs) and the identification of 13 putative BDS-like cDNA sequences has attracted interest, especially as scientific and diagnostic tools. A comparison of BDS cDNA sequences showed that the untranslated regions are more conserved than the protein-coding regions. Moreover, the KA/KS ratios calculated for all pairwise comparisons showed values greater than 1, suggesting mechanisms of accelerated evolution. The structures of the BDS homologs were predicted by molecular modelling. All toxins possess similar 3D structures that consist of a triple-stranded antiparallel β-sheet and an additional small antiparallel β-sheet located downstream of the cleavage/maturation site; however, the orientation of the triple-stranded β-sheet appears to differ among the toxins. To characterise the spatial expression profile of the putative BDS cDNA sequences, tissue-specific cDNA libraries, enriched for BDS transcripts, were constructed. In addition, the proper amplification of ectodermal or endodermal markers ensured the tissue specificity of each library. Sequencing randomly selected clones from each library revealed ectodermal-specific expression of ten BDS transcripts, while transcripts of BDS-8, BDS-13, BDS-14 and BDS-15 failed to be retrieved, likely due to under-representation in our

  13. Evidence of Accelerated Evolution and Ectodermal-Specific Expression of Presumptive BDS Toxin cDNAs from Anemonia viridis

    Directory of Open Access Journals (Sweden)

    Aldo Nicosia

    2013-10-01

    Full Text Available Anemonia viridis is a widespread and extensively studied Mediterranean species of sea anemone from which a large number of polypeptide toxins, such as blood depressing substances (BDS) peptides, have been isolated. The first members of this class, BDS-1 and BDS-2, are polypeptides belonging to the β-defensin fold family and were initially described for their antihypertensive and antiviral activities. BDS-1 and BDS-2 are 43 amino acid peptides characterised by three disulfide bonds that act as neurotoxins affecting Kv3.1, Kv3.2 and Kv3.4 channel gating kinetics. In addition, BDS-1 inactivates the Nav1.7 and Nav1.3 channels. The development of a large dataset of A. viridis expressed sequence tags (ESTs) and the identification of 13 putative BDS-like cDNA sequences has attracted interest, especially as scientific and diagnostic tools. A comparison of BDS cDNA sequences showed that the untranslated regions are more conserved than the protein-coding regions. Moreover, the KA/KS ratios calculated for all pairwise comparisons showed values greater than 1, suggesting mechanisms of accelerated evolution. The structures of the BDS homologs were predicted by molecular modelling. All toxins possess similar 3D structures that consist of a triple-stranded antiparallel β-sheet and an additional small antiparallel β-sheet located downstream of the cleavage/maturation site; however, the orientation of the triple-stranded β-sheet appears to differ among the toxins. To characterise the spatial expression profile of the putative BDS cDNA sequences, tissue-specific cDNA libraries, enriched for BDS transcripts, were constructed. In addition, the proper amplification of ectodermal or endodermal markers ensured the tissue specificity of each library. Sequencing randomly selected clones from each library revealed ectodermal-specific expression of ten BDS transcripts, while transcripts of BDS-8, BDS-13, BDS-14 and BDS-15 failed to be retrieved, likely due to under

  14. Structure and expression of human dihydropteridine reductase

    International Nuclear Information System (INIS)

    Lockyer, J.; Cook, R.G.; Milstien, S.; Kaufman, S.; Woo, S.L.C.; Ledley, F.D.

    1987-01-01

    Dihydropteridine reductase catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin and is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. A cDNA for human DHPR was isolated from a human liver cDNA library in the vector λgt11 using a monospecific antibody against sheep DHPR. The nucleic acid sequence and amino acid sequence of human DHPR were determined from a full-length clone. A 112 amino acid sequence of sheep DHPR was obtained by sequencing purified sheep DHPR. This sequence is highly homologous to the predicted amino acid sequence of the human protein. Gene transfer of the recombinant human DHPR into COS cells leads to expression of DHPR enzymatic activity. These results indicate that the cDNA clone identified by antibody screening is an authentic and full-length cDNA for human DHPR.

  15. Identification and functional characterization of effectors in expressed sequence tags from various life cycle stages of the potato cyst nematode Globodera pallida.

    Science.gov (United States)

    Jones, John T; Kumar, Amar; Pylypenko, Liliya A; Thirugnanasambandam, Amarnath; Castelli, Lydia; Chapman, Sean; Cock, Peter J A; Grenier, Eric; Lilley, Catherine J; Phillips, Mark S; Blok, Vivian C

    2009-11-01

    In this article, we describe the analysis of over 9000 expressed sequence tags (ESTs) from cDNA libraries obtained from various life cycle stages of Globodera pallida. We have identified over 50 G. pallida effectors from this dataset using bioinformatics analysis, by screening clones in order to identify secreted proteins up-regulated after the onset of parasitism and using in situ hybridization to confirm the expression in pharyngeal gland cells. A substantial gene family encoding G. pallida SPRYSEC proteins has been identified. The expression of these genes is restricted to the dorsal pharyngeal gland cell. Different members of the SPRYSEC family of proteins from G. pallida show different subcellular localization patterns in plants, with some localized to the cytoplasm and others to the nucleus and nucleolus. Differences in subcellular localization may reflect diverse functional roles for each individual protein or, more likely, variety in the compartmentalization of plant proteins targeted by the nematode. Our data are therefore consistent with the suggestion that the SPRYSEC proteins suppress host defences, as suggested previously, and that they achieve this through interaction with a range of host targets.

  16. Clinical Application of Picodroplet Digital PCR Technology for Rapid Detection of EGFR T790M in Next-Generation Sequencing Libraries and DNA from Limited Tumor Samples.

    Science.gov (United States)

    Borsu, Laetitia; Intrieri, Julie; Thampi, Linta; Yu, Helena; Riely, Gregory; Nafa, Khedoudja; Chandramohan, Raghu; Ladanyi, Marc; Arcila, Maria E

    2016-11-01

    Although next-generation sequencing (NGS) is a robust technology for comprehensive assessment of EGFR-mutant lung adenocarcinomas with acquired resistance to tyrosine kinase inhibitors, it may not provide sufficiently rapid and sensitive detection of the EGFR T790M mutation, the most clinically relevant resistance biomarker. Here, we describe a digital PCR (dPCR) assay for rapid T790M detection on aliquots of NGS libraries prepared for comprehensive profiling, fully maximizing broad genomic analysis on limited samples. Tumor DNAs from patients with EGFR-mutant lung adenocarcinomas and acquired resistance to epidermal growth factor receptor inhibitors were prepared for Memorial Sloan-Kettering-Integrated Mutation Profiling of Actionable Cancer Targets sequencing, a hybrid capture-based assay interrogating 410 cancer-related genes. Precapture library aliquots were used for rapid EGFR T790M testing by dPCR, and results were compared with NGS and locked nucleic acid-PCR Sanger sequencing (reference high sensitivity method). Seventy resistance samples showed 99% concordance with the reference high sensitivity method in accuracy studies. Input as low as 2.5 ng provided a sensitivity of 1% and improved further with increasing DNA input. dPCR on libraries required less DNA and showed better performance than direct genomic DNA. dPCR on NGS libraries is a robust and rapid approach to EGFR T790M testing, allowing most economical utilization of limited material for comprehensive assessment. The same assay can also be performed directly on any limited DNA source and cell-free DNA. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  17. Reliability analysis of the Ahringer Caenorhabditis elegans RNAi feeding library: a guide for genome-wide screens

    Directory of Open Access Journals (Sweden)

    Lu Yiming

    2011-03-01

    Full Text Available Abstract Background The Ahringer C. elegans RNAi feeding library prepared by cloning genomic DNA fragments has been widely used in genome-wide analysis of gene function. However, the library has not been thoroughly validated by direct sequencing, and there are potential errors, including: 1) mis-annotation (the clone with the retired gene name should be remapped to the actual target gene); 2) nonspecific PCR amplification; 3) cross-RNAi; 4) mis-operation such as sample loading error, etc. Results Here we performed a reliability analysis on the Ahringer C. elegans RNAi feeding library, which contains 16,256 bacterial strains, using a bioinformatics approach. Results demonstrated that most (98.3%) of the bacterial strains in the library are reliable. However, we also found that 2,851 (17.54%) bacterial strains need to be re-annotated even though they are reliable. Most of these bacterial strains are clones carrying retired gene names. In addition, 28 strains are grouped into the unreliable category and 226 strains are marginal because they probably express unrelated double-stranded RNAs (dsRNAs). The accuracy of the prediction was further confirmed by direct sequencing analysis of 496 bacterial strains. Finally, a freely accessible database named CelRNAi (http://biocompute.bmi.ac.cn/CelRNAi/) was developed as a valuable complementary resource for the feeding RNAi library by providing the predicted information on all bacterial strains. Moreover, submission of direct sequencing results or any other annotations for the bacterial strains to the database is allowed, and these will be integrated into the CelRNAi database to improve the accuracy of the library. In addition, we provide five candidate primer sets for each of the unreliable and marginal bacterial strains for users to construct an alternative vector for their own RNAi studies. Conclusions Because of the potential unreliability of the Ahringer C. elegans RNAi feeding library, we strongly suggest the user examine

  18. Replication error deficient and proficient colorectal cancer gene expression differences caused by 3'UTR polyT sequence deletions

    DEFF Research Database (Denmark)

    Wilding, Jennifer L; McGowan, Simon; Liu, Ying

    2010-01-01

    ..., and have distinct pathologies. Regulatory sequences controlling all aspects of mRNA processing, especially including message stability, are found in the 3'UTR sequence of most genes. The relevant sequences are typically A/U-rich elements or U repeats. Microarray analysis of 14 RER+ (deficient) and 16 RER- (proficient) colorectal cancer cell lines confirms a striking difference in expression profiles. Analysis of the incidence of mononucleotide repeat sequences in the 3'UTRs, 5'UTRs, and coding sequences of those genes most differentially expressed in RER+ versus RER- cell lines has shown that much of this differential expression can be explained by the occurrence of a massive enrichment of genes with 3'UTR T repeats longer than 11 base pairs in the most differentially expressed genes. This enrichment was confirmed by analysis of two published consensus sets of RER differentially expressed probesets for a large...

  19. Differential Gene Expression in Ovaries of Qira Black Sheep and Hetian Sheep Using RNA-Seq Technique

    Science.gov (United States)

    Jia, Bin; Zhang, Yong Sheng; Wang, Xu Hai; Zeng, Xian Cun

    2015-01-01

    The Qira black sheep and the Hetian sheep are two local breeds in the Northwest of China, which are high-fecundity and low-fecundity breeds, respectively. The elucidation of mRNA expression profiles in the ovaries of sheep breeds representing fecundity extremes will be helpful for the identification and utilization of major prolificacy genes in sheep. In the present study, we used RNA-seq technology to compare ovarian mRNA expression profiles between Qira black sheep and Hetian sheep. From the Qira black sheep and the Hetian sheep libraries, we obtained a total of 11,747,582 and 11,879,968 sequencing reads, respectively. After aligning to the reference sequences, the two libraries included 16,763 and 16,814 genes, respectively. A total of 1,252 genes were significantly differentially expressed in Hetian sheep compared with Qira black sheep. Eight differentially expressed genes were randomly selected for validation by real-time RT-PCR. This study provides basic data for future research on sheep reproduction. PMID:25790350

  20. Differential gene expression in ovaries of Qira black sheep and Hetian sheep using RNA-Seq technique.

    Directory of Open Access Journals (Sweden)

    Han Ying Chen

    Full Text Available The Qira black sheep and the Hetian sheep are two local breeds in the Northwest of China, which are high-fecundity and low-fecundity breeds, respectively. The elucidation of mRNA expression profiles in the ovaries of sheep breeds representing fecundity extremes will be helpful for the identification and utilization of major prolificacy genes in sheep. In the present study, we used RNA-seq technology to compare ovarian mRNA expression profiles between Qira black sheep and Hetian sheep. From the Qira black sheep and the Hetian sheep libraries, we obtained a total of 11,747,582 and 11,879,968 sequencing reads, respectively. After aligning to the reference sequences, the two libraries included 16,763 and 16,814 genes, respectively. A total of 1,252 genes were significantly differentially expressed in Hetian sheep compared with Qira black sheep. Eight differentially expressed genes were randomly selected for validation by real-time RT-PCR. This study provides basic data for future research on sheep reproduction.

  1. Bacterial diversity analysis of Huanglongbing pathogen-infected citrus, using PhyloChip and 16S rRNA gene clone library sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Shankar Sagaram, U.; DeAngelis, K.M.; Trivedi, P.; Andersen, G.L.; Lu, S.-E.; Wang, N.

    2009-03-01

    The bacterial diversity associated with citrus leaf midribs was characterized from citrus groves that contained the Huanglongbing (HLB) pathogen, which has yet to be cultivated in vitro. We employed a combination of high-density phylogenetic 16S rDNA microarray and 16S rDNA clone library sequencing to determine the microbial community composition of symptomatic and asymptomatic citrus midribs. Our results revealed that citrus leaf midribs can support a diversity of microbes. PhyloChip analysis indicated that 47 orders of bacteria from 15 phyla were present in the citrus leaf midribs while 20 orders from phyla were observed with the cloning and sequencing method. PhyloChip arrays indicated that nine taxa were significantly more abundant in symptomatic midribs compared to asymptomatic midribs. Candidatus Liberibacter asiaticus (Las) was detected at a very low level in asymptomatic plants, but was over 200 times more abundant in symptomatic plants. The PhyloChip analysis was further verified by sequencing 16S rDNA clone libraries, which indicated the dominance of Las in symptomatic leaves. These data implicate Las as the pathogen responsible for HLB disease. Citrus is the most important commercial fruit crop in Florida. In recent years, citrus Huanglongbing (HLB), also called citrus greening, has severely affected Florida's citrus production and hence has drawn an enormous amount of attention. HLB is one of the most devastating diseases of citrus (6,13), characterized by blotchy mottling with green islands on leaves, as well as stunting, fruit decline, and small, lopsided fruits with poor coloration. The disease tends to be associated with a phloem-limited fastidious α-proteobacterium given a provisional Candidatus status (Candidatus Liberobacter spp. later changed to Candidatus Liberibacter spp.) in nomenclature (18,25,34). Previous studies indicate that HLB infection causes disorder in the phloem and severely impairs the translocation of assimilates in

  2. Transcriptome profiling and digital gene expression by deep sequencing in early somatic embryogenesis of endangered medicinal Eleutherococcus senticosus Maxim.

    Science.gov (United States)

    Tao, Lei; Zhao, Yue; Wu, Ying; Wang, Qiuyu; Yuan, Hongmei; Zhao, Lijuan; Guo, Wendong; You, Xiangling

    2016-03-01

    Somatic embryogenesis (SE) has been studied as a model system to understand molecular events in physiology, biochemistry, and cytology during plant embryo development. In particular, it is exceedingly difficult to access the morphological and early regulatory events in zygotic embryos. To understand the molecular mechanisms regulating early SE in Eleutherococcus senticosus Maxim., we used high-throughput RNA-Seq technology to investigate its transcriptome. We obtained 58,327,688 reads, which were assembled into 75,803 unique unigenes. To better understand their functions, the unigenes were annotated using the Clusters of Orthologous Groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. Digital gene expression libraries revealed differences in gene expression profiles at different developmental stages (embryogenic callus, yellow embryogenic callus, global embryo). We obtained a sequencing depth of >5.6 million tags per sample and identified many differentially expressed genes at various stages of SE. The initiation of SE affected gene expression in many KEGG pathways, but predominantly that in metabolic pathways, biosynthesis of secondary metabolites, and plant hormone signal transduction. This information on the changes in the multiple pathways related to SE induction in E. senticosus Maxim. embryogenic tissue will contribute to a more comprehensive understanding of the mechanisms involved in early SE. Additionally, the differentially expressed genes may act as molecular markers and could play very important roles in the early stage of SE. The results are a comprehensive molecular biology resource for investigating SE of E. senticosus Maxim. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
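
    The significance estimate described in this record (drawing random peptides from the library to build a null distribution of scores) can be sketched as a generic permutation-style test. This is a minimal sketch and not GuiTope's implementation: the scoring function `similarity` is a toy stand-in (identical residues over ungapped offsets), and the names `empirical_p_value`, `n_draws` and `seed` are hypothetical, introduced only for illustration.

```python
import random

def similarity(peptide, protein):
    """Toy score (an assumption): best count of identical residues
    over all ungapped placements of the peptide along the protein."""
    best = 0
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        best = max(best, sum(a == b for a, b in zip(peptide, window)))
    return best

def empirical_p_value(selected, library, protein, n_draws=1000, seed=0):
    """Estimate how often random library peptides score at least as well
    as the selected peptides against the given protein."""
    rng = random.Random(seed)
    observed = sum(similarity(p, protein) for p in selected)
    hits = 0
    for _ in range(n_draws):
        # Draw a random peptide set of the same size as the selected set.
        sample = rng.sample(library, len(selected))
        if sum(similarity(p, protein) for p in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)
```

    Sampling sets of the same size from the same library means any amino acid composition bias in the library is reflected in the null distribution, which is the motivation the abstract gives for not relying on the frequencies assumed by general-purpose alignment programs.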

  4. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Full Text Available Abstract Background Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  5. Analyzing Plasmodium falciparum erythrocyte membrane protein 1 gene expression by a next generation sequencing based method

    DEFF Research Database (Denmark)

    Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine

    2013-01-01

    ..., encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR...

  6. Branchial Expression Patterns of Claudin Isoforms in Atlantic Salmon During Seawater Acclimation and Smoltification

    DEFF Research Database (Denmark)

    Tipsmark, Christian K; Kiilerich, Pia; Nilsen, Tom O

    2008-01-01

    in epithelia. We identified Atlantic salmon genes belonging to the claudin family by screening expressed sequence tag libraries available at NCBI and classification was performed with aid of maximum likelihood and neighbour-joining analysis. In gill libraries, five isoforms (10e, 27a, 28a, 28b and 30) were present and QPCR analysis confirmed tissue-specific expression in gill when compared to kidney, intestine, heart, muscle, brain and liver. Expression patterns during acclimation of freshwater salmon to seawater (SW) and during the smoltification process were examined. Acclimation to SW reduced... induced no significant changes in expression of the other isoforms. This study demonstrates the expression of an array of salmon claudin isoforms and shows that SW acclimation involves inverse regulation, in the gill, of claudin 10e versus claudin 27a and 30. It is possible that claudin 10e...

  7. Preliminary exploration and thought of promoting library science Indigenization

    International Nuclear Information System (INIS)

    Liu Wenping; Du Jingling

    2014-01-01

    The article explains the significance of library science indigenization, addresses some misunderstandings about it, describes the forms in which it is expressed, discusses criteria for it, and finally offers suggestions and methods for promoting it. (authors)

  8. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  9. Optimized Exon-Exon Junction Library and its Application on Rodents' Brain Transcriptome Analysis

    Directory of Open Access Journals (Sweden)

    Tong-Hai Dou

    2017-05-01

    Full Text Available ABSTRACT Background: Alternative splicing (AS), which plays an important role in gene expression and functional regulation, has been analyzed on a genome scale by various bioinformatic approaches based on RNA-seq data. Compared with the huge number of studies on mouse, AS studies of the rat, whose genome is intermediate between mouse and human, are still limited. To enrich the knowledge of AS events in rodents' brain, we performed a comprehensive analysis of four transcriptome libraries (mouse cerebrum, mouse cerebellum, rat cerebrum, and rat cerebellum) using high-throughput sequencing technology. An optimized exon-exon junction library approach was introduced to accommodate the longer RNA-seq reads and to improve mapping efficiency. Results: In total, 7,106 mouse genes and 2,734 rat genes were differentially expressed between cerebrum and cerebellum, while 7,125 mouse genes and 1,795 rat genes exhibited differences at the transcript-variant level. Only half of the differentially expressed exon-exon junctions were reflected at the gene expression level. Functional cluster analysis showed that 32 pathways in mouse and 9 pathways in rat were significantly enriched, and 6 of them were enriched in both. Interestingly, some differentially expressed transcript variants, such as PLCβ1 and Kcnma1, did not show a difference at the gene expression level. Conclusion: Our work provided a case study of a novel exon-exon junction strategy to analyze the expression of genes and isoforms, helping us understand which transcript contributes to the overall expression and further functional change.

  10. cDNA library from the Latex of Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Wilaiwan Chotigeat

    2010-12-01

    Full Text Available Latex from Hevea brasiliensis contains 30-50% (w/w) of natural rubber (cis-1,4-polyisoprene), the important raw material for many rubber industries. We have constructed a cDNA library from the latex of H. brasiliensis to investigate the expressed genes and molecular events in the latex. We analyzed 412 expressed sequence tags (ESTs). More than 90% of the EST clones showed homology to previously described sequences in public databases. Functional classification of the ESTs showed that the largest category was proteins of unknown function (30.1%); 11.4% of ESTs encoded rubber synthesis-related proteins (RS) and 8.5% defense or stress related proteins (DS). Those with no significant homology to known sequences (NSH) accounted for 8.7%; primary metabolism (PM) and gene expression and RNA metabolism were 7.8% and 6.6%, respectively. Other categories included protein synthesis-related proteins (6.6%), chromatin and DNA metabolism (CDM) 3.9%, energy metabolism (EM) 3.4%, cellular transport (CT) 3.2%, cell structure (CS) 3.2%, signal transduction (ST) 2.2%, secondary metabolism (SM) 1.7%, protein fate (PF) 2.2%, and reproductive proteins (RP) 0.7%.
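
    As a rough consistency check on the functional classification above, the reported category shares can be tabulated and converted back to approximate EST counts. The sketch below uses only the percentages and the total of 412 ESTs quoted in the abstract; the function names are hypothetical, and the counts are rounded estimates, not figures from the paper.

```python
# Category shares (percent of ESTs) as quoted in the abstract above.
CATEGORY_PERCENT = {
    "unknown function": 30.1,
    "rubber synthesis-related (RS)": 11.4,
    "defense or stress related (DS)": 8.5,
    "no significant homology (NSH)": 8.7,
    "primary metabolism (PM)": 7.8,
    "gene expression and RNA metabolism": 6.6,
    "protein synthesis-related": 6.6,
    "chromatin and DNA metabolism (CDM)": 3.9,
    "energy metabolism (EM)": 3.4,
    "cellular transport (CT)": 3.2,
    "cell structure (CS)": 3.2,
    "signal transduction (ST)": 2.2,
    "secondary metabolism (SM)": 1.7,
    "protein fate (PF)": 2.2,
    "reproductive proteins (RP)": 0.7,
}
TOTAL_ESTS = 412  # number of ESTs analyzed, from the abstract


def percent_total(percentages):
    """Sum of all category shares, rounded to one decimal place."""
    return round(sum(percentages.values()), 1)


def approximate_counts(percentages, total):
    """Estimate EST counts per category (rounded; an approximation only)."""
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


if __name__ == "__main__":
    print("sum of shares:", percent_total(CATEGORY_PERCENT))
    for name, count in approximate_counts(CATEGORY_PERCENT, TOTAL_ESTS).items():
        print(f"{name}: ~{count} ESTs")
```

    The shares sum to slightly more than 100%, which is consistent with each value having been rounded independently.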

  11. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    Science.gov (United States)

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discover novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase, a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage wider usage, the backend is open

  12. Peanut (Arachis hypogaea) Expressed Sequence Tag Project: Progress and Application

    Directory of Open Access Journals (Sweden)

    Suping Feng

    2012-01-01

    Full Text Available ESTs have been sequenced for many plants, including peanut, as an alternative to whole-genome sequencing because of genome size and complexity. At the historic 2004 Atlanta Genomics Workshop, the US peanut research community named the EST project a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have thus been shown to have a wide range of applications and have fulfilled an essential role at a time of need. The EST project then contributed to a second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or the transcriptome will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function.

  13. Gene expression profiles responses to aphid feeding in chrysanthemum (Chrysanthemum morifolium).

    Science.gov (United States)

    Xia, Xiaolong; Shao, Yafeng; Jiang, Jiafu; Ren, Liping; Chen, Fadi; Fang, Weimin; Guan, Zhiyong; Chen, Sumei

    2014-12-02

    Chrysanthemum is an important ornamental plant all over the world. It is easily attacked by the aphid Macrosiphoniella sanbourni. The molecular mechanisms of plant defense responses to aphids are only partially understood. Here, we investigate the gene expression changes in response to aphid feeding in chrysanthemum leaf by RNA-Seq technology. Three libraries were generated from pooled leaf tissues of Chrysanthemum morifolium 'nannongxunzhang' that were collected at different time points with (Y) or without (CK) aphid infestation or with mock puncture treatment (Z), and sequenced using an Illumina HiSeq™ 2000 platform. A total of 7,363,292, 7,215,860 and 7,319,841 clean reads were obtained in libraries CK, Y and Z, respectively. The proportion of clean reads was >97.29% in each library. Approximately 76.35% of the clean reads were mapped to a reference gene database including all known chrysanthemum unigene sequences. In the comparisons CK-VS-Y, CK-VS-Z and Z-VS-Y, 1,157, 527 and 340 differentially expressed genes (DEGs) were identified, respectively. These DEGs were involved in phytohormone signaling, cell wall biosynthesis, photosynthesis, the reactive oxygen species (ROS) pathway and transcription factor regulatory networks, among others. Changes in gene expression induced by aphid feeding are shown to be multifaceted. There are various forms of crosstalk between the pathways to which those genes belong, which would allow plants to fine-tune their defense responses.

  14. Simulating Metabolite Basis Sets for in vivo MRS Quantification; Incorporating details of the PRESS Pulse Sequence by means of the GAMMA C++ library

    NARCIS (Netherlands)

    Van der Veen, J.W.; Van Ormondt, D.; De Beer, R.

    2012-01-01

    In this work we report on generating/using simulated metabolite basis sets for the quantification of in vivo MRS signals, assuming that they have been acquired by using the PRESS pulse sequence. To that end we have employed the classes and functions of the GAMMA C++ library. By using several

  15. Identification of candidates for cyclotide biosynthesis and cyclisation by expressed sequence tag analysis of Oldenlandia affinis

    Directory of Open Access Journals (Sweden)

    Suda Jan

    2010-02-01

    Full Text Available Abstract Background Cyclotides are a family of circular peptides that exhibit a range of biological activities, including anti-bacterial, cytotoxic, anti-HIV activities, and are proposed to function in plant defence. Their high stability has motivated their development as scaffolds for the stabilisation of peptide drugs. Oldenlandia affinis is a member of the Rubiaceae (coffee) family from which 18 cyclotides have been sequenced to date, but the details of their processing from precursor proteins have only begun to be elucidated. To increase the speed at which genes involved in cyclotide biosynthesis and processing are being discovered, an expressed sequence tag (EST) project was initiated to survey the transcript profile of O. affinis and to propose some future directions of research on in vivo protein cyclisation. Results Using flow cytometry the holoploid genome size (1C-value) of O. affinis was estimated to be 4,210 - 4,284 Mbp, one of the largest genomes of the Rubiaceae family. High-quality ESTs were identified, 1,117 in total, from leaf cDNAs and assembled into 502 contigs, comprising 202 consensus sequences and 300 singletons. ESTs encoding the cyclotide precursors for kalata B1 (Oak1) and kalata B2 (Oak4) were among the 20 most abundant ESTs. In total, 31 ESTs encoded cyclotide precursors, representing a distinct commitment of 2.8% of the O. affinis transcriptome to cyclotide biosynthesis. The high expression levels of cyclotide precursor transcripts are consistent with the abundance of mature cyclic peptides in O. affinis. A new cyclotide precursor named Oak5 was isolated and represents the first cDNA for the bracelet class of cyclotides in O. affinis. Clones encoding enzymes potentially involved in processing cyclotides were also identified and include enzymes involved in oxidative folding and proteolytic processing. Conclusion The EST library generated in this study provides a valuable resource for the study of the cyclisation of plant

  16. Genome-Wide Analysis of Gene and microRNA Expression in Diploid and Autotetraploid Paulownia fortunei (Seem.) Hemsl. under Drought Stress by Transcriptome, microRNA, and Degradome Sequencing

    Directory of Open Access Journals (Sweden)

    Zhenli Zhao

    2018-02-01

    Full Text Available Drought is a common and recurring climatic condition in many parts of the world, and it can have disastrous impacts on plant growth and development. Many genes involved in the drought response of plants have been identified. Transcriptome, microRNA (miRNA), and degradome analyses are rapid ways of identifying drought-responsive genes. The reference genome sequence of Paulownia fortunei (Seem.) Hemsl. is now available, which makes it easier to explore gene expression, transcriptional regulation, and post-transcriptional regulation in this species. In this study, four transcriptome, small RNA, and degradome libraries were each sequenced by Illumina sequencing. A total of 258 genes and 11 miRNAs were identified as drought-responsive in P. fortunei. Degradome sequencing detected 28 miRNA target genes that were cleaved by members of nine conserved miRNA families and 12 novel miRNAs. These results will contribute toward enriching our understanding of the response of Paulownia fortunei trees to drought stress and may provide new direction for further experimental studies related to the development of molecular markers, genetic map construction, and other genomic research projects in Paulownia.

  17. Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence.

    Science.gov (United States)

    Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

    2015-01-01

    There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinformatics software, and we analyzed the relationship between gene expression profiles of adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expressions of adenoma-carcinoma sequence were significantly different between high-grade intraepithelial neoplasia group and adenocarcinoma group. The biological process of gene ontology function enrichment analysis on differentially expressed genes between high-grade intraepithelial neoplasia group and adenocarcinoma group showed that genes enriched in the extracellular structure organization, skeletal system development, biological adhesion and itself regulated growth regulation, with the P value after FDR correction of less than 0.05. In addition, IPR-related protein mainly focused on the insulin-like growth factor binding proteins. The variable trends of gene expression profiles for adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between high-grade intraepithelial neoplasia group and adenocarcinoma group. Bioinformatics analysis is an effective way to study the gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool to involve colorectal cancer research strategy into colorectal adenoma or advanced adenoma.

  18. High-throughput expression of animal venom toxins in Escherichia coli to generate a large library of oxidized disulphide-reticulated peptides for drug discovery.

    Science.gov (United States)

    Turchetto, Jeremy; Sequeira, Ana Filipa; Ramond, Laurie; Peysson, Fanny; Brás, Joana L A; Saez, Natalie J; Duhoo, Yoan; Blémont, Marilyne; Guerreiro, Catarina I P D; Quinton, Loic; De Pauw, Edwin; Gilles, Nicolas; Darbon, Hervé; Fontes, Carlos M G A; Vincentelli, Renaud

    2017-01-17

    Animal venoms are complex molecular cocktails containing a wide range of biologically active disulphide-reticulated peptides that target, with high selectivity and efficacy, a variety of membrane receptors. Disulphide-reticulated peptides have evolved to display improved specificity, low immunogenicity and to show much higher resistance to degradation than linear peptides. These properties make venom peptides attractive candidates for drug development. However, recombinant expression of reticulated peptides containing disulphide bonds is challenging, especially when associated with the production of large libraries of bioactive molecules for drug screening. To date, as an alternative to artificial synthetic chemical libraries, no comprehensive recombinant libraries of natural venom peptides are accessible for high-throughput screening to identify novel therapeutics. In the accompanying paper an efficient system for the expression and purification of oxidized disulphide-reticulated venom peptides in Escherichia coli is described. Here we report the development of a high-throughput automated platform, that could be adapted to the production of other families, to generate the largest ever library of recombinant venom peptides. The peptides were produced in the periplasm of E. coli using redox-active DsbC as a fusion tag, thus allowing the efficient formation of correctly folded disulphide bridges. TEV protease was used to remove fusion tags and recover the animal venom peptides in the native state. Globally, within nine months, out of a total of 4992 synthetic genes encoding a representative diversity of venom peptides, a library containing 2736 recombinant disulphide-reticulated peptides was generated. The data revealed that the animal venom peptides produced in the bacterial host were natively folded and, thus, are putatively biologically active. Overall this study reveals that high-throughput expression of animal venom peptides in E. coli can generate large

  19. Selection of diethylstilbestrol-specific single-chain antibodies from a non-immunized mouse ribosome display library.

    Directory of Open Access Journals (Sweden)

    Yanan Sun

    Full Text Available Single chain variable fragments (scFvs) against diethylstilbestrol (DES) were selected from the splenocytes of non-immunized mice by ribosome display technology. A naive library was constructed and engineered to allow in vitro transcription and translation using an E. coli lysate system. Alternating selection in solution and immobilization in microtiter wells was used to pan mRNA-ribosome-antibody (ARM) complexes. After seven rounds of ribosome display, the expression vector pTIG-TRX containing the selected specific scFv DNAs was transformed into Escherichia coli BL21 (DE3) for expression. Twenty-six positive clones were screened and five clones had high antibody affinity and specificity to DES as evidenced by indirect competitive ELISA. Sequence analysis showed that these five DES-specific scFvs had different amino acid sequences, but the CDRs were highly similar. Surface plasmon resonance (SPR) analysis was used to determine binding kinetics of one clone (30-1). The measured K(D) was 3.79 µM. These results indicate that ribosome display technology can be used to efficiently isolate hapten-specific antibody (Ab) fragments from a naive library; this study provides a methodological framework for the development of novel immunoassays that use recombinant antibodies to detect multiple low-molecular-weight environmental pollutants.

  20. Construction of BAC Libraries from Flow-Sorted Chromosomes.

    Science.gov (United States)

    Šafář, Jan; Šimková, Hana; Doležel, Jaroslav

    2016-01-01

    Cloned DNA libraries in bacterial artificial chromosome (BAC) are the most widely used form of large-insert DNA libraries. BAC libraries are typically represented by ordered clones derived from genomic DNA of a particular organism. In the case of large eukaryotic genomes, whole-genome libraries consist of a hundred thousand to a million clones, which make their handling and screening a daunting task. The labor and cost of working with whole-genome libraries can be greatly reduced by constructing a library derived from a smaller part of the genome. Here we describe construction of BAC libraries from mitotic chromosomes purified by flow cytometric sorting. Chromosome-specific BAC libraries facilitate positional gene cloning, physical mapping, and sequencing in complex plant genomes.

  1. Method for construction of normalized cDNA libraries

    Science.gov (United States)

    Soares, Marcelo B.; Efstratiadis, Argiris

    1998-01-01

    This invention provides a method to normalize a directional cDNA library constructed in a vector that allows propagation in single-stranded circle form comprising: (a) propagating the directional cDNA library in single-stranded circles; (b) generating fragments complementary to the 3' noncoding sequence of the single-stranded circles in the library to produce partial duplexes; (c) purifying the partial duplexes; (d) melting and reassociating the purified partial duplexes to appropriate Cot; and (e) purifying the unassociated single-stranded circles, thereby generating a normalized cDNA library. This invention also provides normalized cDNA libraries generated by the above-described method and uses of the generated libraries.

  2. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    Energy Technology Data Exchange (ETDEWEB)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
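
    The high-quality SNP criterion quoted above (a contig with at least four sequences, and a minor allele present in at least two of them) can be illustrated with a minimal sketch. This is not the consortium's pipeline: the function name, the input format (the bases observed at one aligned position, one per EST), and the simplification of using the number of called bases at that position as the contig depth are all assumptions made for illustration.

```python
from collections import Counter

MIN_SEQUENCES = 4           # contig must contain at least four sequences
MIN_MINOR_ALLELE_READS = 2  # minor allele must appear in at least two sequences
VALID_BASES = set("ACGT")


def is_high_quality_snp(column_bases):
    """Apply the two thresholds to the bases observed at one aligned position.

    column_bases: iterable of single-character base calls, one per EST
    covering the position (hypothetical input format). Gaps and ambiguous
    calls such as '-' or 'N' are ignored.
    """
    bases = [b.upper() for b in column_bases if b.upper() in VALID_BASES]
    if len(bases) < MIN_SEQUENCES:
        return False
    ranked = Counter(bases).most_common()
    if len(ranked) < 2:
        return False  # only one allele observed, so not a SNP
    minor_allele_count = ranked[1][1]  # second most frequent allele
    return minor_allele_count >= MIN_MINOR_ALLELE_READS


if __name__ == "__main__":
    for column in ("AAGG", "AAAG", "AGG", "AAAA"):
        print(column, is_high_quality_snp(column))
```

    Requiring the minor allele in at least two sequences is a common way to reduce the chance that a single sequencing error is reported as a polymorphism.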

  3. Construction of an adult barnacle (Balanus amphitrite) cDNA library and selection of reference genes for quantitative RT-PCR studies

    Directory of Open Access Journals (Sweden)

    Burgess J Grant

    2009-06-01

    Full Text Available Abstract Background Balanus amphitrite is a barnacle commonly used in biofouling research. Although many aspects of its biology have been elucidated, the lack of genetic information is impeding a molecular understanding of its life cycle. As part of a wider multidisciplinary approach to reveal the biogenic cues influencing barnacle settlement and metamorphosis, we have sequenced and annotated the first cDNA library for B. amphitrite. We also present a systematic validation of potential reference genes for normalization of quantitative real-time PCR (qRT-PCR) data obtained from different developmental stages of this animal. Results We generated a cDNA library containing expressed sequence tags (ESTs) from adult B. amphitrite. A total of 609 unique sequences (comprising 79 assembled clusters and 530 singlets) were derived from 905 reliable unidirectionally sequenced ESTs. Bioinformatics tools such as BLAST, HMMer and InterPro were employed to allow functional annotation of the ESTs. Based on these analyses, we selected 11 genes to study their ability to normalize qRT-PCR data. Total RNA extracted from 7 developmental stages was reverse transcribed and the expression stability of the selected genes was compared using geNorm, BestKeeper and NormFinder. These software programs produced highly comparable results, with the most stable gene being mt-cyb, while tuba, tubb and cp1 were clearly unsuitable for data normalization. Conclusion The collection of B. amphitrite ESTs and their annotation has been made publicly available, representing an important resource for both basic and applied research on this species. We developed a qRT-PCR assay to determine the most reliable reference genes. Transcripts encoding cytochrome b and NADH dehydrogenase subunit 1 were expressed most stably, although other genes also performed well and could prove useful to normalize gene expression studies.

  4. Inexpensive multiplexed library preparation for megabase-sized genomes.

    Directory of Open Access Journals (Sweden)

    Michael Baym

    Full Text Available Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.

  5. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

    Science.gov (United States)

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2004-09-22

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl

  6. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  7. Analysis of Babesia bovis infection-induced gene expression changes in larvae from the cattle tick, Rhipicephalus (Boophilus) microplus

    Directory of Open Access Journals (Sweden)

    Heekin Andrew M

    2012-08-01

    Full Text Available Abstract Background Cattle babesiosis is a tick-borne disease of cattle that has severe economic impact on cattle producers throughout the world’s tropical and subtropical countries. The most severe form of the disease is caused by the apicomplexan, Babesia bovis, and transmitted to cattle through the bite of infected cattle ticks of the genus Rhipicephalus, with the most prevalent species being Rhipicephalus (Boophilus) microplus. We studied the reaction of the R. microplus larval transcriptome in response to infection by B. bovis. Methods Total RNA was isolated for both uninfected and Babesia bovis-infected larval samples. Subtracted libraries were prepared by subtracting the B. bovis-infected material with the uninfected material, thus enriching for expressed genes in the B. bovis-infected sample. Expressed sequence tags from the subtracted library were generated, assembled, and sequenced. To complement the subtracted library method, differential transcript expression between samples was also measured using custom high-density microarrays. The microarray probes were fabricated using oligonucleotides derived from the Bmi Gene Index database (Version 2). Array results were verified for three target genes by real-time PCR. Results Ticks were allowed to feed on a B. bovis-infected splenectomized calf and on an uninfected control calf. RNA was purified in duplicate from whole larvae and subtracted cDNA libraries were synthesized from Babesia-infected larval RNA, subtracting with the corresponding uninfected larval RNA. One thousand ESTs were sequenced from the larval library and the transcripts were annotated. We used a R. microplus microarray designed from a R. microplus gene index, BmiGI Version 2, to look for changes in gene expression that were associated with infection of R. microplus larvae. We found 24 transcripts were expressed at a statistically significant higher level in ticks feeding upon a B. bovis-infected calf contrasted to ticks

  8. Generation of thermostable Moloney murine leukemia virus reverse transcriptase variants using site saturation mutagenesis library and cell-free protein expression system.

    Science.gov (United States)

    Katano, Yuta; Li, Tongyang; Baba, Misato; Nakamura, Miyo; Ito, Masaaki; Kojima, Kenji; Takita, Teisuke; Yasukawa, Kiyoshi

    2017-12-01

    We attempted to increase the thermostability of Moloney murine leukemia virus (MMLV) reverse transcriptase (RT). Eight site saturation mutagenesis libraries corresponding to Ala70-Arg469 in the whole MMLV RT (Thr24-Leu671) were constructed, in each of which 1 out of 50 amino acid residues was replaced with another amino acid residue. Seven hundred and sixty-eight MMLV RT clones were expressed using a cell-free protein expression system, and their thermostabilities were assessed by the temperature of thermal treatment at which they retained cDNA synthesis activity. One clone, D200C, was selected as the most thermostable variant. The highest temperature of thermal treatment at which D200C exhibited cDNA synthesis activity was 57°C, which was higher than for WT (53°C). Our results suggest that a combination of a site saturation mutagenesis library and a cell-free protein expression system might be useful for generating thermostable MMLV RT in a short period of time for expression and selection.
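
    The library layout described above is consistent with simple arithmetic: Ala70 to Arg469 spans 400 residues, which divides into eight libraries of 50 residues each. The sketch below illustrates that partition under the assumption, not stated in the abstract, that each library covers one contiguous block of 50 residues; the function names are hypothetical.

```python
FIRST_RESIDUE = 70         # Ala70, from the abstract
LAST_RESIDUE = 469         # Arg469, from the abstract
RESIDUES_PER_LIBRARY = 50  # "1 out of 50 amino acid residues" per library


def library_ranges(first=FIRST_RESIDUE, last=LAST_RESIDUE, size=RESIDUES_PER_LIBRARY):
    """Split the mutagenized region into consecutive residue blocks.

    Assumes contiguous, non-overlapping blocks (an illustrative assumption).
    Returns a list of inclusive (start, end) residue numbers.
    """
    return [(start, min(start + size - 1, last)) for start in range(first, last + 1, size)]


def library_for_residue(position, first=FIRST_RESIDUE, last=LAST_RESIDUE,
                        size=RESIDUES_PER_LIBRARY):
    """Return the 1-based index of the block containing a residue position."""
    if not first <= position <= last:
        raise ValueError(f"residue {position} is outside {first}-{last}")
    return (position - first) // size + 1


if __name__ == "__main__":
    for index, (start, end) in enumerate(library_ranges(), start=1):
        print(f"library {index}: residues {start}-{end}")
    # Under the contiguous-block assumption, position 200 (the D200C variant)
    # would fall in the third block.
    print("block for residue 200:", library_for_residue(200))
```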

  9. Identification and characterization of microRNAs from peanut (Arachis hypogaea L.) by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaoyuan Chi

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) are noncoding RNAs of approximately 21 nt that regulate gene expression in plants post-transcriptionally by endonucleolytic cleavage or translational inhibition. miRNAs play essential roles in numerous developmental and physiological processes and many of them are conserved across species. Extensive studies of miRNAs have been done in a few model plants; however, less is known about the diversity of these regulatory RNAs in peanut (Arachis hypogaea L.), one of the most important oilseed crops cultivated worldwide. RESULTS: A library of small RNA from peanut was constructed for deep sequencing. In addition to 126 known miRNAs from 33 families, 25 novel peanut miRNAs were identified. The miRNA* sequences of four novel miRNAs were discovered, providing additional evidence for the existence of miRNAs. Twenty of the novel miRNAs were considered to be species-specific because no homolog has been found for other plant species. qRT-PCR was used to analyze the expression of seven miRNAs in different tissues and in seed at different developmental stages and some showed tissue- and/or growth stage-specific expression. Furthermore, potential targets of these putative miRNAs were predicted on the basis of the sequence homology search. CONCLUSIONS: We have identified large numbers of miRNAs and their related target genes through deep sequencing of a small RNA library. This study of the identification and characterization of miRNAs in peanut can initiate further study on peanut miRNA regulation mechanisms, and help toward a greater understanding of the important roles of miRNAs in peanut.

  10. Identification of stress-induced genes from the drought-tolerant plant Prosopis juliflora (Swartz) DC. through analysis of expressed sequence tags.

    Science.gov (United States)

    George, Suja; Venkataraman, Gayatri; Parida, Ajay

    2007-05-01

    Abiotic stresses such as cold, salinity, drought, wounding, and heavy metal contamination adversely affect crop productivity throughout the world. Prosopis juliflora is a phreatophyte that can tolerate severe adverse environmental conditions such as drought, salinity, and heavy metal contamination. As a first step towards the characterization of genes that contribute to combating abiotic stress, construction and analysis of a cDNA library of P. juliflora genes is reported here. Random expressed sequence tag (EST) sequencing of 1750 clones produced 1467 high-quality reads. These clones were classified into functional categories, and BLAST comparisons revealed that 114 clones were homologous to genes implicated in stress response(s) and included heat shock proteins, metallothioneins, lipid transfer proteins, and late embryogenesis abundant proteins. Of the ESTs analyzed, 26% showed homology to previously uncharacterized genes in the databases. Fifty-two clones from this category were selected for reverse Northern analysis: 21 were shown to be upregulated and 16 downregulated. The results obtained by reverse Northern analysis were confirmed by Northern analysis. Clustering of the 1467 ESTs produced a total of 295 contigs encompassing 790 ESTs, resulting in a 54.2% redundancy. Two of the abundant genes coding for a nonspecific lipid transfer protein and late embryogenesis abundant protein were sequenced completely. Northern analysis (after polyethylene glycol stress) of the 2 genes was carried out. The implications of the analyzed genes in abiotic stress tolerance are also discussed.

  11. Analysis of a cDNA clone expressing a human autoimmune antigen: full-length sequence of the U2 small nuclear RNA-associated B antigen

    International Nuclear Information System (INIS)

    Habets, W.J.; Sillekens, P.T.G.; Hoet, M.H.; Schalken, J.A.; Roebroek, A.J.M.; Leunissen, J.A.M.; Van de Ven, W.J.M.; Van Venrooij, W.J.

    1987-01-01

    A U2 small nuclear RNA-associated protein, designated B'', was recently identified as the target antigen for autoimmune sera from certain patients with systemic lupus erythematosus and other rheumatic diseases. Such antibodies enabled the authors to isolate cDNA clone λHB''-1 from a phage λgt11 expression library. This clone appeared to code for the B'' protein as established by in vitro translation of hybrid-selected mRNA. The identity of clone λHB''-1 was further confirmed by partial peptide mapping and analysis of the reactivity of the recombinant antigen with monospecific and monoclonal antibodies. Analysis of the nucleotide sequence of the 1015-base-pair cDNA insert of clone λHB''-1 revealed a large open reading frame of 800 nucleotides containing the coding sequence for a polypeptide of 25,457 daltons. In vitro transcription of the λHB''-1 cDNA insert and subsequent translation resulted in a protein product with the molecular size of the B'' protein. These data demonstrate that clone λHB''-1 contains the complete coding sequence of this antigen. The deduced polypeptide sequence contains three very hydrophilic regions that might constitute RNA binding sites and/or antigenic determinants. These findings might have implications both for the understanding of the pathogenesis of rheumatic diseases and for the elucidation of the biological function of autoimmune antigens.

  12. Iterative optimization of performance libraries by hierarchical division of codes

    International Nuclear Information System (INIS)

    Donadio, S.

    2007-09-01

    The increasing complexity of hardware features incorporated in modern processors makes high performance code generation very challenging. Library generators such as ATLAS, FFTW and SPIRAL overcome this issue by empirically searching in the space of possible program versions for the one that performs the best. This thesis explores a fully automatic solution to adapt a compute-intensive application to the target architecture. By mimicking complex sequences of transformations useful to optimize real codes, we show that generative programming is a practical tool to implement a new hierarchical compilation approach for the generation of high performance code relying on the use of state-of-the-art compilers. As opposed to ATLAS, this approach is not application-dependent but can be applied to fairly generic loop structures. Our approach relies on the decomposition of the original loop nest into simpler kernels. These kernels are much simpler to optimize and, furthermore, using such codes makes the performance trade-off problem much simpler to express and to solve. Finally, we propose a new approach for the generation of performance libraries based on this decomposition method. We show that our method generates high-performance libraries, in particular for BLAS. (author)

  13. P22 Arc repressor: enhanced expression of unstable mutants by addition of polar C-terminal sequences.

    OpenAIRE

    Milla, M. E.; Brown, B. M.; Sauer, R. T.

    1993-01-01

    Many mutant variants of the P22 Arc repressor are subject to intracellular proteolysis in Escherichia coli, which precludes their expression at levels sufficient for purification and subsequent biochemical characterization. Here we examine the effects of several different C-terminal extension sequences on the expression and activity of a set of Arc mutants. We show that two tail sequences, KNQHE (st5) and H6KNQHE (st11), increase the expression levels of most mutants from 10- to 20-fold and, ...

  14. A comprehensive analysis of in vitro and in vivo genetic fitness of Pseudomonas aeruginosa using high-throughput sequencing of transposon libraries.

    Directory of Open Access Journals (Sweden)

    David Skurnik

    Full Text Available High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200-1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian

  15. Two distinct genes for ADP/ATP translocase are expressed at the mRNA level in adult human liver

    International Nuclear Information System (INIS)

    Houldsworth, J.; Attardi, G.

    1988-01-01

    Several clones hybridizing with a bovine ADP/ATP translocase cDNA were isolated from an adult human liver cDNA library in the vector pEX1. DNA sequence analysis revealed that these clones encode two distinct forms of translocase. In particular, two clones specifying the COOH-end-proximal five-sixths of the protein exhibit a 9% amino acid sequence divergence and totally dissimilar 3' untranslated regions. One of these cDNAs is nearly identical in sequence to an ADP/ATP translocase clone (hp2F1) recently isolated from a human fibroblast cDNA library with three amino acid changes and a few differences in the 3' untranslated region. Another clone isolated from the pEX1 library contains a reading frame encoding the remaining, NH2-end-proximal, 37 amino acids of the translocase. This sequence differs significantly (14% amino acid sequence divergence) from the corresponding segment of hp2F1, and the 5' untranslated regions of the two clones are totally dissimilar. RNA transfer hybridization experiments utilizing the clones isolated from the pEX1 library revealed the presence in HeLa cells of three distinct mRNA species. The pattern of hybridization and the sizes of these mRNAs suggest a greater complexity of organization and expression of the ADP/ATP translocase genes in human cells than indicated by the analysis of the cDNA clones.

  16. Epitope selection from an uncensored peptide library displayed on avian leukosis virus

    International Nuclear Information System (INIS)

    Khare, Pranay D.; Rosales, Ana G.; Bailey, Kent R.; Russell, Stephen J.; Federspiel, Mark J.

    2003-01-01

    Phage display libraries have provided an extraordinarily versatile technology to facilitate the isolation of peptides, growth factors, single chain antibodies, and enzymes with desired binding specificities or enzymatic activities. The overall diversity of peptides in phage display libraries can be significantly limited by Escherichia coli protein folding and processing machinery, which results in sequence censorship. To achieve an optimal diversity of displayed eukaryotic peptides, the library should be produced in the endoplasmic reticulum of eukaryotic cells using a eukaryotic display platform. In the accompanying article, we presented experiments that demonstrate that polypeptides of various sizes could be efficiently displayed on the envelope glycoproteins of a eukaryotic virus, avian leukosis virus (ALV), and the displayed polypeptides could efficiently attach to cognate receptors without interfering with viral attachment and entry into susceptible cells. In this study, methods were developed to construct a model library of randomized eight amino acid peptides using the ALV eukaryotic display platform and screen the library for specific epitopes using immobilized antibodies. A virus library with approximately 2 x 10^6 different members was generated from a plasmid library of approximately 5 x 10^6 diversity. The sequences of the randomized 24 nucleotide/eight amino acid regions of representatives of the plasmid and virus libraries were analyzed. No significant sequence censorship was observed in producing the virus display library from the plasmid library. Different populations of peptide epitopes were selected from the virus library when different monoclonal antibodies were used as the target. The results of these two studies clearly demonstrate the potential of ALV as a eukaryotic platform for the display and selection of eukaryotic polypeptide libraries.

  17. High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L.).

    Directory of Open Access Journals (Sweden)

    Livia Donaire

    Full Text Available Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been studied primarily in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we used expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.

  18. Identification of expressed genes in cDNA library of hemocytes from the RLO-challenged oyster, Crassostrea ariakensis Gould with special functional implication of three complement-related fragments (CaC1q1, CaC1q2 and CaC3).

    Science.gov (United States)

    Xu, Ting; Xie, Jiasong; Li, Jianming; Luo, Ming; Ye, Shigen; Wu, Xinzhong

    2012-06-01

    A SMARTer™ cDNA library of hemocytes from the Rickettsia-like organism (RLO)-challenged oyster, Crassostrea ariakensis Gould, was constructed. Random clones (400) were selected and single-pass sequenced, resulting in 200 unique sequences containing 96 known genes and 104 unknown genes. The 96 known genes were categorized into 11 groups based on their biological process. Furthermore, we identified and characterized three complement-related fragments (CaC1q1, CaC1q2 and CaC3). Tissue distribution analysis revealed that all three fragments were ubiquitously expressed in all tissues studied, including hemocytes, gills, mantle, digestive glands, gonads and adductor muscle, while the highest level was seen in the hemocytes. The temporal expression profile in hemocyte monolayers revealed that the mRNA expression levels of the three fragments increased greatly at 3 h and 6 h post-challenge after RLO incubation. The maximal expression levels, at 3 h post-challenge, were about 256, 104 and 64 times higher than the control values for CaC1q1, CaC1q2 and CaC3, respectively. Copyright © 2012 Elsevier Ltd. All rights reserved.
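
    The abstract above reports fold changes but not how they were computed. As a minimal sketch, assuming the widely used 2^-ΔΔCt method with near-100% amplification efficiency, a fold change can be derived from qPCR cycle thresholds as follows. The function name and all Ct values are hypothetical and chosen only so that the result matches the order of magnitude reported (256 = 2^8, 64 = 2^6); they are not data from the study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Assumes roughly 100% PCR efficiency, i.e. product doubles each cycle.
    All arguments are cycle-threshold (Ct) values.
    """
    # Normalise the target gene to the reference gene in each condition.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Difference between conditions; negative means up-regulation.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 8 cycles earlier
# in the challenged sample, with an unchanged reference gene.
fold_8_cycles = relative_expression(20.0, 18.0, 28.0, 18.0)
# A 6-cycle shift under the same assumptions.
fold_6_cycles = relative_expression(22.0, 18.0, 28.0, 18.0)
print(fold_8_cycles, fold_6_cycles)
```

    Under these assumptions an 8-cycle shift corresponds to a 256-fold increase and a 6-cycle shift to a 64-fold increase, which is why fold changes that are exact powers of two often appear in qPCR results.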

  19. Inhibition of hepatitis B virus replication with linear DNA sequences expressing antiviral micro-RNA shuttles

    Energy Technology Data Exchange (ETDEWEB)

    Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie; Weinberg, Marc S. [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa); Arbuthnot, Patrick, E-mail: Patrick.Arbuthnot@wits.ac.za [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa)

    2009-11-20

    RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.

  20. Inhibition of hepatitis B virus replication with linear DNA sequences expressing antiviral micro-RNA shuttles

    International Nuclear Information System (INIS)

    Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie; Weinberg, Marc S.; Arbuthnot, Patrick

    2009-01-01

    RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.

  1. Molecular cloning, sequence characterization and expression pattern of Rab18 gene from watermelon (Citrullus lanatus).

    Science.gov (United States)

    Xinli, Xiao; Lei, Peng

    2015-03-04

    The complete mRNA sequence of watermelon Rab18 gene was amplified through the rapid amplification of cDNA ends (RACE) method. The full-length mRNA was 1010 bp containing a 645 bp open reading frame, which encodes a protein of 214 amino acids. Sequence analysis revealed that watermelon Rab18 protein shares high homology with the Rab18 of cucumber (99%), muskmelon (98%), Morus notabilis (90%), tomato (89%), wine grape (89%) and potato (88%). Phylogenetic analysis revealed that watermelon Rab18 gene has a closer genetic relationship with Rab18 gene of cucumber and muskmelon. Tissue expression profile analysis indicated that watermelon Rab18 gene was highly expressed in root, stem and leaf, moderately expressed in flower and weakly expressed in fruit.
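
    The lengths quoted in the abstract above are mutually consistent if the 645 bp open reading frame is taken to include the stop codon. A short check, offered as a sketch under that assumption (the helper name is illustrative, not from the paper):

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


mrna_bp = 1010   # full-length mRNA reported in the abstract
orf_bp = 645     # open reading frame reported in the abstract

amino_acids = protein_length_from_orf(orf_bp)
untranslated_bp = mrna_bp - orf_bp  # combined 5' and 3' untranslated sequence

print(amino_acids, untranslated_bp)
```

    With these inputs the ORF spans 215 codons, giving the reported 214 amino acids plus one stop codon, and leaves 365 bp of untranslated sequence in total.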

  2. Presence and Expression of Microbial Genes Regulating Soil Nitrogen Dynamics Along the Tanana River Successional Sequence

    Science.gov (United States)

    Boone, R. D.; Rogers, S. L.

    2004-12-01

    We report on work to assess the functional gene sequences for soil microbiota that control nitrogen cycle pathways along the successional sequence (willow, alder, poplar, white spruce, black spruce) on the Tanana River floodplain, Interior Alaska. Microbial DNA and mRNA were extracted from soils (0-10 cm depth) for amoA (ammonia monooxygenase), nifH (nitrogenase reductase), napA (nitrate reductase), and nirS and nirK (nitrite reductase) genes. Gene presence was determined by amplification of a conserved sequence of each gene employing sequence-specific oligonucleotide primers and polymerase chain reaction (PCR). Expression of the genes was measured via nested reverse transcriptase PCR amplification of the extracted mRNA. Amplified PCR products were visualized on agarose electrophoresis gels. All five successional stages show evidence for the presence and expression of microbial genes that regulate N fixation (free-living), nitrification, and nitrate reduction. We detected (1) nifH, napA, and nirK presence and amoA expression (mRNA production) for all five successional stages and (2) nirS and amoA presence and nifH, nirK, and napA expression for early successional stages (willow, alder, poplar). The results highlight that the existing body of previous process-level work has not sufficiently considered the microbial potential for a nitrate economy and free-living N fixation along the complete floodplain successional sequence.

  3. K-shuff: A Novel Algorithm for Characterizing Structural and Compositional Diversity in Gene Libraries.

    Science.gov (United States)

    Jangid, Kamlesh; Kao, Ming-Hung; Lahamge, Aishwarya; Williams, Mark A; Rathbun, Stephen L; Whitman, William B

    2016-01-01

    K-shuff is a new algorithm for comparing the similarity of gene sequence libraries, providing measures of the structural and compositional diversity as well as the significance of the differences between these measures. Inspired by Ripley's K-function for spatial point pattern analysis, the Intra K-function or IKF measures the structural diversity, including both the richness and overall similarity of the sequences, within a library. The Cross K-function or CKF measures the compositional diversity between gene libraries, reflecting both the number of OTUs shared as well as the overall similarity in OTUs. A Monte Carlo testing procedure then enables statistical evaluation of both the structural and compositional diversity between gene libraries. For 16S rRNA gene libraries from complex bacterial communities such as those found in seawater, salt marsh sediments, and soils, K-shuff yields reproducible estimates of structural and compositional diversity with libraries greater than 50 sequences. Similarly, for pyrosequencing libraries generated from a glacial retreat chronosequence and Illumina® libraries generated from US homes, K-shuff required >300 and 100 sequences per sample, respectively. Power analyses demonstrated that K-shuff is sensitive to small differences in Sanger or Illumina® libraries. This extra sensitivity of K-shuff enabled examination of compositional differences at much deeper taxonomic levels, such as within abundant OTUs. This is especially useful when comparing communities that are compositionally very similar but functionally different. K-shuff will therefore prove beneficial for conventional microbiome analysis as well as specific hypothesis testing.
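
    The Monte Carlo step described above can be illustrated with a generic label-shuffling test. The sketch below is not the K-shuff implementation and does not compute the IKF or CKF statistics; it substitutes a simpler statistic (mean between-library distance minus mean within-library distance, using a normalised Hamming distance on pre-aligned, equal-length sequences). The function names and the toy sequences are assumptions made for illustration only.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Fraction of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def between_minus_within(lib_a, lib_b):
    """Mean between-library distance minus mean within-library distance."""
    between = [hamming(a, b) for a in lib_a for b in lib_b]
    within = [hamming(x, y)
              for lib in (lib_a, lib_b)
              for x, y in combinations(lib, 2)]
    return sum(between) / len(between) - sum(within) / len(within)


def shuffle_test(lib_a, lib_b, n_perm=999, seed=0):
    """Monte Carlo p-value obtained by shuffling library labels."""
    rng = random.Random(seed)
    observed = between_minus_within(lib_a, lib_b)
    pooled = list(lib_a) + list(lib_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(lib_a)], pooled[len(lib_a):]
        if between_minus_within(perm_a, perm_b) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy libraries (hypothetical): two tight clusters far apart from each other.
library_a = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA", "AAAAATAA", "AAAATAAA"]
library_b = ["CCCCCCCC", "CCCCCCCG", "CCCCCCGC", "CCCCCGCC", "CCCCGCCC"]

observed, p_value = shuffle_test(library_a, library_b)
print(observed, p_value)
```

    The design choice here is the usual one for permutation tests: if the two libraries were drawn from the same community, reassigning sequences to libraries at random should produce statistics as extreme as the observed one fairly often, so a small p-value suggests a real compositional difference.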

  4. ESPRIT: A Method for Defining Soluble Expression Constructs in Poorly Understood Gene Sequences.

    Science.gov (United States)

    Mas, Philippe J; Hart, Darren J

    2017-01-01

    Production of soluble, purifiable domains or multi-domain fragments of proteins is a prerequisite for structural biology and other applications. When target sequences are poorly annotated, or when there are few similar sequences available for alignments, identification of domains can be problematic. A method called expression of soluble proteins by random incremental truncation (ESPRIT) addresses this problem by high-throughput automated screening of tens of thousands of enzymatically truncated gene fragments. Rare soluble constructs are identified by experimental screening, and the boundaries revealed by DNA sequencing.

  5. Differentially expressed genes of Coptotermes formosanus (Isoptera: Rhinotermitidae) challenged by chemical insecticides.

    Science.gov (United States)

    Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou

    2013-08-01

    Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects harmful to wooden structures. Current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the differentially expressed genes mediated by chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites induced by a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron) for 24 h, by using the RNA ligase-mediated rapid amplification of cDNA ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.

  6. Identification of microRNA-Like RNAs in the filamentous fungus Trichoderma reesei by solexa sequencing.

    Directory of Open Access Journals (Sweden)

    Kang Kang

    Full Text Available microRNAs (miRNAs) are non-coding small RNAs (sRNAs) capable of negatively regulating gene expression. Recently, microRNA-like small RNAs (milRNAs) were discovered in several filamentous fungi but not yet in Trichoderma reesei, an industrial filamentous fungus that can secrete abundant hydrolases. To explore the presence of milRNA in T. reesei and evaluate their expression under induction of cellulose, two T. reesei sRNA libraries of cellulose induction (IN) and non-induction (CON) were generated and sequenced using Solexa sequencing technology. A total of 726 and 631 sRNAs were obtained from the IN and CON samples, respectively. Global expression analysis showed an extensively differential expression of sRNAs in T. reesei under the two conditions. Thirteen predicted milRNAs were identified in T. reesei based on the short hairpin structure analysis. The milRNA profiles obtained in deep sequencing were further validated by RT-qPCR assay. Computational analysis predicted a number of potential targets relating to many processes including regulation of enzyme expression. The presence and differential expression of T. reesei milRNAs imply that milRNA might play a role in T. reesei growth and cellulase induction. This work lays a foundation for further functional study of fungal milRNAs and their industrial application.

  7. Sequence and expression analyses of porcine ISG15 and ISG43 genes.

    Science.gov (United States)

    Huang, Jiangnan; Zhao, Shuhong; Zhu, Mengjin; Wu, Zhenfang; Yu, Mei

    2009-08-01

    The coding sequences of porcine interferon-stimulated gene 15 (ISG15) and interferon-stimulated gene 43 (ISG43) were cloned from swine spleen mRNA. The amino acid sequences deduced from the porcine ISG15 and ISG43 coding sequences shared 24-75% and 29-83% similarity with ISG15s and ISG43s from other vertebrates, respectively. Structural analyses revealed that porcine ISG15 comprises two ubiquitin homologue (UBQ) domains and a conserved C-terminal LRLRGG conjugating motif. Porcine ISG43 contains a ubiquitin-processing protease-like domain. Phylogenetic analyses showed that porcine ISG15 and ISG43 were most closely related to rat ISG15 and cattle ISG43, respectively. Using a quantitative real-time PCR assay, significantly increased expression levels of porcine ISG15 and ISG43 genes were detected in porcine kidney endothelial (PK15) cells treated with poly I:C. We also observed enhanced mRNA expression of three members of the dsRNA pattern-recognition receptors (PRR), TLR3, DDX58 and IFIH1, which have been reported to act as critical receptors in inducing the mRNA expression of ISG15 and ISG43 genes. However, we did not detect any induced mRNA expression of IFNalpha and IFNbeta, suggesting that transcriptional activation of ISG15 and ISG43 was mediated through an IFN-independent signaling pathway in the poly I:C-treated PK15 cells. Association analyses in a Landrace pig population revealed that the ISG15 c.347T>C (BstUI) polymorphism and the ISG43 c.953T>G (BccI) polymorphism were significantly associated with hematological parameters and immune-related traits.

  8. DNA-encoded chemical libraries: advancing beyond conventional small-molecule libraries.

    Science.gov (United States)

    Franzini, Raphael M; Neri, Dario; Scheuermann, Jörg

    2014-04-15

    DNA-encoded chemical libraries (DECLs) represent a promising tool in drug discovery. DECL technology allows the synthesis and screening of chemical libraries of unprecedented size at moderate costs. In analogy to phage-display technology, where large antibody libraries are displayed on the surface of filamentous phage and are genetically encoded in the phage genome, DECLs feature the display of individual small organic chemical moieties on DNA fragments serving as amplifiable identification barcodes. The DNA-tag facilitates the synthesis and allows the simultaneous screening of very large sets of compounds (up to billions of molecules), because the hit compounds can easily be identified and quantified by PCR-amplification of the DNA-barcode followed by high-throughput DNA sequencing. Several approaches have been used to generate DECLs, differing both in the methods used for library encoding and for the combinatorial assembly of chemical moieties. For example, DECLs can be used for fragment-based drug discovery, displaying a single molecule on DNA or two chemical moieties at the extremities of complementary DNA strands. DECLs can vary substantially in the chemical structures and the library size. While ultralarge libraries containing billions of compounds have been reported containing four or more sets of building blocks, also smaller libraries have been shown to be efficient for ligand discovery. In general, it has been found that the overall library size is a poor predictor for library performance and that the number and diversity of the building blocks are rather important indicators. Smaller libraries consisting of two to three sets of building blocks better fulfill the criteria of drug-likeness and often have higher quality. In this Account, we present advances in the DECL field from proof-of-principle studies to practical applications for drug discovery, both in industry and in academia. DECL technology can yield specific binders to a variety of target

  9. MicroRNA discovery and analysis of pinewood nematode Bursaphelenchus xylophilus by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Qi-Xing Huang

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) are considered to be very important in regulating growth, development, behavior and stress response in animals and plants through post-transcriptional gene regulation. Pinewood nematode, Bursaphelenchus xylophilus, is an important invasive plant parasitic nematode in Asia. Comprehensive knowledge of the nematode's miRNAs is necessary for further in-depth study of the roles of miRNAs in the ecological adaptation of this invasive species. METHODS AND FINDINGS: Five small RNA libraries were constructed and sequenced by Illumina/Solexa deep-sequencing technology. A total of 810 miRNA candidates (49 conserved and 761 novel) were predicted by a computational pipeline, of which 57 miRNAs (20 conserved and 37 novel) encoded by 53 miRNA precursors were identified by experimental methods. Ten novel miRNAs were considered to be species-specific miRNAs of B. xylophilus. Comparison of the expression profiles of miRNAs in the five small RNA libraries showed that many miRNAs exhibited clearly different expression levels in the third-stage dispersal juvenile and under cold stress. Most of the miRNAs were clearly down-regulated in the dispersal stage, but differences among the three geographic libraries were not prominent. A total of 979 genes were predicted to be targets of these authentic miRNAs. Among them, seven heat shock protein genes were targeted by 14 miRNAs, and six FMRFamide-like neuropeptide genes were targeted by 17 miRNAs. Real-time quantitative polymerase chain reaction was used to quantify the mRNA expression levels of target genes. CONCLUSIONS: Based on the fact that a negative correlation existed between the expression profiles of miRNAs and the mRNA expression profiles of their target genes (hsp, flp) when comparing nematodes under cold stress with those in a normal state, we suggested that miRNAs might participate in ecological adaptation and behavior regulation of the

  10. GenEST, a powerful bidirectional link between cDNA sequence data and gene expression profiles generated by cDNA-AFLP

    NARCIS (Netherlands)

    Qin Ling; Prins, P.; Jones, J.T.; Popeijus, H.; Smant, G.; Bakker, J.; Helder, J.

    2001-01-01

    The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power

  11. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    Full Text Available There is continuing interest in analyzing gene architecture and gene expression to determine what relationship may exist between them. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of these relationships and made it easier to cross-reference expression data with large genome datasets. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, the literature contains some conflicting results concerning the impacts of different regions of genes, and the underlying reason is not well understood. This research applies quantile regression techniques to the statistical analysis of coding and noncoding sequence length and gene expression data in the plant Arabidopsis thaliana and the fruit fly Drosophila melanogaster, to determine whether a relationship exists and whether there are variations or similarities between these species. The quantile regression analysis found that the correlations between coding sequence length and gene expression varied, and that similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between the animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  12. [Sequences and expression pattern of mce gene in Leptospira interrogans of different serogroups].

    Science.gov (United States)

    Zhang, Lei; Xue, Feng; Yan, Jie; Mao, Ya-fei; Li, Li-wei

    2008-11-01

    To determine the frequency of mce gene in Leptospira interrogans, and to investigate the gene transcription levels of L. interrogans before and after infecting cells. The segments of entire mce genes from 13 L. interrogans strains and 1 L. biflexa strain were amplified by PCR and then sequenced after T-A cloning. A prokaryotic expression system of mce gene was constructed; the expression and output of the target recombinant protein rMce were examined by SDS-PAGE and Western Blot assay. Rabbits were intradermally immunized with rMce to prepare the antiserum, and the titer of antiserum was measured by immunodiffusion test. The transcription levels of mce gene in L. interrogans serogroup Icterohaemorrhagiae serovar lai strain 56601 before and after infecting J774A.1 cells were monitored by real-time fluorescence quantitative RT-PCR. mce gene was carried in all tested L. interrogans strains, but not in L. biflexa serogroup Semaranga serovar patoc strain Patoc I. The similarities of nucleotide and putative amino acid sequences of the cloned mce genes to the reported sequences (GenBank accession No: NP712236) were 99.02%-100% and 97.91%-100%, respectively. The constructed prokaryotic expression system of mce gene expressed rMce and the output of rMce was about 5% of the total bacterial proteins. The antiserum against whole cell of L. interrogans strain 56601 efficiently recognized rMce. After infecting J774A.1 cells, transcription levels of the mce gene in L. interrogans strain 56601 were remarkably up-regulated. The constructed prokaryotic expression system of mce gene and the prepared antiserum against rMce provide useful tools for further study of the gene function.

  13. Cloning and sequencing of a gene encoding a 21-kilodalton outer membrane protein from Bordetella avium and expression of the gene in Salmonella typhimurium.

    Science.gov (United States)

    Gentry-Weeks, C R; Hultsch, A L; Kelly, S M; Keith, J M; Curtiss, R

    1992-01-01

    Three gene libraries of Bordetella avium 197 DNA were prepared in Escherichia coli LE392 by using the cosmid vectors pCP13 and pYA2329, a derivative of pCP13 specifying spectinomycin resistance. The cosmid libraries were screened with convalescent-phase anti-B. avium turkey sera and polyclonal rabbit antisera against B. avium 197 outer membrane proteins. One E. coli recombinant clone produced a 56-kDa protein which reacted with convalescent-phase serum from a turkey infected with B. avium 197. In addition, five E. coli recombinant clones were identified which produced B. avium outer membrane proteins with molecular masses of 21, 38, 40, 43, and 48 kDa. At least one of these E. coli clones, which encoded the 21-kDa protein, reacted with both convalescent-phase turkey sera and antibody against B. avium 197 outer membrane proteins. The gene for the 21-kDa outer membrane protein was localized by Tn5seq1 mutagenesis, and the nucleotide sequence was determined by dideoxy sequencing. DNA sequence analysis of the 21-kDa protein revealed an open reading frame of 582 bases that resulted in a predicted protein of 194 amino acids. Comparison of the predicted amino acid sequence of the gene encoding the 21-kDa outer membrane protein with protein sequences in the National Biomedical Research Foundation protein sequence data base indicated significant homology to the OmpA proteins of Shigella dysenteriae, Enterobacter aerogenes, E. coli, and Salmonella typhimurium and to Neisseria gonorrhoeae outer membrane protein III, Haemophilus influenzae protein P6, and Pseudomonas aeruginosa porin protein F. The gene (ompA) encoding the B. avium 21-kDa protein hybridized with 4.1-kb DNA fragments from EcoRI-digested, chromosomal DNA of Bordetella pertussis and Bordetella bronchiseptica and with 6.0- and 3.2-kb DNA fragments from EcoRI-digested, chromosomal DNA of B. avium and B. avium-like DNA, respectively. A 6.75-kb DNA fragment encoding the B. avium 21-kDa protein was subcloned into the
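
    The open reading frame figures quoted in this record can be checked with a short calculation. The sketch below only divides the reported 582 bases into codons; the function name is hypothetical, and whether the reported length counts a stop codon is not stated in the abstract (the 194-residue figure matches 582 / 3, which would be consistent with the stop codon being excluded, but that is an assumption).

```python
def codons_in_orf(orf_length_bp: int) -> int:
    """Return the number of complete codons in an open reading frame.

    Hypothetical helper for illustration; it raises if the length is not
    a multiple of three, since an ORF is read in whole codons.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_length_bp // 3


# Figures taken from the abstract: 582 bases, 194 predicted amino acids.
reported_orf_bp = 582
reported_protein_aa = 194

codons = codons_in_orf(reported_orf_bp)
# True when every codon in the reported ORF encodes a residue
# (an assumption about how the ORF length was reported).
matches_report = codons == reported_protein_aa
```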

  14. VH gene expression and regulation in the mutant Alicia rabbit. Rescue of VHa2 allotype expression.

    Science.gov (United States)

    Chen, H T; Alexander, C B; Young-Cooper, G O; Mage, R G

    1993-04-01

    Rabbits of the Alicia strain, derived from rabbits expressing the VHa2 allotype, have a mutation in the H chain locus that has a cis effect upon the expression of VHa2 and VHa- genes. A small deletion at the most J-proximal (3') end of the VH locus leads to low expression of all the genes on the entire chromosome in heterozygous ali mutants and altered relative expression of VH genes in homozygotes. To study VH gene expression and regulation, we used the polymerase chain reaction to amplify the VH genes expressed in spleens of young and adult wild-type and mutant Alicia rabbits. The cDNA from reverse transcription of splenic mRNA was amplified and polymerase chain reaction libraries were constructed and screened with oligonucleotides from framework regions 1 and 3, as well as JH. Thirty-three VH-positive clones were sequenced and analyzed. We found that in mutant Alicia rabbits, products of the first functional VH gene (VH4a2), (or VH4a2-like genes) were expressed in 2- to 8-wk-olds. Expression of both the VHx and VHy types of VHa- genes was also elevated but the relative proportions of VHx and VHy, especially VHx, decreased whereas the relative levels of expression of VH4a2 or VH4a2-like genes increased with age. Our results suggest that the appearance of sequences resembling that of the VH1a2, which is deleted in the mutant ali rabbits, could be caused by alterations of the sequences of the rearranged VH4a2 genes by gene conversions and/or rearrangement of upstream VH1a2-like genes later in development.

  15. [Cloning and characterization of genes differentially expressed in human dental pulp cells and gingival fibroblasts].

    Science.gov (United States)

    Wang, Zhong-dong; Wu, Ji-nan; Zhou, Lin; Ling, Jun-qi; Guo, Xi-min; Xiao, Ming-zhen; Zhu, Feng; Pu, Qin; Chai, Yu-bo; Zhao, Zhong-liang

    2007-02-01

    To study the biological properties of human dental pulp cells (HDPC) by cloning and analysis of genes differentially expressed in HDPC in comparison with human gingival fibroblasts (HGF). HDPC and HGF were cultured and identified by immunocytochemistry. An HDPC and HGF subtractive cDNA library was established by PCR-based modified subtractive hybridization; genes differentially expressed by HDPC were cloned, sequenced and compared by BLAST to find homologous sequences in GenBank. Cloning and sequencing analysis indicated that 12 differentially expressed genes were obtained, of which two were unknown genes. Among the 10 known genes, 4 were related to signal transduction, 2 were related to trans-membrane transportation (both cell membrane and nuclear membrane), and 2 were related to RNA splicing mechanisms. The biological properties of HDPC are determined by the differential expression of some genes, and the growth and differentiation of HDPC are associated with the dynamic protein synthesis and secretion activities of the cell.

  16. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Full Text Available Abstract Background: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effect of library type and genomic diversity on SNP discovery and imputation are evaluated. Results: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
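
    The sequencing quantities quoted in this record can be put on a common scale with a little arithmetic. The sketch below is illustrative only and uses just the numbers in the abstract; the variable names are hypothetical, and treating "eight genome equivalents (6 Gb)" as 0.75 Gb per genome equivalent is an inference from that phrase rather than a figure the authors state.

```python
# Figures quoted in the abstract.
total_aligned_bp = 6e9          # "eight genome equivalents (6 Gb)"
genome_equivalents = 8
reads_per_accession = 3_000_000
read_length_bp = 50
target_snps = 96_000

# Inferred size of one genome equivalent (an assumption derived from the
# abstract's own wording, not an independently sourced genome size).
genome_equivalent_bp = total_aligned_bp / genome_equivalents

# Raw sequence recommended per accession: reads times read length.
bases_per_accession = reads_per_accession * read_length_bp

# Fraction of one genome equivalent that this quantity represents.
coverage_fraction = bases_per_accession / genome_equivalent_bp

# Raw bases sequenced per SNP genotyped at the recommended depth.
bases_per_snp = bases_per_accession / target_snps
```

    Under these assumptions the recommended quantity corresponds to well under one genome equivalent per accession, which is consistent with the abstract's description of "shallowly sequenced genomes" and its reliance on a reduced-representation (BsrFI) library.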

  17. MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing.

    Science.gov (United States)

    Yan, Biao; Wang, Zhen-Hua; Zhu, Chang-Dong; Guo, Jin-Tao; Zhao, Jin-Liang

    2014-08-01

    The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species that occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used the Illumina/Solexa sequencing technology to sequence a small RNA library using a pooled RNA sample isolated from the different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of tilapia miRNAs, and the discovery of miRNAs in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.

  18. The role of heterologous chloroplast sequence elements in transgene integration and expression.

    Science.gov (United States)

    Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

    2010-04-01

    Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5' untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5' UTR and 3' UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5' UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5' UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation.

  19. New data libraries for transmutation studies

    Energy Technology Data Exchange (ETDEWEB)

    Kloosterman, J.L. [Netherlands Energy Research Foundation (ECN), Petten (Netherlands); Hoogenboom, J.E. [Interfaculty Reactor Inst., Delft (Netherlands)

    1995-06-01

    The fuel depletion code ORIGEN-S is often used for transmutation studies. It uses three different working libraries for actinides, fission products, and light elements, which contain decay data, cross-section data and fission product yields. These data have been renewed with data based on the JEF2.2 and the EAF3 evaluated files. Furthermore, data for 201 fission products have been added to the libraries. The new data libraries are particularly suitable for parameter studies and other introductory calculations. For more accurate calculations, it is advised to regularly update the cross sections of the most important actinides and fission products during the burnup sequence. (orig.).

  20. New data libraries for transmutation studies

    Energy Technology Data Exchange (ETDEWEB)

    Kloosterman, J.L. [Netherlands Energy Research Foundation (ECN), Petten (Netherlands); Hoogenboom, J.E. [Interfaculty Reactor Inst., Delft (Netherlands)

    1995-12-31

    The fuel depletion code ORIGEN-S is often used for transmutation studies. It uses three different working libraries for actinides, fission products, and light elements, which contain decay data, cross-section data and fission product yields. These data have been renewed with data based on the JEF2.2 and the EAF3 evaluated files. Furthermore, data for 201 fission products have been added to the libraries. The new data libraries are particularly suitable for parameter studies and other introductory calculations. For more accurate calculations, it is advised to regularly update the cross sections of the most important actinides and fission products during the burnup sequence. (author) 9 refs.

  1. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Science.gov (United States)

    Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

    2009-01-01

    Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to being helpful for distinguishing between duplicate genome regions and for determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining c

  2. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Full Text Available Abstract Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to being helpful for distinguishing between duplicate genome regions and for determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This

  3. The organization and expression of the mdm2 gene

    Energy Technology Data Exchange (ETDEWEB)

    Montes De Oca Luna, R.; Tabor, A.D.; Eberspaecher, H. [Univ. of Texas, Houston, TX (United States)] [and others]

    1996-05-01

    The mdm2 gene encodes a zinc finger protein that negatively regulates p53 function by binding and masking the p53 transcriptional activation domain. Two different promoters control expression of mdm2, one of which is also transactivated by p53. We cloned and characterized the mdm2 gene from a murine 129 library. It contained at least 12 exons and spanned approximately 25 kb of DNA. Sequencing of the mdm2 gene revealed three nucleotide differences that resulted in amino acid substitutions in the previously published mdm2 sequence. Sequences of normal BalbC/J DNA and of the original cosmid clone isolated from the 3T3DM cell line revealed that they are identical, suggesting that the published sequence is in error at these three positions. In addition, we analyzed the expression pattern of mdm2 and found ubiquitous low-level expression throughout embryo development and in adult tissues. Analysis of mRNA from numerous tissues for several mdm2 spliced variants that had been identified in the transformed 3T3DM cell line revealed that these variants could not be detected in the developing embryo or in adult tissues. 25 refs., 3 figs., 2 tabs.

  4. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    Science.gov (United States)

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  5. Determination of a Screening Metric for High Diversity DNA Libraries.

    Science.gov (United States)

    Guido, Nicholas J; Handerson, Steven; Joseph, Elaine M; Leake, Devin; Kung, Li A

    2016-01-01

    The fields of antibody engineering, enzyme optimization and pathway construction rely increasingly on screening complex variant DNA libraries. These highly diverse libraries allow researchers to sample a maximized sequence space; and therefore, more rapidly identify proteins with significantly improved activity. The current state of the art in synthetic biology allows for libraries with billions of variants, pushing the limits of researchers' ability to qualify libraries for screening by measuring the traditional quality metrics of fidelity and diversity of variants. Instead, when screening variant libraries, researchers typically use a generic, and often insufficient, oversampling rate based on a common rule-of-thumb. We have developed methods to calculate a library-specific oversampling metric, based on fidelity, diversity, and representation of variants, which informs researchers, prior to screening the library, of the amount of oversampling required to ensure that the desired fraction of variant molecules will be sampled. To derive this oversampling metric, we developed a novel alignment tool to efficiently measure frequency counts of individual nucleotide variant positions using next-generation sequencing data. Next, we apply a method based on the "coupon collector" probability theory to construct a curve of upper bound estimates of the sampling size required for any desired variant coverage. The calculated oversampling metric will guide researchers to maximize their efficiency in using highly variant libraries.
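
    The "coupon collector" idea mentioned in this record can be illustrated with its textbook form. The sketch below is not the authors' library-specific metric: it assumes every variant is equally represented and error-free, which is exactly the simplification the abstract says a generic rule of thumb makes. The function names are hypothetical.

```python
import math


def expected_draws_for_fraction(n_variants: int, fraction: float) -> float:
    """Expected number of random draws needed to observe at least
    ceil(fraction * n_variants) distinct variants, assuming all
    variants are equally likely (classic coupon collector setting).
    """
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    target = math.ceil(fraction * n_variants)
    # With i distinct variants already seen, a new one appears with
    # probability (n - i) / n, so the wait is n / (n - i) draws on average.
    return sum(n_variants / (n_variants - i) for i in range(target))


def oversampling_factor(n_variants: int, fraction: float) -> float:
    """Expected draws expressed as a multiple of the library size."""
    return expected_draws_for_fraction(n_variants, fraction) / n_variants


# Example: a hypothetical library of 1,000 equally represented variants.
factor_full = oversampling_factor(1000, 1.0)   # cover every variant
factor_half = oversampling_factor(1000, 0.5)   # cover half of them
```

    Covering the last few unseen variants dominates the cost, so the factor for full coverage is much larger than for partial coverage. Uneven representation and synthesis errors, which the record's metric accounts for, would push the required oversampling higher than this uniform estimate.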

  6. Determination of a Screening Metric for High Diversity DNA Libraries.

    Directory of Open Access Journals (Sweden)

    Nicholas J Guido

    Full Text Available The fields of antibody engineering, enzyme optimization and pathway construction rely increasingly on screening complex variant DNA libraries. These highly diverse libraries allow researchers to sample a maximized sequence space; and therefore, more rapidly identify proteins with significantly improved activity. The current state of the art in synthetic biology allows for libraries with billions of variants, pushing the limits of researchers' ability to qualify libraries for screening by measuring the traditional quality metrics of fidelity and diversity of variants. Instead, when screening variant libraries, researchers typically use a generic, and often insufficient, oversampling rate based on a common rule-of-thumb. We have developed methods to calculate a library-specific oversampling metric, based on fidelity, diversity, and representation of variants, which informs researchers, prior to screening the library, of the amount of oversampling required to ensure that the desired fraction of variant molecules will be sampled. To derive this oversampling metric, we developed a novel alignment tool to efficiently measure frequency counts of individual nucleotide variant positions using next-generation sequencing data. Next, we apply a method based on the "coupon collector" probability theory to construct a curve of upper bound estimates of the sampling size required for any desired variant coverage. The calculated oversampling metric will guide researchers to maximize their efficiency in using highly variant libraries.

  7. Differential expression profiles in the midgut of Triatoma infestans infected with Trypanosoma cruzi.

    Directory of Open Access Journals (Sweden)

    Diego S Buarque

    Full Text Available Chagas disease, or American trypanosomiasis, is a parasitic disease caused by the protozoan Trypanosoma cruzi and is transmitted by insects from the Triatominae subfamily. To identify components involved in the protozoan-vector relationship, we constructed and analyzed cDNA libraries from RNA isolated from the midguts of uninfected and T. cruzi-infected Triatoma infestans, which are major vectors of Chagas disease. We generated approximately 440 high-quality Expressed Sequence Tags (ESTs) from each T. infestans midgut cDNA library. The sequences were grouped in 380 clusters, representing an average length of 664.78 base pairs (bp). Many clusters were not classified functionally, representing unknown transcripts. Several transcripts involved in different processes (e.g., detoxification) showed differential expression in response to T. cruzi infection. Lysozyme, cathepsin D, a nitrophorin-like protein and a putative 14 kDa protein were significantly upregulated upon infection, whereas thioredoxin reductase was downregulated. In addition, we identified several transcripts related to metabolic processes or immunity with unchanged expressions, including infestin, lipocalins and defensins. We also detected ESTs encoding juvenile hormone binding protein (JHBP), which seems to be involved in insect development and could be a target in control strategies for the vector. This work demonstrates differential gene expression upon T. cruzi infection in the midgut of T. infestans. These data expand the current knowledge regarding vector-parasite interactions for Chagas disease.

  8. Identification of microRNAs differentially expressed involved in male flower development.

    Science.gov (United States)

    Wang, Zhengjia; Huang, Jianqin; Sun, Zhichao; Zheng, Bingsong

    2015-03-01

    Hickory (Carya cathayensis Sarg.) is one of the most economically important woody trees in eastern China, but its long flowering phase delays yield. Our understanding of the regulatory roles of microRNAs (miRNAs) in male flower development in hickory remains poor. Using high-throughput sequencing technology, we have pyrosequenced two small RNA libraries from two male flower differentiation stages in hickory. Analysis of the sequencing data identified 114 conserved miRNAs that belonged to 23 miRNA families, five novel miRNAs including their corresponding miRNA*s, and 22 plausible miRNA candidates. Differential expression analysis revealed 12 miRNA sequences that were upregulated in the later (reproductive) stage of male flower development. Quantitative real-time PCR showed expression trends similar to those obtained by deep sequencing. Novel miRNAs and plausible miRNA candidates were predicted using bioinformatic analysis methods. The miRNAs newly identified in this study have increased the number of known miRNAs in hickory, and the identification of differentially expressed miRNAs will provide new avenues for studies into miRNAs involved in the process of male flower development in hickory and other related trees.

  9. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
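    The throughput figures quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the 50 M paired-end reads are each 300 bases long and that the 25 M single reads are taken at the shortest quoted length of 36 bases; the function name is illustrative and is not part of any Illumina software.

```python
def run_output_gb(n_reads: int, read_length: int) -> float:
    """Total sequenced bases, in gigabases, for n_reads reads of read_length bases."""
    return n_reads * read_length / 1e9


# Longest configuration quoted in the abstract: 2 x 300 bp, 50 M paired-end reads.
paired_end_gb = run_output_gb(50_000_000, 300)

# Shortest configuration quoted in the abstract: 1 x 36 bp, 25 M single reads.
single_read_gb = run_output_gb(25_000_000, 36)

print(f"2 x 300 bp run: {paired_end_gb:.1f} Gb")
print(f"1 x 36 bp run:  {single_read_gb:.1f} Gb")
```

    Under these assumptions the longest configuration reproduces the quoted maximum of 15 Gb, while the shortest one yields less than 1 Gb.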

  10. Abiotic Stress-Related Expressed Sequence Tags from the Diploid Strawberry Fragaria vesca f. semperflorens

    Directory of Open Access Journals (Sweden)

    Maximo Rivarola

    2011-03-01

    Full Text Available Strawberry (Fragaria spp.) is a eudicotyledonous plant that belongs to the Rosaceae family, which includes other agronomically important plants such as raspberry (Rubus idaeus L.) and several tree-fruit species. Despite the vital role played by cultivated strawberry in agriculture, few stress-related gene expression characterizations of this crop are available. To increase the diversity of available transcriptome sequence, we produced 41,430 Fragaria vesca L. expressed sequence tags (ESTs) from plants growing under water-, temperature-, and osmotic-stress conditions as well as a combination of heat and osmotic stresses that is often found in irrigated fields. Clustering and assembling of the ESTs resulted in a total of 11,836 contigs and singletons that were annotated using Gene Ontology (GO) terms. Furthermore, over 1200 sequences with no match to available Rosaceae ESTs were found, including six that were assigned the “response to stress” GO category. Analysis of EST frequency provided an estimate of steady state transcript levels, with 91 sequences exhibiting at least a 20-fold difference between treatments. This EST collection represents a useful resource to advance our understanding of the abiotic stress-response mechanisms in strawberry. The sequence information may be translated to valuable tree crops in the Rosaceae family, where whole-plant treatments are not as simple or practical.

  11. Development of expressed sequence tag-simple sequence repeat markers for genetic characterization and population structure analysis of Praxelis clematidea (Asteraceae).

    Science.gov (United States)

    Wang, Q Z; Huang, M; Downie, S R; Chen, Z X

    2016-05-23

    Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family.
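    The proportions and the gene-flow estimate reported in this abstract can be re-derived from the raw counts. The sketch below assumes the estimator Nm = 0.5(1 − GST)/GST, which the abstract does not state; the helper names are illustrative only.

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def nm_from_gst(gst: float) -> float:
    """Gene flow estimated from GST, assuming Nm = 0.5 * (1 - GST) / GST."""
    return 0.5 * (1 - gst) / gst


ssr_ests = percent(133, 5548)         # microsatellite-containing ESTs
hexa_motifs = percent(56, 133)        # hexanucleotide repeat motifs
tri_motifs = percent(50, 133)         # trinucleotide repeat motifs
polymorphic_pairs = percent(7, 24)    # primer pairs showing polymorphism
gene_flow = nm_from_gst(0.25)         # reported GST = 0.2500

print(ssr_ests, hexa_motifs, tri_motifs, polymorphic_pairs, gene_flow)
```

    With the rounded GST of 0.2500 this estimator gives 1.5, which is consistent with the reported Nm of 1.4996 if the published value was computed from an unrounded GST.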

  12. Bovine mammary gene expression profiling during the onset of lactation.

    Directory of Open Access Journals (Sweden)

    Yuanyuan Gao

    Full Text Available BACKGROUND: Lactogenesis includes two stages. Stage I begins a few weeks before parturition. Stage II is initiated around the time of parturition and extends for several days afterwards. METHODOLOGY/PRINCIPAL FINDINGS: To better understand the molecular events underlying these changes, genome-wide gene expression profiling was conducted using digital gene expression (DGE) on bovine mammary tissue at three time points: approximately day 35 before parturition (-35 d), day 7 before parturition (-7 d) and day 3 after parturition (+3 d). Approximately 6.2 million (M), 5.8 M and 6.1 M 21-nt cDNA tags were sequenced in the three cDNA libraries (-35 d, -7 d and +3 d, respectively). After aligning to the reference sequences, the three cDNA libraries included 8,662, 8,363 and 8,359 genes, respectively. With fold change cutoff criteria of ≥ 2 or ≤ -2 and a false discovery rate (FDR) of ≤ 0.001, a total of 812 genes were significantly differentially expressed at -7 d compared with -35 d (stage I). Gene ontology analysis showed that those significantly differentially expressed genes were mainly associated with cell cycle, lipid metabolism, immune response and biological adhesion. A total of 1,189 genes were significantly differentially expressed at +3 d compared with -7 d (stage II), and these genes were mainly associated with the immune response and cell cycle. Moreover, there were 1,672 genes significantly differentially expressed at +3 d compared with -35 d. Gene ontology analysis showed that the main differentially expressed genes were those associated with metabolic processes. CONCLUSIONS: The results suggest that the mammary gland begins to lactate not only by a gain of function but also by a broad suppression of function to effectively push most of the cell's resources towards lactation.
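    The selection rule described in this abstract (fold change ≥ 2 or ≤ -2 together with FDR ≤ 0.001) can be sketched as a simple filter. This is only an illustration under stated assumptions: the gene names, normalized tag counts and FDR values are hypothetical, counts are assumed to be non-zero, and the signed fold-change convention (negative values for decreases) is one common reading of the cutoff rather than the authors' documented procedure.

```python
def signed_fold_change(count_before: float, count_after: float) -> float:
    """Signed fold change: +x for an x-fold increase, -x for an x-fold decrease.

    Assumes both normalized counts are greater than zero.
    """
    ratio = count_after / count_before
    return ratio if ratio >= 1 else -1.0 / ratio


def is_differential(fold: float, fdr: float,
                    fold_cutoff: float = 2.0, fdr_cutoff: float = 0.001) -> bool:
    """Apply the two thresholds quoted in the abstract."""
    return abs(fold) >= fold_cutoff and fdr <= fdr_cutoff


# Hypothetical records: name -> (count at -35 d, count at -7 d, FDR).
genes = {
    "geneA": (10.0, 45.0, 0.0002),   # 4.5-fold up, passes both thresholds
    "geneB": (30.0, 10.0, 0.0008),   # 3-fold down, passes both thresholds
    "geneC": (20.0, 30.0, 0.0001),   # only 1.5-fold up, fails the fold cutoff
    "geneD": (5.0, 50.0, 0.02),      # 10-fold up, fails the FDR cutoff
}

significant = sorted(
    name
    for name, (before, after, fdr) in genes.items()
    if is_differential(signed_fold_change(before, after), fdr)
)
print(significant)
```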

  13. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    Science.gov (United States)

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
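    The hexanucleotide survey described in this abstract can be illustrated with a short counting routine. The sketch assumes that each input string is a 3′ UTR written in the RNA alphabet and ending exactly at the poly(A) site, so that the window 15–30 nt upstream is a slice taken from the 3′ end; the toy sequences and function names are hypothetical and do not come from the A. oryzae data set.

```python
from collections import Counter


def upstream_window(utr: str, start: int = 15, end: int = 30) -> str:
    """Region lying `start`..`end` nt upstream of the poly(A) site (the 3' end of utr)."""
    lo = max(0, len(utr) - end)
    hi = max(0, len(utr) - start)
    return utr[lo:hi]


def hexamer_fractions(utrs: list[str], k: int = 6) -> dict[str, float]:
    """Fraction of sequences whose upstream window contains each k-mer at least once."""
    counts: Counter[str] = Counter()
    for utr in utrs:
        window = upstream_window(utr)
        # A set is used so that each k-mer is counted once per sequence.
        counts.update({window[i:i + k] for i in range(len(window) - k + 1)})
    return {kmer: n / len(utrs) for kmer, n in counts.items()}


# Two toy 45-nt UTRs: the first carries AAUGAA inside the window, the second does not.
toy_utrs = ["C" * 20 + "AAUGAA" + "C" * 19, "G" * 45]
fractions = hexamer_fractions(toy_utrs)
print(fractions["AAUGAA"])
```

    Counting each hexamer once per sequence matches the way the abstract reports its result, as the share of transcripts carrying a motif rather than the raw number of occurrences.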

  14. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    Science.gov (United States)

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.

  15. Validation of SCALE-4 criticality sequences using ENDF/B-V data

    International Nuclear Information System (INIS)

    Bowman, S.M.; Wright, R.Q.; DeHart, M.D.; Taniuchi, H.

    1993-01-01

    The SCALE code system developed at Oak Ridge National Laboratory contains criticality safety analysis sequences that include the KENO V.a Monte Carlo code for calculation of the effective multiplication factor. These sequences are widely used for criticality safety analyses performed both in the United States and abroad. The purpose of the current work is to validate the SCALE-4 criticality sequences with an ENDF/B-V cross-section library for future distribution with SCALE-4. The library used for this validation is a broad-group library (44 groups) collapsed from the 238-group SCALE library. Extensive data testing of both the 238-group and the 44-group libraries included 10 fast and 18 thermal CSEWG benchmarks and 5 other fast benchmarks. Both libraries contain approximately 300 nuclides and are, therefore, capable of modeling most systems, including those containing spent fuel or radioactive waste. The validation of the broad-group library used 93 critical experiments as benchmarks. The range of experiments included 60 light-water-reactor fuel rod lattices, 13 mixed-oxide fuel rod lattices, and 15 other low- and high-enriched uranium critical assemblies.

  16. Cloning and sequencing of cDNA encoding human DNA topoisomerase II and localization of the gene to chromosome region 17q21-22

    International Nuclear Information System (INIS)

    Tsai-Pflugfelder, M.; Liu, L.F.; Liu, A.A.; Tewey, K.M.; Whang-Peng, J.; Knutsen, T.; Huebner, K.; Croce, C.M.; Wang, J.C.

    1988-01-01

    Two overlapping cDNA clones encoding human DNA topoisomerase II were identified by two independent methods. In one, a human cDNA library in phage λ was screened by hybridization with a mixed oligonucleotide probe encoding a stretch of seven amino acids found in yeast and Drosophila DNA topoisomerase II; in the other, a different human cDNA library in a λgt11 expression vector was screened for the expression of antigenic determinants that are recognized by rabbit antibodies specific to human DNA topoisomerase II. The entire coding sequence of the human DNA topoisomerase II gene was determined from these and several additional clones, identified through the use of the cloned human TOP2 gene sequences as probes. Hybridization between the cloned sequences and mRNA and genomic DNA indicates that the human enzyme is encoded by a single-copy gene. The location of the gene was mapped to chromosome 17q21-22 by in situ hybridization of a cloned fragment to metaphase chromosomes and by hybridization analysis with a panel of mouse-human hybrid cell lines, each retaining a subset of human chromosomes.

  17. Methods for the preparation of large quantities of complex single-stranded oligonucleotide libraries.

    Science.gov (United States)

    Murgha, Yusuf E; Rouillard, Jean-Marie; Gulari, Erdogan

    2014-01-01

    Custom-defined oligonucleotide collections have a broad range of applications in fields of synthetic biology, targeted sequencing, and cytogenetics. Also, they are used to encode information for technologies like RNA interference, protein engineering and DNA-encoded libraries. High-throughput parallel DNA synthesis technologies developed for the manufacture of DNA microarrays can produce libraries of large numbers of different oligonucleotides, but in very limited amounts. Here, we compare three approaches to prepare large quantities of single-stranded oligonucleotide libraries derived from microarray synthesized collections. The first approach, alkaline melting of double-stranded PCR amplified libraries with a biotinylated strand captured on streptavidin coated magnetic beads, results in little or no non-biotinylated ssDNA. The second method, in which the phosphorylated strand of PCR amplified libraries is nucleolytically hydrolyzed, is recommended when small amounts of libraries are needed. The third method, which combines in vitro transcription of PCR amplified libraries with reverse transcription of the RNA product into single-stranded cDNA, is our recommended method to produce large amounts of oligonucleotide libraries. Finally, we propose a method to remove any primer binding sequences introduced during library amplification.

  18. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    Science.gov (United States)

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. Candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs, and a collaborative network of microRNAs and cardiac development-related mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.

  19. Microaspiration of esophageal gland cells and cDNA library construction for identifying parasitism genes of plant-parasitic nematodes.

    Science.gov (United States)

    Hussey, Richard S; Huang, Guozhong; Allen, Rex

    2011-01-01

    Identifying parasitism genes encoding proteins secreted from a plant-parasitic nematode's esophageal gland cells and injected through its stylet into plant tissue is the key to understanding the molecular basis of nematode parasitism of plants. Parasitism genes have been cloned by directly microaspirating the cytoplasm from the esophageal gland cells of different parasitic stages of cyst or root-knot nematodes to provide mRNA to create a gland cell-specific cDNA library by long-distance reverse-transcriptase polymerase chain reaction. cDNA clones are sequenced and deduced protein sequences with a signal peptide for secretion are identified for high-throughput in situ hybridization to confirm gland-specific expression.

  20. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599

  1. Screening of Genes Specifically Expressed in Males of Fenneropenaeus chinensis and Their Potential as Sex Markers

    Directory of Open Access Journals (Sweden)

    Shihao Li

    2013-01-01

    Full Text Available The androgenic gland (AG), which plays an important role in sex differentiation of male crustaceans, is a target candidate for understanding the mechanism of male development and for mining male-specific sex markers. An SSH library (designated as male reproduction-related tissues—SSH library, MRT-SSH library for short) was constructed using cDNA from tissues located at the basal part of the 5th pereiopods, including AG and part of spermatophore sac, as tester, and the cDNA from the basal part of the 4th pereiopods of these male shrimp as driver. 402 ESTs from the SSH library were sequenced and assembled into 48 contigs and 104 singlets. Twelve contigs and 14 singlets were identified as known genes. The proteins encoded by the identified genes were categorized, according to their proposed functions, into neuropeptide hormone and hormone transporter, RNA posttranscriptional regulation, translation, cell growth and death, metabolism, genetic information processing, signal transduction/transport, or immunity-related proteins. Eleven highly expressed contigs in the SSH library were selected for validation of the MRT-SSH library and screening sex markers of shrimp. One contig, specifically expressed in male shrimp, had the potential to be developed as a transcriptomic sex marker in shrimp.

  2. Molecular characterization, tissue expression and sequence variability of the barramundi (Lates calcarifer) myostatin gene

    Directory of Open Access Journals (Sweden)

    Smith-Keune Carolyn

    2008-02-01

    Full Text Available BACKGROUND: Myostatin (MSTN) is a member of the transforming growth factor-β superfamily that negatively regulates growth of skeletal muscle tissue. The gene encoding the MSTN peptide is a consolidated candidate for the enhancement of productivity in terrestrial livestock. This gene potentially represents an important target for growth improvement of cultured finfish. RESULTS: Here we report molecular characterization, tissue expression and sequence variability of the barramundi (Lates calcarifer) MSTN-1 gene. The barramundi MSTN-1 was encoded by three exons of 379, 371 and 381 bp in length and translated into a 376-amino acid peptide. Introns 1 and 2 were 412 and 819 bp in length and presented typical GT...AG splicing sites. The upstream region contained cis-regulatory elements such as TATA-box and E-boxes. A first assessment of sequence variability suggested that higher mutation rates are found in the 5' flanking region, with several SNPs present in this species. A putative micro RNA target site has also been observed in the 3'UTR (untranslated region) and is highly conserved across teleost fish. The deduced amino acid sequence was conserved across vertebrates and exhibited characteristic conserved putative functional residues including a cleavage motif of proteolysis (RXXR), nine cysteines and two glycosylation sites. A qualitative analysis of the barramundi MSTN-1 expression pattern revealed that, in adult fish, transcripts are differentially expressed in various tissues other than skeletal muscles including gill, heart, kidney, intestine, liver, spleen, eye, gonad and brain. CONCLUSION: Our findings provide valuable insights such as sequence variation and genomic information which will aid the further investigation of the barramundi MSTN-1 gene in association with growth. The first finding in finfish MSTN of a miRNA target site in the 3'UTR provides an opportunity for the identification of regulatory mutations on the

  3. A novel approach to sequence validating protein expression clones with automated decision making

    Directory of Open Access Journals (Sweden)

    Mohr Stephanie E

    2007-06-01

    Full Text Available BACKGROUND: Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time consuming process. The ultimate goal of validation is to determine if a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation. RESULTS: We have developed an Automated Clone Evaluation (ACE) system – the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process including contig management, identifying and annotating discrepancies, determining if discrepancies correspond to polymorphisms and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison, the automated analysis by ACE took less time and was more accurate than a manual analysis of a 93 gene clone set. CONCLUSION: ACE was designed to facilitate high throughput clone sequence

  4. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
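    One of the tasks named in this abstract, global alignment via longest common subsequence, can be illustrated with the textbook dynamic-programming formulation. The sketch below is not the BATS implementation (which is written in C/C++ and Perl) and does not reflect its interface; it is a plain quadratic-time version written in Python for readability.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(len(a) * len(b)) time)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Trace back from the bottom-right corner to recover one optimal subsequence.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

    When several subsequences share the maximal length, the traceback returns only one of them; which one depends on the tie-breaking rule in the `elif` branch.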

  5. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    Science.gov (United States)

    2010-01-01

    BACKGROUND: Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). RESULTS: Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. CONCLUSION: We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  6. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    Science.gov (United States)

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.

  7. DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data.

    Science.gov (United States)

    Tsuji, Junko; Weng, Zhiping

    2016-01-01

    With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3´ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can also be erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries accompanied by 3´ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. the 3´ adapters were removed. Given another batch of 192 publicly available processed libraries, DNApi correctly judged all of them to be "ready-to-map" small RNA sequences. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies.
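    The adapter-removal step that this abstract calls essential can be sketched for the simple case where the 3´ adapter is already known. The code below is not DNApi and does not use its interface or its de novo prediction method; it is a naive exact-match trimmer, and the function name, the seed length and the example read are assumptions made for illustration.

```python
def trim_3p_adapter(read: str, adapter: str, min_match: int = 8) -> str:
    """Cut the read at the first exact occurrence of the adapter's first min_match bases.

    Reads without such an occurrence are returned unchanged. Mismatches, and adapter
    fragments shorter than min_match at the very end of a read, are not handled.
    """
    seed = adapter[:min_match]
    position = read.find(seed)
    return read if position == -1 else read[:position]


# Example adapter (a commonly used small RNA 3' adapter) and a hypothetical read
# made of a 22-nt insert followed by the first 14 bases of that adapter.
adapter = "TGGAATTCTCGGGTGCCAAGG"
insert = "TGAGGTAGTAGGTTGTATAGTT"
raw_read = insert + adapter[:14]

trimmed = trim_3p_adapter(raw_read, adapter)
print(trimmed, len(trimmed))
```

    A read that comes back unchanged is either adapter-free or carries an adapter this naive rule cannot detect, which is one reason why dedicated tools treat the "already processed" question as a separate classification problem.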

  8. Assessing high affinity binding to HLA-DQ2.5 by a novel peptide library based approach

    DEFF Research Database (Denmark)

    Jüse, Ulrike; Arntzen, Magnus; Højrup, Peter

    2011-01-01

    Here we report on a novel peptide library based method for HLA class II binding motif identification. The approach is based on water soluble HLA class II molecules and soluble dedicated peptide libraries. A high number of different synthetic peptides are competing to interact with a limited amount...... library. The eluted sequences fit very well with the previously described HLA-DQ2.5 peptide binding motif. This novel method, limited by library complexity and sensitivity of mass spectrometry, allows the analysis of several thousand synthetic sequences concomitantly in a simple water soluble format....

  9. G-stack modulated probe intensities on expression arrays - sequence corrections and signal calibration

    Directory of Open Access Journals (Sweden)

    Fasold Mario

    2010-04-01

    Full Text Available BACKGROUND: The brightness of the probe spots on expression microarrays is intended to measure the abundance of specific mRNA targets. Probes with runs of at least three guanines (G) in their sequence show abnormally high intensities which reflect probe effects rather than target concentrations. This G-bias requires correction prior to downstream expression analysis. RESULTS: Longer runs of three or more consecutive G along the probe sequence and in particular triple degenerated G at its solution end ((GGG)1-effect) are associated with exceptionally large probe intensities on GeneChip expression arrays. This intensity bias is related to non-specific hybridization and affects both perfect match and mismatch probes. The (GGG)1-effect tends to increase gradually for microarrays of later GeneChip generations. It was found for DNA/RNA as well as for DNA/DNA probe/target-hybridization chemistries. Amplification of sample RNA using T7-primers is associated with strong positive amplitudes of the G-bias whereas alternative amplification protocols using random primers give rise to much smaller and partly even negative amplitudes. We applied positional dependent sensitivity models to analyze the specifics of probe intensities in the context of all possible short sequence motifs of one to four adjacent nucleotides along the 25meric probe sequence. Most of the longer motifs are adequately described using a nearest-neighbor (NN) model. In contrast, runs of degenerated guanines require explicit consideration of next nearest neighbors (GGG terms). Preprocessing methods such as vsn, RMA, dChip, MAS5 and gcRMA only insufficiently remove the G-bias from data. CONCLUSIONS: Positional and motif dependent sensitivity models account for sequence effects of oligonucleotide probe intensities. We propose a positional dependent NN+GGG hybrid model to correct the intensity bias associated with probes containing poly-G motifs. It is implemented as a single-chip based calibration

  10. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.

  11. Development of a large peptoid-DOTA combinatorial library.

    Science.gov (United States)

    Singh, Jaspal; Lopes, Daniel; Gomika Udugamasooriya, D

    2016-09-01

    Conventional one-bead one-compound (OBOC) library synthesis is typically used to identify molecules with therapeutic value. The design and synthesis of OBOC libraries that contain molecules with imaging or even potentially therapeutic and diagnostic capacities (e.g. theranostic agents) has been overlooked. The development of a therapeutically active molecule with a built-in imaging component for a certain target is a daunting task, and structure-based rational design might not be the best approach. We propose to develop a combinatorial library with potentially therapeutic and imaging components fused together in each molecule. Such molecules in the library can be screened, identified, and validated as direct theranostic candidates against targets of interest. As the first step in achieving that aim, we developed an on-bead library of 153,600 Peptoid-DOTA compounds in which the peptoids are the target-recognizing and potentially therapeutic components and the DOTA is the imaging component. We attached the DOTA scaffold to TentaGel beads using one of the four arms of DOTA, and we built a diversified 6-mer peptoid library on the remaining three arms. We evaluated both the synthesis and the mass spectrometric sequencing capacities of the test compounds and of the final library. The compounds displayed unique ionization patterns including direct breakages of the DOTA scaffold into two units, allowing clear decoding of the sequences. Our approach provides a facile synthesis method for the complete on-bead development of large peptidomimetic-DOTA libraries for screening against biological targets for the identification of potential theranostic agents in the future. © 2016 The Authors. Biopolymers Published by Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 673-684, 2016.

  12. Identification and functional analysis of a new glyphosate resistance gene from a fungus cDNA library.

    Science.gov (United States)

    Tao, Bo; Shao, Bai-Hui; Qiao, Yu-Xin; Wang, Xiao-Qin; Chang, Shu-Jun; Qiu, Li-Juan

    2017-08-01

    Glyphosate is a widely used broad spectrum herbicide; however, its broad spectrum limits its use once crops are planted. If glyphosate-resistant crops are grown, glyphosate can be used for weed control in crops. While several glyphosate resistance genes are used in commercial glyphosate tolerant crops, there is interest in identifying additional genes for glyphosate tolerance. This research constructed a high-quality cDNA library from the glyphosate-resistant fungus Aspergillus oryzae RIB40 to identify genes that may confer resistance to glyphosate. Using a medium containing glyphosate (120 mM), we screened several clones from the library. Based on a nucleotide sequence analysis, we identified a gene of unknown function (GenBank accession number: XM_001826835.2) that encoded a hypothetical 344-amino acid protein. The gene was named MFS40. Its ORF was amplified to construct an expression vector, pGEX-4T-1-MFS40, to express the protein in Escherichia coli BL21. The gene conferred glyphosate tolerance to E. coli ER2799 cells. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs

    Directory of Open Access Journals (Sweden)

    Sebastian Boltaña

    2017-02-01

    Full Text Available This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study

  14. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs.

    Science.gov (United States)

    Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W; Gallardo-Escarate, Cristian; Planas, Josep V; Mackenzie, Simon

    2017-02-03

    This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene

  15. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of second derivative peak locate with simple peak area integration algorithms provides the greatest accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)
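
    The two selection criteria named in the abstract above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the abstract does not give the formulas, so the commonly used definitions are assumed here (z as the deviation from the reference value divided by the combined standard uncertainty, and reduced χ² as the mean of the squared z values over the degrees of freedom). The function names and the example numbers are hypothetical and are not taken from the study.

```python
import math

def z_score(measured, reference, u_measured, u_reference=0.0):
    """Deviation from the reference in units of the combined uncertainty (assumed definition)."""
    return (measured - reference) / math.sqrt(u_measured ** 2 + u_reference ** 2)

def reduced_chi_square(measured, reference, u_measured, dof=None):
    """Sum of squared z-scores divided by the degrees of freedom (assumed definition)."""
    z_values = [z_score(m, r, u) for m, r, u in zip(measured, reference, u_measured)]
    n = dof if dof is not None else len(z_values)
    return sum(z * z for z in z_values) / n

# Hypothetical peak areas, reference areas and uncertainties (illustration only).
areas = [105.0, 98.0, 201.0]
reference_areas = [100.0, 100.0, 200.0]
uncertainties = [5.0, 2.0, 1.0]

chi2_red = reduced_chi_square(areas, reference_areas, uncertainties)
```

    Under these assumed definitions, an analysis sequence whose reduced χ² is close to 1 reports uncertainties that are consistent with its actual deviations from the reference spectrum.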

  16. Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels

    NARCIS (Netherlands)

    Deelen, Patrick; Zhernakova, Daria V.; de Haan, Mark; van der Sijde, Marijke; Bonder, Marc Jan; Karjalainen, Juha; van der Velde, K. Joeri; Abbott, Kristin M.; Fu, Jingyuan; Wijmenga, Cisca; Sinke, Richard J.; Swertz, Morris A.; Franke, Lude

    2015-01-01

    Background: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq

  17. High-throughput sequencing of small RNA transcriptome reveals salt stress regulated microRNAs in sugarcane.

    Directory of Open Access Journals (Sweden)

    Mariana Carnavale Bottino

    Full Text Available Salt stress is a primary cause of crop losses worldwide, and it has been the subject of intense investigation to unravel the complex mechanisms responsible for salinity tolerance. MicroRNAs are implicated in many developmental processes and in responses to various abiotic stresses, playing pivotal roles in plant adaptation. Deep sequencing technology was chosen to determine the small RNA transcriptome of Saccharum sp. cultivars grown under saline conditions. We constructed four small RNA libraries from plants grown in hydroponic culture, exposed to 170 mM NaCl and harvested after 1 h, 6 h and 24 h. Each library was sequenced individually, and together they generated more than 50 million short reads. Ninety-eight conserved miRNAs and 33 miRNAs* were identified by bioinformatics. Several of the microRNAs showed considerable differences in expression among the four libraries. To confirm the results of the bioinformatics-based analysis, we studied the expression of the 10 most abundant miRNAs and 1 miRNA* in plants treated with 170 mM NaCl and in plants given a severe treatment of 340 mM NaCl. The results showed that the 11 selected miRNAs had higher expression in samples given the severe salt treatment than in those given the mild one. We also investigated the regulation of the same miRNAs in shoots of four cultivars grown in soil treated with 170 mM NaCl. Cultivars could be grouped according to miRNA expression in response to salt stress. Furthermore, the majority of the predicted target genes showed inverse regulation relative to their corresponding microRNAs. The targets encode a wide range of proteins, including transcription factors, metabolic enzymes and genes involved in hormone signaling, probably assisting the plants to develop tolerance to salinity. Our work provides insights into the regulatory functions of miRNAs, thereby expanding our knowledge of potential salt-stress-regulated genes.

  18. The carnegie protein trap library: a versatile tool for Drosophila developmental studies.

    Science.gov (United States)

    Buszczak, Michael; Paterno, Shelley; Lighthouse, Daniel; Bachman, Julia; Planck, Jamie; Owen, Stephenie; Skora, Andrew D; Nystul, Todd G; Ohlstein, Benjamin; Allen, Anna; Wilhelm, James E; Murphy, Terence D; Levis, Robert W; Matunis, Erika; Srivali, Nahathai; Hoskins, Roger A; Spradling, Allan C

    2007-03-01

    Metazoan physiology depends on intricate patterns of gene expression that remain poorly known. Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. By sequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced green fluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, we found that 600-900 different genes are trapped in our collection. A core set of 244 lines trapped different identifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256 additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegie collection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggest new strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions.

  19. Journal of Genetics | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    The female spinach genome was used as a blocker, and a cDNA library of sequences specifically expressed from the Y chromosome was constructed. Moreover, expressed sequence tag (EST) sequences in the cDNA library were cloned, sequenced and analysed bioinformatically. In total, 63 valid EST sequences were obtained in this study.

  20. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    Science.gov (United States)

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Against this background, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
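
    The input described above, a list of unique reads with their copy numbers, can be produced from raw reads by a simple collapsing step. The sketch below illustrates the idea only. The tab-separated layout, the function names and the example reads are assumptions made for illustration; the abstract does not specify the exact file format that miRanalyzer expects.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse raw reads into (unique_read, copy_number) pairs, most abundant first."""
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # Sort by descending copy number, then alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def to_tab_separated(pairs):
    """Render one 'sequence<TAB>count' line per unique read (assumed layout)."""
    return "\n".join(f"{seq}\t{count}" for seq, count in pairs)

# Hypothetical raw reads: two copies of one sequence (differing only in case),
# one copy of another, and an empty line that is skipped.
raw_reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "tgaggtagtaggttgtatagtt",
    "TAGCTTATCAGACTGATGTTGA",
    "",
]

unique_reads = collapse_reads(raw_reads)
table = to_tab_separated(unique_reads)
```

    The copy number of each unique read then serves as the expression-level estimate mentioned in the abstract.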

  1. A Utilitarian Case for Intellectual Freedom in Libraries.

    Science.gov (United States)

    Doyle, Tony

    2001-01-01

    Outlines the history of censorship and intellectual and expressive freedom in American libraries; discusses the two main types of ethical theory, utilitarianism and deontology; and maintains that libraries have a special role to play in promoting unconditional intellectual freedom. (Author/LRW)

  2. A Blumeria graminis f.sp. hordei BAC library - contig building and microsynteny studies

    DEFF Research Database (Denmark)

    Pedersen, C.; Wu, B.; Giese, H.

    2002-01-01

    A bacterial artificial chromosome (BAC) library of Blumeria graminis f.sp. hordei, containing 12,000 clones with an average insert size of 41 kb, was constructed. The library represents about three genome equivalents and BAC-end sequencing showed a high content of repetitive sequences, making...... contigs, at or close to avirulence loci, were constructed. Single nucleotide polymorphism (SNP) markers were developed from BAC-end sequences to link the contigs to the genetic maps. Two other BAC contigs were used to study microsynteny between B. graminis and two other ascomycetes, Neurospora crassa...

  3. Single-tube library preparation for degraded DNA

    DEFF Research Database (Denmark)

    Carøe, Christian; Gopalakrishnan, Shyam; Vinner, Lasse

    2018-01-01

    these obstacles and enable higher throughput are therefore of interest to researchers working with degraded DNA. 2. In this study, we compare four Illumina library preparation protocols, including two “single-tube” methods developed for this study with the explicit aim of improving data quality and reducing...... of chemically damaged and highly fragmented DNA molecules. In particular, the enzymatic reactions and DNA purification steps during library preparation can result in DNA template loss and sequencing biases, affecting downstream analyses. The development of library preparation methods that circumvent...... preparation time and expenses. The methods are tested on grey wolf (Canis lupus) museum specimens. 3. We found single-tube protocols increase library complexity, yield more reads that map uniquely to the reference genome, reduce processing time, and may decrease laboratory costs by 90%. 4. Given the advantages...

  4. Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

    Science.gov (United States)

    Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

    2012-08-01

    Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
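
    The counts quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the abstract; the variable names are illustrative.

```python
# Figures as reported in the abstract.
bes_total, bes_polymorphic = 1164, 673   # SSRs derived from BAC-end sequences
est_total, est_polymorphic = 342, 145    # SSRs derived from ESTs
previously_reported = 508                # polymorphic SSRs from all earlier reports

total_ssrs = bes_total + est_total                      # putative SSRs assessed
bes_rate = bes_polymorphic / bes_total                  # fraction polymorphic, BESs
est_rate = est_polymorphic / est_total                  # fraction polymorphic, ESTs
new_polymorphic = bes_polymorphic + est_polymorphic     # novel polymorphic primer pairs
combined = previously_reported + new_polymorphic        # repertoire after this study

print(total_ssrs, round(100 * bes_rate), round(100 * est_rate), new_polymorphic, combined)
```

    The totals agree with the abstract: 1,506 SSRs assessed, about 58 % and 42 % polymorphic, 818 new polymorphic primer pairs, and a combined repertoire (1,326) that is more than twice the earlier 508.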

  5. Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

    Science.gov (United States)

    Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

    2017-10-24

    High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in the literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
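
    The four constraints listed in the abstract above (GC content, homopolymer length, Hamming distance, blacklisted subsequences) can be made concrete with a small filter. The sketch below is a naive rejection-sampling generator of the kind the abstract uses as a baseline; it is not the authors' framework. All thresholds, the barcode length and the blacklisted motif are hypothetical values chosen for illustration.

```python
import itertools
import random

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(group)) for _, group in itertools.groupby(seq))

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def is_valid(seq, accepted, gc_range=(0.4, 0.6), max_run=2,
             min_dist=3, blacklist=("GAATTC",)):
    """Check one candidate against all four constraints (thresholds are assumptions)."""
    if not gc_range[0] <= gc_content(seq) <= gc_range[1]:
        return False
    if max_homopolymer(seq) > max_run:
        return False
    if any(motif in seq for motif in blacklist):
        return False
    # Naive all-pairs check: cost grows with the number of accepted barcodes.
    return all(hamming(seq, other) >= min_dist for other in accepted)

def generate_barcodes(n, length=10, seed=0, max_tries=100_000):
    """Draw random candidates and keep those that pass, up to n barcodes."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if is_valid(candidate, accepted):
            accepted.append(candidate)
    return accepted

barcodes = generate_barcodes(20)
```

    The pairwise Hamming check makes this approach quadratic in library size, which is why it becomes impractical at the million- or billion-barcode scale discussed in the abstract.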

  6. Patron Behavior Policies in the Public Library: "Kreimer v. Morristown" Revisited.

    Science.gov (United States)

    Geiszler, Robert W.

    1998-01-01

    The case of an indigent library patron recovering a judgment against a public library is used as a backdrop for discussing patron behavior policies in the public library. Highlights include First Amendment rights, the public library as an expressive forum, government rules, policy lessons from the case, and acceptable policies. (AEF)

  7. Can abundance of protists be inferred from sequence data: a case study of foraminifera.

    Directory of Open Access Journals (Sweden)

    Alexandra A-T Weber

    Full Text Available Protists are key players in microbial communities, yet our understanding of their role in ecosystem functioning is seriously impeded by difficulties in identification of protistan species and their quantification. Current microscopy-based methods used for determining the abundance of protists are tedious and often show a low taxonomic resolution. Recent development of next-generation sequencing technologies has offered a very powerful tool for studying the richness of protistan communities. Still, the relationship between abundance of species and number of sequences remains subject to various technical and biological biases. Here, we test the impact of some of these biological biases on sequence abundance of SSU rRNA gene in foraminifera. First, we quantified the rDNA copy number and rRNA expression level of three species of foraminifera by qPCR. Then, we prepared five mock communities with these species, two in equal proportions and three with one species ten times more abundant. The libraries of rDNA and cDNA of the mock communities were constructed, Sanger sequenced and the sequence abundance was calculated. The initial species proportions were compared to the raw sequence proportions as well as to the sequence abundance normalized by rDNA copy number and rRNA expression level per species. Our results showed that without normalization, all sequence data differed significantly from the initial proportions. After normalization, the congruence between the number of sequences and number of specimens was much better. We conclude that without normalization, species abundance determination based on sequence data was not possible because of the effect of biological biases. Nevertheless, by taking into account the variation of rDNA copy number and rRNA expression level we were able to infer species abundance, suggesting that our approach can be successful in controlled conditions.
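
    The normalization step described above can be illustrated with a short calculation. This is a minimal sketch that assumes the correction is a simple division of each species' sequence count by its per-specimen rDNA copy number, followed by rescaling to proportions; the abstract does not state the exact procedure. The species labels, read counts and copy numbers are made up for illustration.

```python
def normalized_proportions(read_counts, copies_per_specimen):
    """Divide sequence counts by per-specimen copy number, then rescale to sum to 1."""
    corrected = {sp: read_counts[sp] / copies_per_specimen[sp] for sp in read_counts}
    total = sum(corrected.values())
    return {sp: value / total for sp, value in corrected.items()}

# Hypothetical mock community with equal numbers of specimens per species.
read_counts = {"species_A": 800, "species_B": 150, "species_C": 50}
copies_per_specimen = {"species_A": 16000, "species_B": 3000, "species_C": 1000}

raw_total = sum(read_counts.values())
raw = {sp: n / raw_total for sp, n in read_counts.items()}
normalized = normalized_proportions(read_counts, copies_per_specimen)
```

    In this toy example the raw proportions (0.80, 0.15, 0.05) are far from the equal specimen proportions, whereas the normalized values all come out at one third.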

  8. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags (ESTs) for establishing divergences within the tree of placental mammals. This is done using the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  9. Cloning and sequencing of the gene for human β-casein

    International Nuclear Information System (INIS)

    Loennerdal, B.; Bergstroem, S.; Andersson, Y.; Hialmarsson, K.; Sundgyist, A.; Hernell, O.

    1990-01-01

    Human β-casein is a major protein in human milk. This protein is part of the casein micelle and has been suggested to have several physiological functions in the newborn. Since there is limited information on β-casein and the factors that affect its concentration in human milk, the authors have isolated and sequenced the gene for this protein. A human mammary gland cDNA library (Clontech) in λgt11 was screened by plaque hybridization using a 42-mer synthetic 32P-labelled oligonucleotide. Positive clones were identified and isolated, DNA was prepared and the gene isolated by cleavage with EcoRI. Following subcloning (pUC18), restriction mapping and Southern blotting, DNA for sequencing was prepared. The gene was sequenced by the dideoxy method. Human β-casein has 212 amino acids and the amino acid sequence deduced from the nucleotide sequence is 91% identical to the published sequence for human β-casein show a high degree of conservation at the leader peptide and the highly phosphorylated sequences, but also deletions and divergence at several positions. These results provide insight into the structure of the human β-casein gene and will facilitate studies on factors affecting its expression

  10. Serum-dependent selective expression of EhTMKB1-9, a member of Entamoeba histolytica B1 family of transmembrane kinases.

    Directory of Open Access Journals (Sweden)

    Shiteshu Shrimal

    Full Text Available Entamoeba histolytica transmembrane kinases (EhTMKs) can be grouped into six distinct families on the basis of motifs and sequences. Analysis of the E. histolytica genome revealed the presence of 35 EhTMKB1 members on the basis of sequence identity (≥95%). Only six homologs were full length containing an extracellular domain, a transmembrane segment and an intracellular kinase domain. Reverse transcription followed by polymerase chain reaction (RT-PCR) of the kinase domain was used to generate a library of expressed sequences. Sequencing of randomly picked clones from this library revealed that about 95% of the clones were identical with a single member, EhTMKB1-9, in proliferating cells. On serum starvation, the relative number of EhTMKB1-9 derived sequences decreased with concomitant increase in the sequences derived from another member, EhTMKB1-18. The change in their relative expression was quantified by real time PCR. Northern analysis and RNase protection assay were used to study the temporal nature of EhTMKB1-9 expression after serum replenishment of starved cells. The results showed that the expression of EhTMKB1-9 was sinusoidal. Specific transcriptional induction of EhTMKB1-9 upon serum replenishment was further confirmed by reporter gene (luciferase) expression and the upstream sequence responsible for serum responsiveness was identified. EhTMKB1-9 is one of the first examples of an inducible gene in Entamoeba. The protein encoded by this member was functionally characterized. The recombinant kinase domain of EhTMKB1-9 displayed protein kinase activity. It is likely to have dual specificity as judged from its sensitivity to different kinase inhibitors. Immuno-localization showed EhTMKB1-9 to be a surface protein which decreased on serum starvation and was relocalized on serum replenishment. 
Cell lines expressing either EhTMKB1-9 without kinase domain, or EhTMKB1-9 antisense RNA, showed decreased cellular proliferation and target cell

  11. ORF-selector ESPRIT: a second generation library screen for soluble protein expression employing precise open reading frame selection.

    Science.gov (United States)

    An, Yingfeng; Yumerefendi, Hayretin; Mas, Philippe J; Chesneau, Alban; Hart, Darren J

    2011-08-01

    Here we present ORF-selector ESPRIT, a 9-fold enhanced version of our technology for screening incremental truncation libraries to identify soluble high yielding constructs of challenging proteins. Gene fragments are truncated at both termini to access internal domains and the resulting reading frame problem is addressed by an unbiased, intein-based open reading frame selection yielding only in-frame DNA inserts. This enriched library is then subcloned into a standard high-level expression plasmid where tens of thousands of constructs can be assayed in a two-step process using colony- and liquid-handling robots to isolate rare highly expressing clones useful for production of multi milligram quantities of purifiable proteins. The p85α protein was used to benchmark the system resulting in isolation of all known domains, either alone or in tandem. The human kinase IKK1 was then screened resulting in purification of a predicted internal domain. This strategy provides an integrated, facile route to produce soluble proteins from challenging and poorly understood target genes at quantities compatible with structural biology, screening applications and immunisation studies. The high genetic diversity that can be sampled opens the way to study more diverse systems including multisubunit complexes. Copyright © 2011 Elsevier Inc. All rights reserved.

  12. [Clone, construct, expression and verification of lactoferricin B gene and several sequence mutations in yeast].

    Science.gov (United States)

    Feng, Yong-qian; Zha, Xiao-jun; Zhai, Chao-yang

    2007-07-01

    The aim was to construct the eukaryotic recombinant plasmid pYES2/Lactoferricin B for expression in the yeast S. cerevisiae and to make a preliminary assessment of the antibacterial activity of the expressed protein. Using a self-template PCR method, the Lactoferricin B gene and several sequence mutants were amplified from segments of pre-synthesized single chains. The Lactoferricin B gene and its mutants were then cloned into the pYES2 vector to construct the recombinant expression plasmids (pYES2/Lactoferricin B etc.), which were extracted and used to transform the yeast S. cerevisiae. Protein expression was determined after induction with galactose. The expressed proteins were collected and purified on a hydronium-exchange column, and a bacterial inhibition test was used to assess their antibacterial activity. PCR amplification and DNA sequencing indicated that the target plasmids contained the Lactoferricin B gene and the several mutations. The induced target proteins were confirmed by SDS-PAGE and mass spectrometry. The antibacterial activity of the mutant proteins was verified in preliminary tests. The recombinant plasmids pYES2/Lactoferricin B etc. were successfully constructed and induced to express in S. cerevisiae; the recombinant Lactoferricin B protein obtained provides a basis for further research on its biological function and antibacterial activity.

  13. Intercellular signalling in Vibrio harveyi: sequence and function of genes regulating expression of luminescence.

    Science.gov (United States)

    Bassler, B L; Wright, M; Showalter, R E; Silverman, M R

    1993-08-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of an extracellular signal molecule (autoinducer) in the culture medium. A recombinant clone that restored function to one class of spontaneous dim mutants was found to encode functions necessary for the synthesis of, and response to, a signal molecule. Sequence analysis of the region encoding these functions revealed three open reading frames, two (luxL and luxM) that are required for production of an autoinducer substance and a third (luxN) that is required for response to this signal substance. The LuxL and LuxM proteins are not similar in amino acid sequence to other proteins in the database, but the LuxN protein contains regions of sequence resembling both the histidine protein kinase and the response regulator domains of the family of two-component, signal transduction proteins. The phenotypes of mutants with luxL, luxM and luxN defects indicated that an additional signal-response system controlling density-dependent expression of luminescence remains to be identified.

  14. Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project.

    Science.gov (United States)

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Kim, Kwang-Youl; Kwon, Kyung-Hoon; Yoo, Jong Shin; Omenn, Gilbert S; Baker, Mark S; Hancock, William S; Paik, Young-Ki

    2015-12-04

    Approximately 2.9 billion long base-pair human reference genome sequences are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high limits and error rates for the detection of complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we attempted to reanalyze the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For the quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary to spectral library

  15. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    Science.gov (United States)

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
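
    The assembly figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative, and the two derived SSR ratios at the end are computed here and do not appear in the abstract.

```python
# Figures as quoted in the abstract.
contigs = 37_844
singlets = 262_896
unigenes_reported = 300_740
contigs_with_blastx_hit = 23_380
ssr_count = 17_362
unigenes_with_ssr = 14_291

# Unigenes are reported as contigs plus singlets.
unigenes = contigs + singlets

# Share of contigs with at least one BLASTX hit (abstract quotes 61.8%).
hit_pct = 100 * contigs_with_blastx_hit / contigs

# Derived ratios, not stated in the abstract.
ssr_unigene_pct = 100 * unigenes_with_ssr / unigenes
ssrs_per_ssr_unigene = ssr_count / unigenes_with_ssr

print(unigenes == unigenes_reported, round(hit_pct, 1))
print(round(ssr_unigene_pct, 2), round(ssrs_per_ssr_unigene, 2))
```

    The sum and the BLASTX percentage agree with the reported values, so the quoted counts are internally consistent.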

  16. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    Directory of Open Access Journals (Sweden)

    Tanase Koji

    2012-07-01

    Full Text Available BACKGROUND: Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers, are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. RESULTS: We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. CONCLUSIONS: We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.