WorldWideScience

Sample records for eugenii transcript sequencing

  1. Lactation transcriptomics in the Australian marsupial, Macropus eugenii: transcript sequencing and quantification

    Directory of Open Access Journals (Sweden)

    Whitley Jane C

    2007-11-01

    Full Text Available. Abstract. Background: Lactation is an important aspect of mammalian biology and, amongst mammals, marsupials show one of the most complex lactation cycles. Marsupials, such as the tammar wallaby (Macropus eugenii), give birth to a relatively immature newborn, and progressive changes in milk composition and milk production regulate early stage development of the young. Results: In order to investigate gene expression in the marsupial mammary gland during lactation, a comprehensive set of cDNA libraries was derived from lactating tissues throughout the lactation cycle of the tammar wallaby. A total of 14,837 expressed sequence tags were produced by cDNA sequencing. Sequence analysis and sequence assembly were used to construct a comprehensive catalogue of mammary transcripts. Sequence data from pregnant and early or late lactating specific cDNA libraries, and data from early or late lactation massively parallel sequencing strategies, were combined to analyse the variation of milk protein gene expression during the lactation cycle. Conclusion: Results show a steady increase in expression of genes coding for secreted protein during the lactation cycle that is associated with a high proportion of transcripts coding for milk proteins. In addition, genes involved in immune function, translation and energy or anabolic metabolism are expressed across the lactation cycle. A number of potential new milk proteins or mammary gland remodelling markers, including noncoding RNAs, have been identified.

  2. A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii)

    OpenAIRE

    Patel Hardip R; Wakefield Matthew J; Wei Ke-jun; Webley Lee; Wang Chenwei; Deakin Janine E; Alsop Amber; Marshall Graves Jennifer A; Cooper Desmond W; Nicholas Frank W; Zenger Kyall R

    2011-01-01

    Abstract Background The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing...

  3. Cell type-specific termination of transcription by transposable element sequences.

    Science.gov (United States)

    Conley, Andrew B; Jordan, I King

    2012-09-30

    Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question. Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3' UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family, and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS. TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young. The extent of transcription

  4. Cell type-specific termination of transcription by transposable element sequences

    Directory of Open Access Journals (Sweden)

    Conley Andrew B

    2012-09-01

    Full Text Available. Abstract. Background: Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question. Results: Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3′ UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family, and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS. Conclusions: TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are

  5. A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii)

    Directory of Open Access Journals (Sweden)

    Patel Hardip R

    2011-08-01

    Full Text Available. Abstract. Background: The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. Results: A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. Conclusions: Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby

  6. A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii).

    Science.gov (United States)

    Wang, Chenwei; Webley, Lee; Wei, Ke-jun; Wakefield, Matthew J; Patel, Hardip R; Deakin, Janine E; Alsop, Amber; Marshall Graves, Jennifer A; Cooper, Desmond W; Nicholas, Frank W; Zenger, Kyall R

    2011-08-19

    The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first

  7. Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks

    Science.gov (United States)

    Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.

    2013-01-01

    The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences. Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-stranded breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544

  8. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous-time, differential equation description of transcriptional dynamics. The sequence features of the promoter are used to derive the binding affinity on the basis of statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields more biologically meaningful results.

  9. Using TESS to predict transcription factor binding sites in DNA sequence.

    Science.gov (United States)

    Schug, Jonathan

    2008-03-01

    This unit describes how to use the Transcription Element Search System (TESS). This Web site predicts transcription factor binding sites (TFBS) in DNA sequence using two different kinds of models of sites, strings and positional weight matrices. The binding of transcription factors to DNA is a major part of the control of gene expression. Transcription factors exhibit sequence-specific binding; they form stronger bonds to some DNA sequences than to others. Identification of a good binding site in the promoter for a gene suggests the possibility that the corresponding factor may play a role in the regulation of that gene. However, the sequences transcription factors recognize are typically short and allow for some amount of mismatch. Because of this, binding sites for a factor can typically be found at random every few hundred to a thousand base pairs. TESS has features to help sort through and evaluate the significance of predicted sites.
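    The weight-matrix idea in the record above can be illustrated with a minimal Python sketch. It is not TESS's own scoring scheme, which the abstract does not specify: the count matrix, the pseudocount, the uniform background frequency and the thresholds are all hypothetical values chosen for illustration, and the function names are invented here.

```python
import math

# Hypothetical count matrix for a 4-bp site with consensus "TGCA".
# Each entry gives how often a base was seen at that position in known sites.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for start, window, score in scan("AATGCAAA", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

    Lowering the threshold admits windows with a mismatch, which is why short, degenerate sites turn up by chance in long sequences and predicted hits need further evaluation.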

  10. Prevalence of transcription promoters within archaeal operons and coding sequences.

    Science.gov (United States)

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate with binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.

  11. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    Directory of Open Access Journals (Sweden)

    Ching-Ping Lin

    Full Text Available. We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R² = 0.9911). In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No editing site that swapped U to C was found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum), two monocots (Oryza sativa and Zea mays), two gymnosperms (Pinus taeda and Ginkgo biloba) and one moss (Physcomitrella patens). Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at single sequence repeat regions with poly-A runs. It showed that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants.

  12. Transcriptional blockages in a cell-free system by sequence-selective DNA alkylating agents.

    Science.gov (United States)

    Ferguson, L R; Liu, A P; Denny, W A; Cullinane, C; Talarico, T; Phillips, D R

    2000-04-14

    There is considerable interest in DNA sequence-selective DNA-binding drugs as potential inhibitors of gene expression. Five compounds with distinctly different base pair specificities were compared in their effects on the formation and elongation of the transcription complex from the lac UV5 promoter in a cell-free system. All were tested at drug levels which killed 90% of cells in a clonogenic survival assay. Cisplatin, a selective alkylator at purine residues, inhibited transcription, decreasing the full-length transcript, and causing blockage at a number of GG or AG sequences, making it probable that intrastrand crosslinks are the blocking lesions. A cyclopropylindoline known to be an A-specific alkylator also inhibited transcription, with blocks at adenines. The aniline mustard chlorambucil, that targets primarily G but also A sequences, was also effective in blocking the formation of full-length transcripts. It produced transcription blocks either at, or one base prior to, AA or GG sequences, suggesting that intrastrand crosslinks could again be involved. The non-alkylating DNA minor groove binder Hoechst 33342 (a bisbenzimidazole) blocked formation of the full-length transcript, but without creating specific blockage sites. A bisbenzimidazole-linked aniline mustard analogue was a more effective transcription inhibitor than either chlorambucil or Hoechst 33342, with different blockage sites occurring immediately as compared with 2 h after incubation. The blockages were either immediately prior to AA or GG residues, or four to five base pairs prior to such sites, a pattern not predicted from in vitro DNA-binding studies. Minor groove DNA-binding ligands are of particular interest as inhibitors of gene expression, since they have the potential ability to bind selectively to long sequences of DNA. The results suggest that the bisbenzimidazole-linked mustard does cause alkylation and transcription blockage at novel DNA sites in addition to sites characteristic of

  13. Microarray and cDNA sequence analysis of transcription during nerve-dependent limb regeneration

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Full Text Available. Abstract. Background: Microarray analysis and 454 cDNA sequencing were used to investigate a centuries-old problem in regenerative biology: the basis of nerve-dependent limb regeneration in salamanders. Innervated (NR) and denervated (DL) forelimbs of Mexican axolotls were amputated and transcripts were sampled after 0, 5, and 14 days of regeneration. Results: Considerable similarity was observed between NR and DL transcriptional programs at 5 and 14 days post amputation (dpa). Genes with extracellular functions that are critical to wound healing were upregulated while muscle-specific genes were downregulated. Thus, many processes that are regulated during early limb regeneration do not depend upon nerve-derived factors. The majority of the transcriptional differences between NR and DL limbs were correlated with blastema formation; cell numbers increased in NR limbs after 5 dpa and this yielded distinct transcriptional signatures of cell proliferation in NR limbs at 14 dpa. These transcriptional signatures were not observed in DL limbs. Instead, gene expression changes within DL limbs suggest more diverse and protracted wound-healing responses. 454 cDNA sequencing complemented the microarray analysis by providing deeper sampling of transcriptional programs and associated biological processes. Assembly of new 454 cDNA sequences with existing expressed sequence tag (EST) contigs from the Ambystoma EST database more than doubled (3935 to 9411) the number of non-redundant human-A. mexicanum orthologous sequences. Conclusion: Many new candidate gene sequences were discovered, and these will greatly enable future studies of wound healing, epigenetics, genome stability, and nerve-dependent blastema formation and outgrowth using the axolotl model.

  14. Nucleotide sequences of cDNAs for human papillomavirus type 18 transcripts in HeLa cells

    International Nuclear Information System (INIS)

    Inagaki, Yutaka; Tsunokawa, Youko; Takebe, Naoko; Terada, Masaaki; Sugimura, Takashi; Nawa, Hiroyuki; Nakanishi, Shigetada

    1988-01-01

    HeLa cells expressed 3.4- and 1.6-kilobase (kb) transcripts of the integrated human papillomavirus (HPV) type 18 genome. Two types of cDNA clones representing each size of HPV type 18 transcript were isolated. Sequence analysis of these two types of cDNA clones revealed that the 3.4-kb transcript contained E6, E7, the 5' portion of E1, and human sequence and that the 1.6-kb transcript contained spliced and frameshifted E6 (E6*), E7, and human sequence. There was a common human sequence containing a poly(A) addition signal in the 3' end portions of both transcripts, indicating that they were transcribed from the HPV genome at the same integration site with different splicing. Furthermore, the 1.6-kb transcript contained both of the two viral TATA boxes upstream of E6, strongly indicating that a cellular promoter was used for its transcription.

  15. Toxoplasmosis in Tammar wallabies (Macropus eugenii) in the Budapest Zoo and Botanical Garden (2006-2010).

    Science.gov (United States)

    Sós, Endre; Szigeti, Alexandra; Fok, Eva; Molnár, Viktor; Erdélyi, Károly; Perge, Edina; Biksi, Imre; Gál, János

    2012-09-01

    Smaller macropodid species (commonly referred to as wallabies) are extremely susceptible to toxoplasmosis: in most cases, infection with Toxoplasma gondii leads to death within a short time. Between June 2006 and July 2010, T. gondii was detected by immunohistochemical examination in six Tammar wallabies (Macropus eugenii) that died in the Budapest Zoo and Botanical Garden; in another four specimens histopathology revealed T. gondii-like organisms (which could not be differentiated from Neospora caninum solely by morphology), and in another 11 animals toxoplasmosis as the possible cause of death could not be excluded. The current zoo population of 12 Tammar wallabies was tested for T. gondii IgG antibodies by the modified agglutination test (MAT), with negative results. We suppose that most of the deaths were due to acute toxoplasmosis resulting from a recent infection.

  16. The chemical structure of DNA sequence signals for RNA transcription

    Science.gov (United States)

    George, D. G.; Dayhoff, M. O.

    1982-01-01

    The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern that is investigated is not restrictive enough to serve as a recognition site.

  17. In silico detection of sequence variations modifying transcriptional regulation.

    Directory of Open Access Journals (Sweden)

    Malin C Andersen

    2008-01-01

    Full Text Available. Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.

  18. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    Science.gov (United States)

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319

  19. A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

    Science.gov (United States)

    Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

    2014-09-11

    The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic

  20. Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison

    Directory of Open Access Journals (Sweden)

    Saville Barry J

    2007-09-01

    Full Text Available Abstract Background Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619

  1. Sequence organization and control of transcription in the bacteriophage T4 tRNA region.

    Science.gov (United States)

    Broida, J; Abelson, J

    1985-10-05

    Bacteriophage T4 contains genes for eight transfer RNAs and two stable RNAs of unknown function. These are found in two clusters at 70 × 10^3 base-pairs on the T4 genetic map. To understand the control of transcription in this region we have completed the sequencing of 5000 base-pairs in this region. The sequence contains a part of gene 3, gene 1, gene 57, internal protein I, the tRNA genes and five open reading frames which most likely code for heretofore unidentified proteins. We have used subclones of the region to investigate the kinetics of transcription in vivo. The results show that transcription in this region consists of overlapping early, middle and late transcripts. Transcription is directed from two early promoters, one or two middle promoters and perhaps two late promoters. This region contains all of the features that are seen in T4 transcription and as such is a good place to study the phenomenon in more detail.

  2. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

    Science.gov (United States)

    Anvar, Seyed Yahya; Allard, Guy; Tseng, Elizabeth; Sheynkman, Gloria M; de Klerk, Eleonora; Vermaat, Martijn; Yin, Raymund H; Johansson, Hans E; Ariyurek, Yavuz; den Dunnen, Johan T; Turner, Stephen W; 't Hoen, Peter A C

    2018-03-29

    The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional level. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provides a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.

  3. A comprehensive set of transcript sequences of the heavy metal hyperaccumulator Noccaea caerulescens

    Directory of Open Access Journals (Sweden)

    Ya-Fen Lin

    2014-06-01

    Full Text Available Noccaea caerulescens is an extremophile plant species belonging to the Brassicaceae family. It has adapted to grow on soils containing high, normally toxic, concentrations of metals such as nickel, zinc and cadmium. Next to being extremely tolerant to these metals, it is one of the few species known to hyperaccumulate these metals to extremely high concentrations in its aboveground biomass. In order to provide additional molecular resources for this model metal hyperaccumulator species to study and understand the mechanism of heavy metal exposure adaptation, we aimed to provide a comprehensive database of transcript sequences for N. caerulescens. In this study, 23830 transcript sequences (isotigs) with an average length of 1025 bp were determined for roots, shoots and inflorescences of N. caerulescens accession ‘Ganges’ by Roche GS-FLEX 454 pyrosequencing. These isotigs were grouped into 20,378 isogroups, representing potential genes. This is a large expansion of the existing N. caerulescens transcriptome set consisting of 3705 unigenes. When compared to a Brassicaceae proteome set, 22,232 (93.2%) of the N. caerulescens isotigs (corresponding to 19191 isogroups) had a significant match and could be annotated accordingly. Of the remaining sequences, 98 isotigs resembled non-plant sequences and 1386 had no significant similarity to any sequence in the GenBank database. Among the annotated set there were many isotigs with similarity to metal homeostasis genes or genes for glucosinolate biosynthesis. Only for transcripts similar to Metallothionein3 (MT3), clear evidence for an additional copy was found. This comprehensive set of transcripts is expected to further contribute to the discovery of mechanisms used by N. caerulescens to adapt to heavy metal exposure.

  4. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  5. A Next-Generation Sequencing Approach Uncovers Viral Transcripts Incorporated in Poxvirus Virions

    Directory of Open Access Journals (Sweden)

    Marica Grossegesse

    2017-10-01

    Full Text Available Transcripts are known to be incorporated in particles of DNA viruses belonging to the families of Herpesviridae and Mimiviridae, but the presence of transcripts in other DNA viruses, such as poxviruses, has not been analyzed yet. Therefore, we first established a next-generation-sequencing (NGS)-based protocol, enabling the unbiased identification of transcripts in virus particles. Subsequently, we applied our protocol to analyze RNA in an emerging zoonotic member of the Poxviridae family, namely Cowpox virus. Our results revealed the incorporation of 19 viral transcripts, while host identifications were restricted to ribosomal and mitochondrial RNA. Most viral transcripts had an unknown and immunomodulatory function, suggesting that transcript incorporation may be beneficial for poxvirus immune evasion. Notably, the most abundant transcript originated from the D5L/I1R gene that encodes a viral inhibitor of the host cytoplasmic DNA sensing machinery.

  6. Identification of cis-regulatory sequences that activate transcription in the suspensor of plant embryos.

    Science.gov (United States)

    Kawashima, Tomokazu; Wang, Xingjun; Henry, Kelli F; Bi, Yuping; Weterings, Koen; Goldberg, Robert B

    2009-03-03

    Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the scarlet runner bean (Phaseolus coccineus) G564 gene to understand how genes are activated specifically within the suspensor during early embryo development. Previously, we showed that the G564 upstream region has a block of tandem repeats, which contain a conserved 10-bp motif (GAAAAG(C)/(T)GAA), and that deletion of these repeats results in a loss of suspensor transcription. Here, we use gain-of-function (GOF) experiments with transgenic globular-stage tobacco embryos to show that only 1 of the 5 tandem repeats is required to drive suspensor-specific transcription. Fine-scale deletion and scanning mutagenesis experiments with 1 tandem repeat uncovered a 54-bp region that contains all of the sequences required to activate transcription in the suspensor, including the 10-bp motif (GAAAAGCGAA) and a similar 10-bp-like motif (GAAAAACGAA). Site-directed mutagenesis and GOF experiments indicated that both the 10-bp and 10-bp-like motifs are necessary, but not sufficient to activate transcription in the suspensor, and that a sequence (TTGGT) between the 10-bp and the 10-bp-like motifs is also necessary for suspensor transcription. Together, these data identify sequences that are required to activate transcription in the suspensor of a plant embryo after fertilization.
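
    The motif logic described in this abstract can be sketched as a simple sequence scan. The following is a minimal illustration, not the authors' analysis: the function names and the toy sequence are hypothetical, and only the motif strings (GAAAAG(C)/(T)GAA, GAAAAACGAA and TTGGT) come from the abstract. It reports where the 10-bp and 10-bp-like motifs occur and checks whether TTGGT lies between a pair of them.

```python
import re

# Motifs quoted in the abstract; the C/T ambiguity becomes a character class.
# Lookaheads are used so that overlapping occurrences are also reported.
TEN_BP = re.compile(r"(?=(GAAAAG[CT]GAA))")
TEN_BP_LIKE = re.compile(r"(?=(GAAAAACGAA))")
MOTIF_LEN = 10


def find_motifs(seq):
    """Return sorted (start, name, matched_sequence) tuples, 0-based starts."""
    seq = seq.upper()
    hits = []
    for name, pattern in (("10-bp", TEN_BP), ("10-bp-like", TEN_BP_LIKE)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


def spacer_between(seq, spacer="TTGGT"):
    """True if `spacer` lies between some 10-bp and some 10-bp-like motif."""
    seq = seq.upper()
    hits = find_motifs(seq)
    ten = [start for start, name, _ in hits if name == "10-bp"]
    like = [start for start, name, _ in hits if name == "10-bp-like"]
    for a in ten:
        for b in like:
            left, right = sorted((a, b))  # motif order is not assumed
            if spacer in seq[left + MOTIF_LEN:right]:
                return True
    return False


# Hypothetical toy sequence, NOT the real G564 upstream region.
toy = "ACGT" + "GAAAAGCGAA" + "CCTTGGTCC" + "GAAAAACGAA" + "TT"
print(find_motifs(toy))
print(spacer_between(toy))
```

    A scan like this only locates candidate elements; the abstract's conclusions about necessity and sufficiency rest on the deletion and mutagenesis experiments, which a sequence search cannot replace.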

  7. Extended region of nodulation genes in Rhizobium meliloti 1021. II. Nucleotide sequence, transcription start sites and protein products

    International Nuclear Information System (INIS)

    Fisher, R.F.; Swanson, J.A.; Mulligan, J.T.; Long, S.R.

    1987-01-01

    The authors have established the DNA sequence and analyzed the transcription and translation products of a series of putative nodulation (nod) genes in Rhizobium meliloti strain 1021. Four loci have been designated nodF, nodE, nodG and nodH. The correlation of transposon insertion positions with phenotypes and open reading frames was confirmed by sequencing the insertion junctions of the transposons. The protein products of these nod genes were visualized by in vitro expression of cloned DNA segments in a R. meliloti transcription-translation system. In addition, the sequence for nodG was substantiated by creating translational fusions in all three reading frames at several points in the sequence; the resulting fusions were expressed in vitro in both E. coli and R. meliloti transcription-translation systems. A DNA segment bearing several open reading frames downstream of nodG corresponds to the putative nod gene mutated in strain nod-216. The transcription start sites of nodF and nodH were mapped by primer extension of RNA from cells induced with the plant flavone, luteolin. Initiation of transcription occurs approximately 25 bp downstream from the conserved sequence designated the nod box, suggesting that this conserved sequence acts as an upstream regulator of inducible nod gene expression. Its distance from the transcription start site is more suggestive of an activator binding site rather than an RNA polymerase binding site

  8. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at T_m − 25 °C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  9. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    Science.gov (United States)

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
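
    The general idea of barcode-based consensus calling mentioned here can be illustrated with a short sketch. This is not the ARC-seq pipeline: the function name, the `min_copies` and `min_agreement` parameters, and the toy reads are all assumptions made for illustration. Reads sharing a barcode are treated as copies of one RNA molecule, and a base is only called where enough copies agree, so that isolated cDNA synthesis, PCR or sequencing errors are masked rather than reported as epimutations.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_copies=3, min_agreement=0.9):
    """Collapse (barcode, sequence) reads into one consensus per barcode.

    Families with fewer than `min_copies` reads, or with reads of unequal
    length, are skipped. Positions where the most common base falls below
    `min_agreement` are reported as 'N'. Both thresholds are hypothetical
    knobs standing in for a tunable stringency.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_copies or len({len(s) for s in seqs}) != 1:
            continue
        bases = []
        for column in zip(*seqs):  # one column per read position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


# Hypothetical toy reads: family "AA" has one discordant copy at position 3.
toy_reads = [("AA", "ACGT"), ("AA", "ACGT"), ("AA", "ACTT"), ("BB", "ACGT")]
print(consensus_by_barcode(toy_reads, min_copies=3, min_agreement=0.6))
print(consensus_by_barcode(toy_reads, min_copies=3, min_agreement=0.9))
```

    Raising `min_copies` or `min_agreement` trades yield for accuracy, which is one plausible way to read the abstract's statement that stringency can be scaled to the quality of the input RNA.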

  10. Sequence2Vec: A novel embedding approach for modeling transcription factor binding affinity landscape

    KAUST Repository

    Dai, Hanjun

    2017-07-26

    Motivation: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Results: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model (HMM) which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these HMMs into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA data sets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods.

  11. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

    Directory of Open Access Journals (Sweden)

    Ouyang Shu

    2005-09-01

    Full Text Available Abstract Background The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.

  12. Deduction of upstream sequences of Xanthomonas campestris flagellar genes responding to transcription activation by FleQ

    International Nuclear Information System (INIS)

    Hu, R.-M.; Yang, T.-C.; Yang, S.-H.; Tseng, Y.-H.

    2005-01-01

    Xanthomonas campestris pv. campestris (Xcc), a close relative of Pseudomonas aeruginosa, is the pathogen causing black rot in cruciferous plants. In P. aeruginosa, FleQ serves as a cognate activator of σ54 in transcription from several σ54-dependent promoters of flagellar genes. These P. aeruginosa promoters have been analyzed for FleQ-binding sequences; however, no consensus was deduced. Xcc, although it lacks fleSR, has a fleQ homologue residing among over 40 contiguously clustered flagellar genes. A fleQ mutant, Xc17fleQ, constructed by insertional mutation is deficient in FleQ protein, non-flagellated, and immobile. Transcriptional fusion assays on six putative σ54-dependent promoters of the flagellar genes, fliE, fliQ, fliL, flgG, flgB, and flhF, indicated that each of them is also FleQ dependent. Each of these promoters has a sequence with weak consensus to 5'-gaaacCCgccgCcgctTt-3', immediately upstream of the predicted σ54-binding site, with an imperfect inverted repeat containing a GC-rich center flanked by several A and T at 5'- and 3'-ends, respectively. Replacing this region in the fliE promoter with a HindIII recognition sequence abolished the transcription, indicating that this region responds to transcription activation by FleQ.

  13. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  14. Physiological and Pathological Transcriptional Activation of Endogenous Retroelements Assessed by RNA-Sequencing of B Lymphocytes

    Directory of Open Access Journals (Sweden)

    Jan Attig

    2017-12-01

    Full Text Available In addition to evolutionarily-accrued sequence mutation or deletion, endogenous retroelements (EREs) in eukaryotic genomes are subject to epigenetic silencing, preventing or reducing their transcription, particularly in the germplasm. Nevertheless, transcriptional activation of EREs, including endogenous retroviruses (ERVs) and long interspersed nuclear elements (LINEs), is observed in somatic cells, variably upon cellular differentiation and frequently upon cellular transformation. ERE transcription is modulated during physiological and pathological immune cell activation, as well as in immune cell cancers. However, our understanding of the potential consequences of such modulation remains incomplete, partly due to the relative scarcity of information regarding genome-wide ERE transcriptional patterns in immune cells. Here, we describe a methodology that allows probing RNA-sequencing (RNA-seq) data for genome-wide expression of EREs in murine and human cells. Our analysis of B cells reveals that their transcriptional response during immune activation is dominated by induction of gene transcription, and that EREs respond to a much lesser extent. The transcriptional activity of the majority of EREs is either unaffected or reduced by B cell activation both in mice and humans, albeit LINEs appear considerably more responsive in the latter host. Nevertheless, a small number of highly distinct ERVs are strongly and consistently induced during B cell activation. Importantly, this pattern contrasts starkly with B cell transformation, which exhibits widespread induction of EREs, including ERVs that minimally overlap with those responsive to immune stimulation. The distinctive patterns of ERE induction suggest different underlying mechanisms and will help separate physiological from pathological expression.

  15. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  16. Global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense using Massively Parallel Signature Sequencing

    Directory of Open Access Journals (Sweden)

    Anderson Donald M

    2006-04-01

    Full Text Available Abstract Background Dinoflagellates are one of the most important classes of marine and freshwater algae, notable both for their functional diversity and ecological significance. They occur naturally as free-living cells, as endosymbionts of marine invertebrates and are well known for their involvement in "red tides". Dinoflagellates are also notable for their unusual genome content and structure, which suggests that the organization and regulation of dinoflagellate genes may be very different from that of most eukaryotes. To investigate the content and regulation of the dinoflagellate genome, we performed a global analysis of the transcriptome of the toxic dinoflagellate Alexandrium fundyense under nitrate- and phosphate-limited conditions using Massively Parallel Signature Sequencing (MPSS). Results Data from the two MPSS libraries showed that the number of unique signatures found in A. fundyense cells is similar to that of humans and Arabidopsis thaliana, two eukaryotes that have been extensively analyzed using this method. The general distribution, abundance and expression patterns of the A. fundyense signatures were also quite similar to other eukaryotes, and at least 10% of the A. fundyense signatures were differentially expressed between the two conditions. RACE amplification and sequencing of a subset of signatures showed that multiple signatures arose from sequence variants of a single gene. Single signatures also mapped to different sequence variants of the same gene. Conclusion The MPSS data presented here provide a quantitative view of the transcriptome and its regulation in these unusual single-celled eukaryotes. The observed signature abundance and distribution in Alexandrium is similar to that of other eukaryotes that have been analyzed using MPSS. Results of signature mapping via RACE indicate that many signatures result from sequence variants of individual genes. These data add to the growing body of evidence for widespread gene

  17. A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Tony Håndstad

    Full Text Available BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

  18. Strong transcription blockage mediated by R-loop formation within a G-rich homopurine-homopyrimidine sequence localized in the vicinity of the promoter.

    Science.gov (United States)

    Belotserkovskii, Boris P; Soo Shin, Jane Hae; Hanawalt, Philip C

    2017-06-20

    Guanine-rich (G-rich) homopurine-homopyrimidine nucleotide sequences can block transcription with an efficiency that depends upon their orientation, composition and length, as well as the presence of negative supercoiling or breaks in the non-template DNA strand. We report that a G-rich sequence in the non-template strand reduces the yield of T7 RNA polymerase transcription by more than an order of magnitude when positioned close (9 bp) to the promoter, in comparison to that for a distal (∼250 bp) location of the same sequence. This transcription blockage is much less pronounced for a C-rich sequence, and is not significant for an A-rich sequence. Remarkably, the blockage is not pronounced if transcription is performed in the presence of RNase H, which specifically digests the RNA strands within RNA-DNA hybrids. The blockage also becomes less pronounced upon reduced RNA polymerase concentration. Based upon these observations and those from control experiments, we conclude that the blockage is primarily due to the formation of stable RNA-DNA hybrids (R-loops), which inhibit successive rounds of transcription. Our results could be relevant to transcription dynamics in vivo (e.g. transcription 'bursting') and may also have practical implications for the design of expression vectors. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Real sequence effects on the search dynamics of transcription factors on DNA

    DEFF Research Database (Denmark)

    Bauer, Maximilian; Rasmussen, Emil S.; Lomholt, Michael A.

    2015-01-01

    Recent experiments show that transcription factors (TFs) indeed use the facilitated diffusion mechanism to locate their target sequences on DNA in living bacteria cells: TFs alternate between sliding motion along DNA and relocation events through the cytoplasm. From simulations and theoretical...... analysis we study the TF-sliding motion for a large section of the DNA-sequence of a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a recognition state enabling target detection. For the probability to detect the target before dissociating from DNA the TF...... on the underlying nucleotide sequence is varied. A moderate dependence maximises the capability to distinguish between the main operator and similar sequences. Moreover, these auxiliary operators serve as starting points for DNA looping with the main operator, yielding a spectrum of target detection times spanning...

  20. Pleiotropy constrains the evolution of protein but not regulatory sequences in a transcription regulatory network influencing complex social behaviours

    Directory of Open Access Journals (Sweden)

    Daria Molodtsova

    2014-12-01

    Full Text Available It is increasingly apparent that genes and networks that influence complex behaviour are evolutionarily conserved, which is paradoxical considering that behaviour is labile over evolutionary timescales. How does adaptive change in behaviour arise if behaviour is controlled by conserved, pleiotropic, and likely evolutionarily constrained genes? Pleiotropy and connectedness are known to constrain the general rate of protein evolution, prompting some to suggest that the evolution of complex traits, including behaviour, is fuelled by regulatory sequence evolution. However, we seldom have data on the strength of selection on mutations in coding and regulatory sequences, and this hinders our ability to study how pleiotropy influences coding and regulatory sequence evolution. Here we use population genomics to estimate the strength of selection on coding and regulatory mutations for a transcriptional regulatory network that influences complex behaviour of honey bees. We found that replacement mutations in highly connected transcription factors and target genes experience significantly stronger negative selection relative to weakly connected transcription factors and targets. Adaptively evolving proteins were significantly more likely to reside at the periphery of the regulatory network, while proteins with signs of negative selection were near the core of the network. Interestingly, connectedness and network structure had minimal influence on the strength of selection on putative regulatory sequences for both transcription factors and their targets. Our study indicates that adaptive evolution of complex behaviour can arise because of positive selection on protein-coding mutations in peripheral genes, and on regulatory sequence mutations in both transcription factors and their targets throughout the network.

  1. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    Directory of Open Access Journals (Sweden)

    Javier Villacreses

    2015-04-01

    Full Text Available Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV, Petuvirus genus). ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant.

  2. Transcription profiling of the model cyanobacterium Synechococcus sp. strain PCC 7002 by NextGen (SOLiD™) Sequencing of cDNA

    Directory of Open Access Journals (Sweden)

    Marcus Ludwig

    2011-03-01

    Full Text Available The genome of the unicellular, euryhaline cyanobacterium Synechococcus sp. PCC 7002 encodes about 3200 proteins. Transcripts were detected for nearly all annotated open reading frames by a global transcriptomic analysis by Next-Generation (SOLiD™) sequencing of cDNA. In the cDNA samples sequenced, ~90% of the mapped sequences were derived from the 16S and 23S ribosomal RNAs and ~10% of the sequences were derived from mRNAs. In cells grown photoautotrophically under standard conditions (38 °C, 1% (v/v) CO2 in air, 250 µmol photons m⁻² s⁻¹), the highest transcript levels (up to 2% of the total mRNA for the most abundantly transcribed genes, e.g., cpcAB, psbA, psaA) were generally derived from genes encoding structural components of the photosynthetic apparatus. High light exposure for one hour caused changes in transcript levels for genes encoding proteins of the photosynthetic apparatus, Type-1 NADH dehydrogenase complex and ATP synthase, whereas dark incubation for one hour resulted in a global decrease in transcript levels for photosynthesis-related genes and an increase in transcript levels for genes involved in carbohydrate degradation. Transcript levels for pyruvate kinase and the pyruvate dehydrogenase complex decreased sharply in cells incubated in the dark. Under dark anoxic (fermentative) conditions, transcript changes indicated a global decrease in transcripts for respiratory proteins and suggested that cells employ an alternative phosphoenolpyruvate degradation pathway via phosphoenolpyruvate synthase (ppsA) and the pyruvate:ferredoxin oxidoreductase (nifJ). Finally, the data suggested that an apparent operon involved in tetrapyrrole biosynthesis and fatty acid desaturation, acsF2-ho2-hemN2-desF, may be regulated by oxygen concentration.

  3. Comparison of pause predictions of two sequence-dependent transcription models

    International Nuclear Information System (INIS)

    Bai, Lu; Wang, Michelle D

    2010-01-01

    Two recent theoretical models, Bai et al (2004, 2007) and Tadigotla et al (2006), formulated thermodynamic explanations of sequence-dependent transcription pausing by RNA polymerase (RNAP). The two models differ in some basic assumptions and therefore make different yet overlapping predictions for pause locations, and different predictions on pause kinetics and mechanisms. Here we present a comprehensive comparison of the two models. We show that while they have comparable predictive power of pause locations at low NTP concentrations, the Bai et al model is more accurate than Tadigotla et al at higher NTP concentrations. The pausing kinetics predicted by Bai et al are also consistent with time-course transcription reactions, while Tadigotla et al is unsuited for this type of kinetic prediction. More importantly, the two models in general predict different pausing mechanisms even for the same pausing sites, and the Bai et al model provides an explanation more consistent with recent single molecule observations.

  4. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  5. Transcription arrest by a G quadruplex forming-trinucleotide repeat sequence from the human c-myb gene.

    Science.gov (United States)

    Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia

    2011-05-17

    Non canonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how non canonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7RNAP) when encountering a G quadruplex forming-(GGA)(4) repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)(4) repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)(4) repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.
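
    As a rough illustration of how a (GGA)(4)-type repeat such as the one studied here might be located in a sequence, the following sketch searches for four or more tandem GGA units. The function name and the example sequence are hypothetical, and finding the repeat says nothing about whether a G4 structure actually forms, which in the abstract depends on strand state and potassium ions.

```python
import re

# Four or more tandem copies of the GGA trinucleotide.
GGA_REPEAT = re.compile(r"(?:GGA){4,}")


def find_gga_repeats(sequence):
    """Return (start, copy_number) for each run of >= 4 tandem GGA units."""
    return [
        (match.start(), (match.end() - match.start()) // 3)
        for match in GGA_REPEAT.finditer(sequence.upper())
    ]


# Invented example sequence: two flanking T's around four GGA units.
repeats = find_gga_repeats("TTGGAGGAGGAGGATT")
```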

  6. Phyloscan: locating transcription-regulating binding sites in mixed aligned and unaligned sequence data.

    Science.gov (United States)

    Palumbo, Michael J; Newberg, Lee A

    2010-07-01

    The transcription of a gene from its DNA template into an mRNA molecule is the first, and most heavily regulated, step in gene expression. Especially in bacteria, regulation is typically achieved via the binding of a transcription factor (protein) or small RNA molecule to the chromosomal region upstream of a regulated gene. The protein or RNA molecule recognizes a short, approximately conserved sequence within a gene's promoter region and, by binding to it, either enhances or represses expression of the nearby gene. Since the sought-for motif (pattern) is short and accommodating to variation, computational approaches that scan for binding sites have trouble distinguishing functional sites from look-alikes. Many computational approaches are unable to find the majority of experimentally verified binding sites without also finding many false positives. Phyloscan overcomes this difficulty by exploiting two key features of functional binding sites: (i) these sites are typically more conserved evolutionarily than are non-functional DNA sequences; and (ii) these sites often occur two or more times in the promoter region of a regulated gene. The website is free and open to all users, and there is no login requirement. Address: (http://bayesweb.wadsworth.org/phyloscan/).

  7. Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies

    Science.gov (United States)

    Tian, Wenlan; Paudel, Dev

    2017-01-01

    Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembling and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs were fully annotated, out of which 59,903 (51.81%) were assigned with gene ontology (GO) term, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in Kyoto Encyclopedia of Genes and Genome (KEGG) database, and it contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
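
    SSR discovery of the kind reported above is commonly done by searching for short units repeated in tandem. The sketch below is one simple way to do this and is not the pipeline used by the authors; the minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds, and AG and CT are not merged into one AG/CT class as they are in the abstract.

```python
import re

# Hypothetical thresholds: unit length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound_unit(unit):
    """True if the unit is itself a repeat of a shorter unit (e.g. AA, AGAG)."""
    size = len(unit)
    return any(
        size % k == 0 and unit == unit[:k] * (size // k)
        for k in range(1, size)
    )


def find_ssrs(sequence):
    """Return (start, unit, copies) for each simple sequence repeat found."""
    sequence = sequence.upper()
    found = []
    for unit_len, min_copies in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if is_compound_unit(unit):
                continue  # already covered by a shorter unit, or a homopolymer
            found.append((match.start(), unit, len(match.group(0)) // unit_len))
    return found


ssrs = find_ssrs("TTT" + "AG" * 7 + "CCC")
```

    The annotation percentages quoted in the abstract follow directly from the 115,611 UASs: 59,903 / 115,611 is 51.81%, 12,584 / 115,611 is 10.88%, and 8,822 / 115,611 is 7.63%.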

  8. Characterization of the antimicrobial peptide family defensins in the Tasmanian devil (Sarcophilus harrisii), koala (Phascolarctos cinereus), and tammar wallaby (Macropus eugenii).

    Science.gov (United States)

    Jones, Elizabeth A; Cheng, Yuanyuan; O'Meally, Denis; Belov, Katherine

    2017-03-01

    Defensins comprise a family of cysteine-rich antimicrobial peptides with important roles in innate and adaptive immune defense in vertebrates. We characterized alpha and beta defensin genes in three Australian marsupials: the Tasmanian devil (Sarcophilus harrisii), koala (Phascolarctos cinereus), and tammar wallaby (Macropus eugenii) and identified 48, 34, and 39 defensins, respectively. One hundred and twelve have the classical antimicrobial peptides characteristics required for pathogen membrane targeting, including cationic charge (between 1+ and 15+) and a high proportion of hydrophobic residues (>30%). Phylogenetic analysis shows that gene duplication has driven unique and species-specific expansions of devil, koala, and tammar wallaby beta defensins and devil alpha defensins. Defensin genes are arranged in three genomic clusters in marsupials, whereas further duplications and translocations have occurred in eutherians resulting in four and five gene clusters in mice and humans, respectively. Marsupial defensins are generally under purifying selection, particularly residues essential for defensin structural stability. Certain hydrophobic or positively charged sites, predominantly found in the defensin loop, are positively selected, which may have functional significance in defensin-target interaction and membrane insertion.
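
    The two screening criteria quoted above (cationic charge between 1+ and 15+, and more than 30% hydrophobic residues) can be checked with a few lines of code. This is a simplified sketch rather than the authors' procedure: the charge counts only Lys/Arg and Asp/Glu side chains, the hydrophobic residue set is an assumption, and the example peptides are invented.

```python
POSITIVE = set("KR")             # Lys, Arg counted as +1 (His and termini ignored)
NEGATIVE = set("DE")             # Asp, Glu counted as -1
HYDROPHOBIC = set("AILMFVWYC")   # assumed hydrophobic set; definitions vary


def net_charge(peptide):
    """Approximate net charge from side chains only."""
    peptide = peptide.upper()
    return sum(r in POSITIVE for r in peptide) - sum(r in NEGATIVE for r in peptide)


def hydrophobic_fraction(peptide):
    """Fraction of residues belonging to the assumed hydrophobic set."""
    peptide = peptide.upper()
    return sum(r in HYDROPHOBIC for r in peptide) / len(peptide)


def looks_classical(peptide, min_charge=1, max_charge=15, min_hydrophobic=0.30):
    """Apply the charge range and the >30% hydrophobic criterion."""
    return (min_charge <= net_charge(peptide) <= max_charge
            and hydrophobic_fraction(peptide) > min_hydrophobic)


# Invented peptides, for illustration only.
cationic = looks_classical("GKRCLVIAKRCF")
anionic = looks_classical("DEDEGGSS")
```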

  9. Massively parallel amplicon sequencing reveals isotype-specific variability of antimicrobial peptide transcripts in Mytilus galloprovincialis.

    Directory of Open Access Journals (Sweden)

    Umberto Rosani

    Full Text Available BACKGROUND: Effective innate responses against potential pathogens are essential in the living world and possibly contributed to the evolutionary success of invertebrates. Taken together, antimicrobial peptide (AMP) precursors of defensin, mytilin, myticin and mytimycin can represent about 40% of the hemocyte transcriptome in mussels injected with viral-like and bacterial preparations, and unique profiles of myticin C variants are expressed in single mussels. Based on amplicon pyrosequencing, we have ascertained and compared the natural and Vibrio-induced diversity of AMP transcripts in mussel hemocytes from three European regions. METHODOLOGY/PRINCIPAL FINDINGS: Hemolymph was collected from mussels farmed in the coastal regions of Palavas (France), Vigo (Spain) and Venice (Italy). To represent the AMP families known in M. galloprovincialis, nine transcript sequences have been selected, amplified from hemocyte RNA and subjected to pyrosequencing. Hemolymph from farmed (offshore) and wild (lagoon) Venice mussels, both injected with 10^7 Vibrio cells, were similarly processed. Amplicon pyrosequencing emphasized the AMP transcript diversity, with Single Nucleotide Changes (SNC) minimal for mytilin B/C and maximal for arthropod-like defensin and myticin C. Ratio of non-synonymous vs. synonymous changes also greatly differed between AMP isotypes. Overall, each amplicon revealed similar levels of nucleotidic variation across geographical regions, with two main sequence patterns confirmed for mytimycin and no substantial changes after immunostimulation. CONCLUSIONS/SIGNIFICANCE: Barcoding and bidirectional pyrosequencing allowed us to map and compare the transcript diversity of known mussel AMPs. Though most of the genuine cds variation was common to the analyzed samples we could estimate from 9 to 106 peptide variants in hemolymph pools representing 100 mussels, depending on the AMP isoform and sampling site.
In this study, no prevailing SNC patterns related

  10. Exploring the sequence-function relationship in transcriptional regulation by the lac O1 operator.

    Science.gov (United States)

    Maity, Tuhin S; Jha, Ramesh K; Strauss, Charlie E M; Dunbar, John

    2012-07-01

    Understanding how binding of a transcription factor to an operator is influenced by the operator sequence is an ongoing quest. It facilitates discovery of alternative binding sites as well as tuning of transcriptional regulation. We investigated the behavior of the Escherichia coli Lac repressor (LacI) protein with a large set of lac O(1) operator variants. The 114 variants examined contained a mean of 2.9 (range 0-4) mutations at positions -4, -2, +2 and +4 in the minimally required 17 bp operator. The relative affinity of LacI for the operators was examined by quantifying expression of a GFP reporter gene and Rosetta structural modeling. The combinations of mutations in the operator sequence created a wide range of regulatory behaviors. We observed variations in the GFP fluorescent signal among the operator variants of more than an order of magnitude under both uninduced and induced conditions. We found that a single nucleotide change may result in changes of up to six- and 12-fold in uninduced and induced GFP signals, respectively. Among the four positions mutated, we found that nucleotide G at position -4 is strongly correlated with strong repression. By Rosetta modeling, we found a significant correlation between the calculated binding energy and the experimentally observed transcriptional repression strength for many operators. However, exceptions were also observed, underscoring the necessity for further improvement in biophysical models of protein-DNA interactions. © 2012 The Authors Journal compilation © 2012 FEBS.
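
    To put the 114 variants in context, the size of the full design space can be worked out under one assumption that the abstract does not state: that each of the four mutated positions may carry any of the four nucleotides. Under that assumption there are 256 possible operators, and a variant drawn uniformly from them carries 3.0 mutations on average, close to the reported mean of 2.9.

```python
from math import comb

POSITIONS = 4        # positions -4, -2, +2 and +4
ALTERNATIVES = 3     # non-wild-type nucleotides available at each position

# Number of operators carrying exactly k mutations: C(4, k) * 3^k.
variants_by_mutations = {
    k: comb(POSITIONS, k) * ALTERNATIVES ** k for k in range(POSITIONS + 1)
}

total_variants = sum(variants_by_mutations.values())
mean_mutations = sum(k * n for k, n in variants_by_mutations.items()) / total_variants
```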

  11. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage-distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNA (piRNA). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.

  12. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    Directory of Open Access Journals (Sweden)

    Meng-Yao Li

    Full Text Available Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  13. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    Science.gov (United States)

    Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng

    2014-01-01

    Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  14. Theoretical evaluation of transcriptional pausing effect on the attenuation in trp leader sequence

    OpenAIRE

    Suzuki, H.; Kunisawa, T.; Otsuka, J.

    1986-01-01

    The effect of transcriptional pausing on attenuation is investigated theoretically on the basis of the attenuation control mechanism presented by Oxender et al. (Oxender, D. L., G. Zurawski, and C. Yanofsky, 1979, Proc. Natl. Acad. Sci. USA. 76:5524-5528). An extended stochastic model including the RNA polymerase pausing in the leader region is developed to calculate the probability of relative position between the RNA polymerase transcribing the trp leader sequence and the ribosome translati...

  15. GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts.

    Science.gov (United States)

    Naito, Yuki; Bono, Hidemasa

    2012-07-01

    GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.
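
    The matching rules described above (up to three mismatches, degenerate codes such as N, R, Y, S) can be illustrated with a naive sliding-window search. This is only a sketch of the matching semantics and does not reflect how GGRNA is implemented or indexed; the function name and example sequences are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def search(query, target, max_mismatches=3):
    """Return (start, mismatches) for every window within the mismatch limit."""
    query = query.upper()
    target = target.upper().replace("U", "T")
    hits = []
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        # A position mismatches when the target base is not allowed by the code.
        mismatches = sum(1 for q, t in zip(query, window) if t not in IUPAC[q])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Invented 10-nt degenerate query against an invented target.
exact_hits = search("ACGTNRYSAC", "TTACGTAACGACTT", max_mismatches=0)
```

    This scan costs time proportional to query length times target length for each search, so a production service would need an index to return results over a full transcript collection in seconds.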

  16. Transcriptional profiling of endocrine cerebro-osteodysplasia using microarray and next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Piya Lahiry

    Full Text Available BACKGROUND: Transcriptome profiling of patterns of RNA expression is a powerful approach to identify networks of genes that play a role in disease. To date, most mRNA profiling of tissues has been accomplished using microarrays, but next-generation sequencing can offer a richer and more comprehensive picture. METHODOLOGY/PRINCIPAL FINDINGS: Endocrine cerebro-osteodysplasia (ECO) is a rare multi-system developmental disorder caused by a homozygous mutation in ICK encoding intestinal cell kinase. We performed gene expression profiling using both cDNA microarrays and next-generation mRNA sequencing (mRNA-seq) of skin fibroblasts from ECO-affected subjects. We then validated a subset of differentially expressed transcripts identified by each method using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Finally, we used gene ontology (GO) to identify critical pathways and processes that were abnormal according to each technical platform. Methodologically, mRNA-seq identifies a much larger number of differentially expressed genes with much better correlation to qRT-PCR results than the microarray (r² = 0.794 and 0.137, respectively). Biologically, cDNA microarray identified functional pathways focused on anatomical structure and development, while the mRNA-seq platform identified a higher proportion of genes involved in cell division and DNA replication pathways. CONCLUSIONS/SIGNIFICANCE: Transcriptome profiling with mRNA-seq had greater sensitivity, range and accuracy than the microarray. The two platforms generated different but complementary hypotheses for further evaluation.

  17. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus

  18. Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes

    Science.gov (United States)

    Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1994-01-01

    The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archaeal and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer), it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archaeal protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archaeal protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.

  19. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Konstantin Okonechnikov

    Full Text Available Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http://bitbucket.org/kokonech/infusion.

  20. Nucleotide sequence, transcript mapping, and regulation of the RAD2 gene of Saccharomyces cerevisiae

    International Nuclear Information System (INIS)

    Madura, K.; Prakash, S.

    1986-01-01

    The authors determined the nucleotide sequence, mapped the 5' and 3' mRNA termini, and examined the regulation of the RAD2 gene of Saccharomyces cerevisiae. A long open reading frame within the RAD2 transcribed region encodes a protein of 1031 amino acids with a calculated molecular weight of 117,847. A disruption of the RAD2 gene that deletes the 78 carboxyl-terminal codons results in loss of RAD2 function. The 5' ends of RAD2 mRNA show considerable heterogeneity, mapping 5 to 62 nucleotides upstream of the first ATG codon of the long RAD2 open reading frame. The longest RAD2 transcripts also contain a short open reading frame of 37 codons that precedes and overlaps the 5' end of the long RAD2 open reading frame. The RAD2 3' mRNA end maps 171 nucleotides downstream of the TAA termination codon and 20 nucleotides downstream from a 12-base-pair inverted repeat that might function in transcript termination. Northern blot analysis showed a ninefold increase in steady-state levels of RAD2 mRNA after treatment of yeast cells with UV light. The 5' flanking region of the RAD2 gene contains several direct and inverted repeats and a 44-nucleotide-long purine-rich tract. The sequence TGGAGGCATTAA found at position -167 to -156 in the RAD2 gene is similar to a sequence present in the 5' flanking regions of the RAD7 and RAD10 genes.

  1. Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics: Transcript Profiling and Reverse Genetics Applications

    Science.gov (United States)

    Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian

    2004-01-01

    Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, prompting us to create a collection of gene-specific sequence tags (GSTs) that represent at least 21,500 Arabidopsis genes and are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341

  2. DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

    Science.gov (United States)

    Ma, Wenxiu; Yang, Lin; Rohs, Remo; Noble, William Stafford

    2017-10-01

    Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. Contact: rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
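
    As an illustration of the classical k-spectrum kernel that the abstract uses as its baseline, the following minimal Python sketch computes the kernel value as the dot product of two k-mer count vectors. It covers only the sequence part; the shape features and the di-mismatch extension are not modelled, and the function names are hypothetical rather than taken from the authors' software.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k):
    """Classical k-spectrum kernel: dot product of the k-mer count vectors."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has not seen, so k-mers that are
    # absent from b contribute nothing to the sum.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Two short made-up sequences, used purely for illustration.
    print(spectrum_kernel("ACGTACGT", "TACGTT", 3))
```

    Small k gives dense count vectors that tolerate variation between sites, while large k gives sparse vectors that match only near-identical sites, which is one plausible reason the abstract reports k-dependent differences between the kernels.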

  3. Intrinsic terminators in Mycoplasma hyopneumoniae transcription.

    Science.gov (United States)

    Fritsch, Tiago Ebert; Siqueira, Franciele Maboni; Schrank, Irene Silveira

    2015-04-08

    Mycoplasma hyopneumoniae, an important pathogen of swine, exhibits a low guanine and cytosine (GC) content genome. The M. hyopneumoniae genome is organised in long transcriptional units, and promoter sequences have been mapped upstream of all transcription units. These analyses provided insights into the gene organisation and transcription initiation at the genome scale. However, the presence of transcriptional terminator sequences in the M. hyopneumoniae genome is poorly understood. In silico analyses demonstrated the presence of putative terminators in 82% of the 33 monocistronic units (mCs) and in 74% of the 116 polycistronic units (pCs) considering different classes of terminators. The functional activity of 23 intrinsic terminators was confirmed by RT-PCR and qPCR. Analysis of all terminators found by three software algorithms, combined with experimental results, allowed us to propose a pattern of RNA hairpin formation during the termination process and to predict the location of terminators in the M. hyopneumoniae genome sequence. The stem-loop structures of intrinsic terminators of mycoplasma diverge from the pattern of terminators found in other bacteria due to the low content of guanine and cytosine. In M. hyopneumoniae, transcription can end after a transcriptional unit and before its terminator sequence and can also continue past the terminator sequence with RNA polymerases gradually releasing the RNA.

  4. Finite-size effects in transcript sequencing count distribution: its power-law correction necessarily precedes downstream normalization and comparative analysis.

    Science.gov (United States)

    Wong, Wing-Cheong; Ng, Hong-Kiat; Tantoso, Erwin; Soong, Richie; Eisenhaber, Frank

    2018-02-12

    Though earlier works on modelling transcript abundance from vertebrates to lower eukaryotes have specifically singled out Zipf's law, the observed distributions often deviate from a single power-law slope. In hindsight, while power-laws of critical phenomena are derived asymptotically under the conditions of infinite observations, real-world observations are finite, where the finite-size effects will set in to force a power-law distribution into an exponential decay and consequently manifest as a curvature (i.e., varying exponent values) in a log-log plot. If transcript abundance is truly power-law distributed, the varying exponent signifies changing mathematical moments (e.g., mean, variance) and creates heteroskedasticity, which compromises statistical rigor in analysis. The impact of this deviation from the asymptotic power-law on sequencing count data has never truly been examined and quantified. The anecdotal description of transcript abundance being almost Zipf's law-like distributed can be conceptualized as the imperfect mathematical rendition of the Pareto power-law distribution when subjected to the finite-size effects in the real world; this is regardless of the advancement in sequencing technology, since sampling is finite in practice. Our conceptualization agrees well with our empirical analysis of two modern-day NGS (next-generation sequencing) datasets: an in-house generated dilution miRNA study of two gastric cancer cell lines (NUGC3 and AGS) and a publicly available spike-in miRNA dataset. Firstly, the finite-size effects cause the deviations of sequencing count data from Zipf's law and issues of reproducibility in sequencing experiments. Secondly, they manifest as heteroskedasticity among experimental replicates to bring about statistical woes. Surprisingly, a straightforward power-law correction that restores the distribution distortion to a single exponent value can dramatically reduce data heteroskedasticity to invoke an instant increase in

  5. Next-Generation Sequencing of Genomic DNA Fragments Bound to a Transcription Factor in Vitro Reveals Its Regulatory Potential

    Directory of Open Access Journals (Sweden)

    Yukio Kurihara

    2014-12-01

    Several transcription factors (TFs) coordinate to regulate expression of specific genes at the transcriptional level. In Arabidopsis thaliana it is estimated that approximately 10% of all genes encode TFs or TF-like proteins. It is important to identify target genes that are directly regulated by TFs in order to understand the complete picture of a plant’s transcriptome profile. Here, we investigate the role of the LONG HYPOCOTYL5 (HY5) transcription factor, which acts as a regulator of photomorphogenesis. We used an in vitro genomic DNA binding assay coupled with immunoprecipitation and next-generation sequencing (gDB-seq) instead of the in vivo chromatin immunoprecipitation (ChIP)-based methods. The results demonstrate that the HY5-binding motif predicted here was similar to the motif reported previously and that in vitro HY5-binding loci largely overlapped with the HY5-targeted candidate genes identified in previous ChIP-chip analysis. By combining these results with microarray analysis, we identified hundreds of HY5-binding genes that were differentially expressed in hy5. We also observed delayed induction of some transcripts of HY5-binding genes in hy5 mutants in response to blue-light exposure after dark treatment. Thus, an in vitro gDNA-binding assay coupled with sequencing is a convenient and powerful method to bridge the gap between identifying TF binding potential and establishing function.

  6. “Jump Start and Gain” Model for Dosage Compensation in Drosophila Based on Direct Sequencing of Nascent Transcripts

    Directory of Open Access Journals (Sweden)

    Francesco Ferrari

    2013-11-01

    Dosage compensation in Drosophila is mediated by the MSL complex, which increases male X-linked gene expression approximately 2-fold. The MSL complex preferentially binds the bodies of active genes on the male X, depositing H4K16ac with a 3′ bias. Two models have been proposed for the influence of the MSL complex on transcription: one based on promoter recruitment of RNA polymerase II (Pol II), and a second featuring enhanced transcriptional elongation. Here, we utilize nascent RNA sequencing to document dosage compensation during transcriptional elongation. We also compare X and autosomes from published data on paused and elongating polymerase in order to assess the role of Pol II recruitment. Our results support a model for differentially regulated elongation, starting with release from 5′ pausing and increasing through X-linked gene bodies. Our results highlight facilitated transcriptional elongation as a key mechanism for the coordinated regulation of a diverse set of genes.

  7. SoyDB: a knowledge database of soybean transcription factors

    Directory of Open Access Journals (Sweden)

    Valliyodan Babu

    2010-01-01

    Background: Transcription factors play the crucial role of regulating gene expression and influence almost all biological processes. Systematically identifying and annotating transcription factors can greatly aid further understanding of their functions and mechanisms. In this article, we present SoyDB, a user-friendly database containing comprehensive knowledge of soybean transcription factors. Description: The soybean genome was recently sequenced by the Department of Energy-Joint Genome Institute (DOE-JGI) and is publicly available. Mining of this sequence identified 5,671 soybean genes as putative transcription factors. These genes were comprehensively annotated as an aid to the soybean research community. We developed SoyDB, a knowledge database for all the transcription factors in the soybean genome. The database contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank (PDB), protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database can be accessed via an interactive and convenient web server, which supports full-text search, PSI-BLAST sequence search, database browsing by protein family, and automatic classification of a new protein sequence into one of 64 annotated transcription factor families by hidden Markov models. Conclusions: A comprehensive soybean transcription factor database was constructed and made publicly accessible at http://casp.rnet.missouri.edu/soydb/.

  8. Characterization of human mesothelin transcripts in ovarian and pancreatic cancer

    International Nuclear Information System (INIS)

    Muminova, Zhanat E; Strong, Theresa V; Shaw, Denise R

    2004-01-01

    Mesothelin is an attractive target for cancer immunotherapy due to its restricted expression in normal tissues and high level expression in several tumor types including ovarian and pancreatic adenocarcinomas. Three mesothelin transcript variants have been reported, but their relative expression in normal tissues and tumors has been poorly characterized. The goal of the present study was to clarify which mesothelin transcript variants are commonly expressed in human tumors. Human genomic and EST nucleotide sequences in the public databases were used to evaluate sequences reported for the three mesothelin transcript variants in silico. Subsequently, RNA samples from normal ovary, ovarian and pancreatic carcinoma cell lines, and primary ovarian tumors were analyzed by reverse transcription-polymerase chain reaction (RT-PCR) and nucleotide sequencing to directly identify expressed transcripts. In silico comparisons of genomic DNA sequences with available EST sequences supported expression of mesothelin transcript variants 1 and 3, but there were no sequence matches for transcript variant 2. Newly-derived nucleotide sequences of RT-PCR products from tissues and cell lines corresponded to mesothelin transcript variant 1. Mesothelin transcript variant 2 was not detected. Transcript variant 3 was observed as a small percentage of total mesothelin amplification products from all studied cell lines and tissues. Fractionation of nuclear and cytoplasmic RNA indicated that variant 3 was present primarily in the nuclear fraction. Thus, mesothelin transcript variant 3 may represent incompletely processed hnRNA. Mesothelin transcript variant 1 represents the predominant mature mRNA species expressed by both normal and tumor cells. This conclusion should be important for future development of cancer immunotherapies, diagnostic tests, and gene microarray studies targeting mesothelin

  9. Chromatin immunoprecipitation (ChIP) of plant transcription factors followed by sequencing (ChIP-SEQ) or hybridization to whole genome arrays (ChIP-CHIP)

    NARCIS (Netherlands)

    Kaufmann, K.; Muiño, J.M.; Østerås, M.; Farinelli, L.; Krajewski, P.; Angenent, G.C.

    2010-01-01

    Chromatin immunoprecipitation (ChIP) is a powerful technique to study interactions between transcription factors (TFs) and DNA in vivo. For genome-wide de novo discovery of TF-binding sites, the DNA that is obtained in ChIP experiments needs to be processed for sequence identification. The sequences

  10. The transcriptional landscape

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2011-01-01

    The application of new and less biased methods to study the transcriptional output from genomes, such as tiling arrays and deep sequencing, has revealed that most of the genome is transcribed and that there is substantial overlap of transcripts derived from the two strands of DNA. In protein coding...... regions, the map of transcripts is very complex due to small transcripts from the flanking ends of the transcription unit, the use of multiple start and stop sites for the main transcript, production of multiple functional RNA molecules from the same primary transcript, and RNA molecules made...... by independent transcription from within the unit. In genomic regions separating those that encode proteins or highly abundant RNA molecules with known function, transcripts are generally of low abundance and short-lived. In most of these cases, it is unclear to what extent a function is related to transcription...

  11. Induction and maintenance of DNA methylation in plant promoter sequences by apple latent spherical virus-induced transcriptional gene silencing

    Directory of Open Access Journals (Sweden)

    Tatsuya Kon

    2014-11-01

    Apple latent spherical virus (ALSV) is an efficient virus-induced gene silencing vector in functional genomics analyses of a broad range of plant species. Here, an Agrobacterium-mediated inoculation (agroinoculation) system was developed for the ALSV vector, and virus-induced transcriptional gene silencing (VITGS) is described in plants infected with the ALSV vector. The cDNAs of ALSV RNA1 and RNA2 were inserted between the CaMV 35S promoter and the NOS-T sequences in a binary vector pCAMBIA1300 to produce pCALSR1 and pCALSR2-XSB or pCALSR2-XSB/MN. When these vector constructs were agroinoculated into Nicotiana benthamiana plants with a construct expressing a viral silencing suppressor, the infection efficiency of the vectors was 100%. A recombinant ALSV vector carrying part of the 35S promoter sequence induced transcriptional gene silencing of the green fluorescent protein gene in a line of N. benthamiana plants, resulting in the disappearance of green fluorescence of infected plants. Bisulfite sequencing showed that cytosine residues at CG and CHG sites of the 35S promoter sequence were highly methylated in the silenced generation 0 plants infected with the ALSV carrying the promoter sequence as well as in progeny. The ALSV-mediated VITGS state was inherited by progeny for multiple generations. In addition, induction of VITGS of an endogenous gene (chalcone synthase-A) was demonstrated in petunia plants infected with an ALSV vector carrying the native promoter sequence. These results suggest that ALSV-based vectors can be applied to study DNA methylation in plant genomes, and provide a useful tool for plant breeding via epigenetic modification.

  12. SONAR: A High-Throughput Pipeline for Inferring Antibody Ontogenies from Longitudinal Sequencing of B Cell Transcripts.

    Science.gov (United States)

    Schramm, Chaim A; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R; Kwong, Peter D; Shapiro, Lawrence

    2016-01-01

    The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity-divergence plots and longitudinal phylogenetic "birthday" trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/.

  13. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12.

    Science.gov (United States)

    Thieffry, D; Salgado, H; Huerta, A M; Collado-Vides, J

    1998-06-01

    As one of the best-characterized free-living organisms, Escherichia coli and its recently completed genomic sequence offer a special opportunity to exploit systematically the variety of regulatory data available in the literature in order to make a comprehensive set of regulatory predictions in the whole genome. The complete genome sequence of E. coli was analyzed for the binding of transcriptional regulators upstream of coding sequences. The biological information contained in RegulonDB (Huerta, A.M. et al., Nucleic Acids Res., 26, 55-60, 1998) for 56 different transcriptional proteins was the support to implement a stringent strategy combining string search and weight matrices. We estimate that our search included representatives of 15-25% of the total number of regulatory binding proteins in E. coli. This search was performed on the set of 4288 putative regulatory regions, each 450 bp long. Within the regions with predicted sites, 89% are regulated by one protein and 81% involve only one site. These numbers are reasonably consistent with the distribution of experimental regulatory sites. Regulatory sites are found in 603 regions corresponding to 16% of operon regions and 10% of intra-operonic regions. Additional evidence gives stronger support to some of these predictions, including the position of the site, biological consistency with the function of the downstream gene, as well as genetic evidence for the regulatory interaction. The predictions described here were incorporated into the map presented in the paper describing the complete E. coli genome (Blattner, F.R. et al., Science, 277, 1453-1461, 1997). The complete set of predictions in GenBank format is available at http://www.cifn.unam.mx/Computational_Biology/E.coli-predictions (contact: ecoli-reg@cifn.unam.mx, collado@cifn.unam.mx).
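
    The weight-matrix half of such a search can be sketched in a few lines of Python. This is a generic log-odds position weight matrix scan, not the authors' procedure: the pseudocount, the uniform background of 0.25 per base, the function names, and the toy binding sites are all assumptions made for illustration.

```python
import math


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_hit(matrix, region):
    """Return (score, offset) of the best-scoring window in region.

    Assumes region contains only A, C, G, T and is at least as long
    as the matrix is wide.
    """
    width = len(matrix)
    return max(
        (sum(matrix[i][region[start + i]] for i in range(width)), start)
        for start in range(len(region) - width + 1)
    )


if __name__ == "__main__":
    # Toy aligned sites and a toy upstream region, invented for this sketch.
    toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    matrix = log_odds_matrix(toy_sites)
    print(best_hit(matrix, "GGCGTATAATGCGC"))
```

    In a genome-scale scan, each upstream region would be scored this way and a hit reported only above a chosen threshold; combining the matrix score with an exact string search, as the abstract describes, is one way to make such a threshold more stringent.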

  14. Comparison of Transcription Factor Binding Site Models

    KAUST Repository

    Bhuyan, Sharifulislam

    2012-05-01

    Modeling of transcription factor binding sites (TFBSs) and TFBS prediction on genomic sequences are important steps in elucidating transcription regulatory mechanisms. The dependency of transcription regulation on a great number of factors, such as chemical specificity, molecular structure, genomic and epigenetic characteristics, and long-distance interaction, makes this a challenging problem. Different experimental procedures generate evidence that DNA-binding domains of transcription factors show considerable DNA sequence specificity. Probabilistic modeling of TFBSs has been moderately successful in identifying patterns from a family of sequences. In this study, we compare the performance of different probabilistic models and try to estimate their efficacy over experimental TFBS data. We build a pipeline to calculate sensitivity and specificity from aligned TFBS sequences for several probabilistic models, such as Markov chains, hidden Markov models, and Bayesian networks. Our work, containing relevant statistics and evaluation for the models, can help researchers to choose the most appropriate model for the problem at hand.
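
    The two evaluation measures named in the abstract have standard definitions: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal Python sketch follows, assuming each candidate site carries a true/false label and a true/false model prediction; the function name and the example labels are hypothetical and not taken from the study's pipeline.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean labels.

    Assumes both lists have the same length and that 'actual' contains
    at least one positive and one negative example.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented labels for five candidate sites, for illustration only.
    model_calls = [True, True, False, False, True]
    true_sites = [True, False, False, False, True]
    print(sensitivity_specificity(model_calls, true_sites))
```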

  15. New insights into transcription fidelity: thermal stability of non-canonical structures in template DNA regulates transcriptional arrest, pause, and slippage.

    Science.gov (United States)

    Tateishi-Karimata, Hisae; Isono, Noburu; Sugimoto, Naoki

    2014-01-01

    The thermal stability and topology of non-canonical structures of G-quadruplexes and hairpins in template DNA were investigated, and the effect of non-canonical structures on transcription fidelity was evaluated quantitatively. We designed ten template DNAs: a linear sequence that does not have significant higher-order structure, three sequences that form hairpin structures, and six sequences that form G-quadruplex structures with different stabilities. Templates with non-canonical structures induced the production of an arrested, a slipped, and a full-length transcript, whereas the linear sequence produced only a full-length transcript. The efficiency of production for run-off transcripts (full-length and slipped transcripts) from templates that formed the non-canonical structures was lower than that from the linear sequence. G-quadruplex structures were more effective inhibitors of full-length product formation than were hairpin structures, even when the stability of the G-quadruplex in an aqueous solution was the same as that of the hairpin. We considered that intra-polymerase conditions may differentially affect the stability of non-canonical structures. The values of transcription efficiencies of run-off or arrest transcripts were correlated with stabilities of non-canonical structures in the intra-polymerase condition mimicked by 20 wt% polyethylene glycol (PEG). Transcriptional arrest was induced when the stability of the G-quadruplex structure (-ΔG°37) in the presence of 20 wt% PEG was more than 8.2 kcal mol⁻¹. Thus, values of stability in the presence of 20 wt% PEG are an important indicator of transcription perturbation. Our results further our understanding of the impact of template structure on the transcription process and may guide logical design of transcription-regulating drugs.

  16. Genome-wide identification and characterization of Notch transcription complex-binding sequence paired sites in leukemia cells

    Science.gov (United States)

    Severson, Eric; Arnett, Kelly L.; Wang, Hongfang; Zang, Chongzhi; Taing, Len; Liu, Hudan; Pear, Warren S.; Liu, X. Shirley; Blacklow, Stephen C.; Aster, Jon C.

    2018-01-01

    Notch transcription complexes (NTCs) drive target gene expression by binding to two distinct types of genomic response elements, NTC monomer-binding sites and sequence-paired sites (SPSs) that bind NTC dimers. SPSs are conserved and are linked to the Notch-responsiveness of a few genes, but their overall contribution to Notch-dependent gene regulation is unknown. To address this issue, we determined the DNA sequence requirements for NTC dimerization using a fluorescence resonance energy transfer (FRET) assay, and applied insights from these in vitro studies to Notch-“addicted” leukemia cells. We find that SPSs contribute to the regulation of approximately a third of direct Notch target genes. While originally described in promoters, SPSs are present mainly in long-range enhancers, including an enhancer containing a newly described SPS that regulates HES5. Our work provides a general method for identifying sequence-paired sites in genome-wide data sets and highlights the widespread role of NTC dimerization in Notch-transformed leukemia cells. PMID:28465412

  17. Identification and positional distribution analysis of transcription factor binding sites for genes from the wheat fl-cDNA sequences.

    Science.gov (United States)

    Chen, Zhen-Yong; Guo, Xiao-Jiang; Chen, Zhong-Xu; Chen, Wei-Ying; Wang, Ji-Rui

    2017-06-01

    The binding sites of transcription factors (TFs) in upstream DNA regions are called transcription factor binding sites (TFBSs). TFBSs are important elements for regulating gene expression. To date, there have been few studies on the profiles of TFBSs in plants. In total, 4,873 sequences with 5' upstream regions from 8,530 wheat fl-cDNA sequences were used to predict TFBSs. We found 4,572 TFBSs for the MADS TF family, which was twice as many as for bHLH (1,951), B3 (1,951), HB superfamily (1,914), ERF (1,820), and AP2/ERF (1,725) TFs, and was approximately four times higher than the remaining TFBS types. The percentage of TFBSs and TF members showed a distinct distribution in different tissues. Overall, the distribution of TFBSs in the upstream regions of wheat fl-cDNA sequences differed significantly. Meanwhile, high frequencies of some types of TFBSs were found in specific regions in the upstream sequences. Both TFs and fl-cDNA with TFBSs predicted in the same tissues exhibited specific distribution preferences for regulating gene expression. The tissue-specific analysis of TFs and fl-cDNA with TFBSs provides useful information for functional research, and can be used to identify relationships between tissue-specific TFs and fl-cDNA with TFBSs. Moreover, the positional distribution of TFBSs indicates that some types of wheat TFBS have different positional distribution preferences in the upstream regions of genes.

  18. Elucidating MicroRNA Regulatory Networks Using Transcriptional, Post-transcriptional, and Histone Modification Measurements

    Directory of Open Access Journals (Sweden)

    Sara J.C. Gosline

    2016-01-01

    MicroRNAs (miRNAs) regulate diverse biological processes by repressing mRNAs, but their modest effects on direct targets, together with their participation in larger regulatory networks, make it challenging to delineate miRNA-mediated effects. Here, we describe an approach to characterizing miRNA-regulatory networks by systematically profiling transcriptional, post-transcriptional and epigenetic activity in a pair of isogenic murine fibroblast cell lines with and without Dicer expression. By RNA sequencing (RNA-seq) and CLIP (crosslinking followed by immunoprecipitation) sequencing (CLIP-seq), we found that most of the changes induced by global miRNA loss occur at the level of transcription. We then introduced a network modeling approach that integrated these data with epigenetic data to identify specific miRNA-regulated transcription factors that explain the impact of miRNA perturbation on gene expression. In total, we demonstrate that combining multiple genome-wide datasets spanning diverse regulatory modes enables accurate delineation of the downstream miRNA-regulated transcriptional network and establishes a model for studying similar networks in other systems.

  19. SONAR: A high-throughput pipeline for inferring antibody ontogenies from longitudinal sequencing of B cell transcripts

    Directory of Open Access Journals (Sweden)

    Chaim A Schramm

    2016-09-01

    Full Text Available The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, leading to a proliferation of software tools for processing and annotating this data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intra-donor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity-divergence plots and longitudinal phylogenetic birthday trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/.

  20. ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2005-10-01

    Full Text Available Abstract Background: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems – hence the need to develop novel strategies. Results: We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid limitations of splice site prediction (mainly, over-predictions due to independent single EST alignments to the genomic sequence), our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Conclusion: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.

  1. 5' diversity of human hepatic PXR (NR1I2) transcripts and identification of the major transcription initiation site.

    Science.gov (United States)

    Kurose, Kouichi; Koyano, Satoru; Ikeda, Shinobu; Tohkin, Masahiro; Hasegawa, Ryuichi; Sawada, Jun-Ichi

    2005-05-01

    The human pregnane X receptor (PXR) is a crucial regulator of the genes encoding several major cytochrome P450 enzymes and transporters, such as CYP3A4 and MDR1, but its own transcriptional regulation remains unclear. To elucidate the transcriptional mechanisms of human PXR gene, we first endeavored to identify the transcription initiation site of human PXR using 5'-RACE. Five types of 5'-variable transcripts (a, b, c, d, and e) with common exon 2 sequence were found, and comparison of these sequences with the genomic sequence suggested that their 5' diversity is derived from initiation by alternative promoters and alternative splicing. None of the exons found in our study contain any new in-frame coding regions. Newly identified introns IVS-a and IVS-b were found to have CT-AC splice sites that do not follow the GT-AG rule of conventional donor and acceptor splice sites. Of the five types of 5' variable transcripts identified, RT-PCR showed that type-a was the major transcript type. Four transcription initiation sites (A-D) for type-a transcript were identified by 5'-RACE using GeneRacer RACE Ready cDNA (human liver) constructed by the oligo-capping method. Putative TATA boxes were located approximately 30 bp upstream from the transcriptional start sites of the major transcript (C) and the longest minor transcript (A) expressed in the human liver. These results indicate that the initiation of transcription of human PXR is more complex than previously reported.
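
    The GT-AG rule mentioned in this abstract can be illustrated with a minimal sketch. The function name and the example sequences below are hypothetical and are not taken from the study; the code only reports the first and last two bases of an intron given on the sense strand.

```python
def splice_site_dinucleotides(intron: str) -> tuple[str, str, bool]:
    """Return (donor, acceptor, is_canonical) for an intron sequence.

    The donor site is taken as the first two bases of the intron and the
    acceptor site as the last two; "canonical" here means GT...AG.
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have donor and acceptor sites")
    donor, acceptor = seq[:2], seq[-2:]
    return donor, acceptor, (donor, acceptor) == ("GT", "AG")


# Made-up example introns: one conventional, one with CT-AC ends.
print(splice_site_dinucleotides("GTAAGTCCCTTTCAG"))
print(splice_site_dinucleotides("CTAAGTCCCTTTCAC"))
```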

  2. Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l) Gene Including Its Transcriptional Start Site

    Directory of Open Access Journals (Sweden)

    Mika Ohta

    2011-01-01

    Full Text Available We have been investigating the molecular efficacy of electroacupuncture (EA), which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l), in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.

  3. Transcriptional control in Alicyclobacillus acidocaldarius and associated genes, proteins, and methods

    Science.gov (United States)

    Lee, Brady Deneys; Thompson, David N; Apel, William A.; Thompson, Vicki Slavchev; Reed, David W; Lacey, Jeffrey A

    2014-05-06

    Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius are provided. Further provided are methods of modulating transcription or transcriptional control using isolated and/or purified polypeptides and nucleic acid sequences from Alicyclobacillus acidocaldarius.

  4. Transcriptional control in alicyclobacillus acidocaldarius and associated genes, proteins, and methods

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Brady D; Thompson, David N; Apel, William A; Thompson, Vicki S; Reed, David W; Lacey, Jeffrey A

    2016-11-22

    Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius are provided. Further provided are methods of modulating transcription or transcriptional control using isolated and/or purified polypeptides and nucleic acid sequences from Alicyclobacillus acidocaldarius.

  5. A powerful method for transcriptional profiling of specific cell types in eukaryotes: laser-assisted microdissection and RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Marc W Schmid

    Full Text Available The acquisition of distinct cell fates is central to the development of multicellular organisms and is largely mediated by gene expression patterns specific to individual cells and tissues. A spatially and temporally resolved analysis of gene expression facilitates the elucidation of transcriptional networks linked to cellular identity and function. We present an approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution. We combined laser-assisted microdissection (LAM), linear amplification starting from <1 ng of total RNA, and RNA-sequencing (RNA-Seq). As a model we used the central cell of the Arabidopsis thaliana female gametophyte, one of the female gametes harbored in the reproductive organs of the flower. We estimated the number of expressed genes to be more than twice the number reported previously in a study using LAM and ATH1 microarrays, and identified several classes of genes that were systematically underrepresented in the transcriptome measured with the ATH1 microarray. Among them are many genes that are likely to be important for developmental processes and specific cellular functions. In addition, we identified several intergenic regions, which are likely to be transcribed, and describe a considerable fraction of reads mapping to introns and regions flanking annotated loci, which may represent alternative transcript isoforms. Finally, we performed a de novo assembly of the transcriptome and show that the method is suitable for studying individual cell types of organisms lacking reference sequence information, demonstrating that this approach can be applied to most eukaryotic organisms.

  6. Effect of chronic uremia on the transcriptional profile of the calcified aorta analyzed by RNA sequencing

    DEFF Research Database (Denmark)

    Rukov, Jakob Lewin; Gravesen, Eva; Mace, Maria L.

    2016-01-01

    The development of vascular calcification (VC) in chronic uremia (CU) is a tightly regulated process controlled by factors promoting and inhibiting mineralization. Next-generation high-throughput RNA sequencing (RNA-seq) is a powerful and sensitive tool for quantitative gene expression profiling...... with an expression level of >1 reads/kilobase transcript/million mapped reads, 2,663 genes were differentially expressed with 47% upregulated genes and 53% downregulated genes in uremic rats. Significantly deregulated genes were enriched for ontologies related to the extracellular matrix, response to wounding...
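
    The expression unit quoted in this abstract, reads per kilobase of transcript per million mapped reads (RPKM), can be written out as a short calculation. This is a minimal sketch of the standard definition; the function name and the example numbers are illustrative assumptions and do not come from the study.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_mapped


# Illustrative numbers: 500 reads on a 2,000 bp transcript in a library
# of 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
print(value, value > 1)  # a ">1 RPKM" filter like the one quoted above
```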

  7. Transcription arrest caused by long nascent RNA chains

    DEFF Research Database (Denmark)

    Bentin, Thomas; Cherny, Dmitry; Larsen, H Jakob

    2004-01-01

    The transcription process is highly processive. However, specific sequence elements encoded in the nascent RNA may signal transcription pausing and/or termination. We find that under certain conditions nascent RNA chains can have a strong and apparently sequence-independent inhibitory effect on transcription. Using phage T3 RNA polymerase (T3 RNAP) and covalently closed circular (cccDNA) DNA templates that did not contain any strong termination signal, transcription was severely inhibited after a short period of time. Less than approximately 10% residual transcriptional activity remained after 10 min of incubation. The addition of RNase A almost fully restored transcription in a dose dependent manner. Throughout RNase A rescue, an elongation rate of approximately 170 nt/s was maintained and this velocity was independent of RNA transcript length, at least up to 6 kb. Instead, RNase A rescue increased...

  8. DNA damage-inducible transcripts in mammalian cells

    International Nuclear Information System (INIS)

    Fornace, A.J. Jr.; Alamo, I. Jr.; Hollander, M.C.

    1988-01-01

    Hybridization subtraction at low ratios of RNA to cDNA was used to enrich for the cDNA of transcripts increased in Chinese hamster cells after UV irradiation. Forty-nine different cDNA clones were isolated. Most coded for nonabundant transcripts rapidly induced 2- to 10-fold after UV irradiation. Only 2 of the 20 cDNA clones sequenced matched known sequences (metallothionein I and II). The predicted amino acid sequence of one cDNA had two localized areas of homology with the rat helix-destabilizing protein. These areas of homology were at the two DNA-binding sites of this nucleic acid single-strand-binding protein. The induced transcripts were separated into two general classes. Class I transcripts were induced by UV radiation and not by the alkylating agent methyl methanesulfonate. Class II transcripts were induced by UV radiation and by methyl methanesulfonate. Many class II transcripts were induced also by H2O2 and various alkylating agents but not by heat shock, phorbol 12-tetradecanoate 13-acetate, or DNA-damaging agents which do not produce high levels of base damage. Since many of the cDNA clones coded for transcripts which were induced rapidly and only by certain types of DNA-damaging agents, their induction is likely a specific response to such damage rather than a general response to cell injury.

  9. Mapping the transcription start points of the Staphylococcus aureus eap, emp, and vwb promoters reveals a conserved octanucleotide sequence that is essential for expression of these genes.

    Science.gov (United States)

    Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan

    2008-01-01

    Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.

  10. Deep RNA sequencing reveals hidden features and dynamics of early gene transcription in Paramecium bursaria chlorella virus 1.

    Directory of Open Access Journals (Sweden)

    Guillaume Blanc

    Full Text Available Paramecium bursaria chlorella virus 1 (PBCV-1) is the prototype of the genus Chlorovirus (family Phycodnaviridae) that infects the unicellular, eukaryotic green alga Chlorella variabilis NC64A. The 331-kb PBCV-1 genome contains 416 major open reading frames. An mRNA-seq approach was used to analyze PBCV-1 transcriptomes at 6 progressive times during the first hour of infection. The alignment of 17 million reads to the PBCV-1 genome allowed the construction of single-base transcriptome maps. Significant transcription was detected for a subset of 50 viral genes as soon as 7 min after infection. By 20 min post infection (p.i.), transcripts were detected for most PBCV-1 genes and transcript levels continued to increase globally up to 60 min p.i., at which time 41% of the poly(A)+-containing RNAs in the infected cells mapped to the PBCV-1 genome. For some viral genes, the number of transcripts in the latter time points (20 to 60 min p.i.) was much higher than that of the most highly expressed host genes. RNA-seq data revealed putative polyadenylation signal sequences in PBCV-1 genes that were identical to the polyadenylation signal AAUAAA of green algae. Several transcripts have an RNA fragment excised. However, the frequency of excision and the resulting putative shortened protein products suggest that most of these excision events have no functional role but are probably the result of the activity of misled spliceosomes.

  11. Demonstrating Interactions of Transcription Factors with DNA by Electrophoretic Mobility Shift Assay.

    Science.gov (United States)

    Yousaf, Nasim; Gould, David

    2017-01-01

    Confirming the binding of a transcription factor with a particular DNA sequence may be important in characterizing interactions with a synthetic promoter. Electrophoretic mobility shift assay is a powerful approach to demonstrate the specific DNA sequence that is bound by a transcription factor and also to confirm the specific transcription factor involved in the interaction. In this chapter we describe a method we have successfully used to demonstrate interactions of endogenous transcription factors with sequences derived from endogenous and synthetic promoters.

  12. Metagenomic screening for aromatic compound-responsive transcriptional regulators.

    Directory of Open Access Journals (Sweden)

    Taku Uchiyama

    Full Text Available We applied a metagenomics approach to screen for transcriptional regulators that sense aromatic compounds. The library was constructed by cloning environmental DNA fragments into a promoter-less vector containing green fluorescence protein. Fluorescence-based screening was then performed in the presence of various aromatic compounds. A total of 12 clones were isolated that fluoresced in response to salicylate, 3-methyl catechol, 4-chlorocatechol and chlorohydroquinone. Sequence analysis revealed at least 1 putative transcriptional regulator, excluding 1 clone (CHLO8F). Deletion analysis identified compound-specific transcriptional regulators; namely, 8 LysR-types, 2 two-component-types and 1 AraC-type. Of these, 9 representative clones were selected and their reaction specificities to 18 aromatic compounds were investigated. Overall, our transcriptional regulators were functionally diverse in terms of both specificity and induction rates. LysR- and AraC-type regulators had relatively narrow specificities with high induction rates (5-50 fold), whereas two-component-types had wide specificities with low induction rates (3 fold). Numerous transcriptional regulators have been deposited in sequence databases, but their functions remain largely unknown. Thus, our results add valuable information regarding the sequence-function relationship of transcriptional regulators.

  13. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Amy L Bauer

    2010-11-01

    Full Text Available An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
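
    The conventional position-specific weight matrix approach that this abstract contrasts with SiteSleuth can be sketched as follows. This is not SiteSleuth and not any of the named tools; the function names, the pseudocount, the uniform background and the example sites are all assumptions chosen for illustration, and only the bases A, C, G and T are handled.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length known binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(width):
        counts = Counter(site[i].upper() for site in sites)
        # log2 ratio of the smoothed base frequency to the background frequency
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds weights for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Made-up training sites and query sequence.
pwm = build_pwm(["TATAAT", "TATAAT", "TACAAT", "TATGAT"])
print(best_hit(pwm, "GGCCTATAATGGCC"))
```

    A fixed score cutoff on such a matrix is one source of false positives, which is the weakness the abstract says SiteSleuth reduces.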

  14. RNA-Sequencing Reveals Unique Transcriptional Signatures of Running and Running-Independent Environmental Enrichment in the Adult Mouse Dentate Gyrus.

    Science.gov (United States)

    Grégoire, Catherine-Alexandra; Tobin, Stephanie; Goldenstein, Brianna L; Samarut, Éric; Leclerc, Andréanne; Aumont, Anne; Drapeau, Pierre; Fulton, Stephanie; Fernandes, Karl J L

    2018-01-01

    Environmental enrichment (EE) is a powerful stimulus of brain plasticity and is among the most accessible treatment options for brain disease. In rodents, EE is modeled using multi-factorial environments that include running, social interactions, and/or complex surroundings. Here, we show that running and running-independent EE differentially affect the hippocampal dentate gyrus (DG), a brain region critical for learning and memory. Outbred male CD1 mice housed individually with a voluntary running disk showed improved spatial memory in the radial arm maze compared to individually- or socially-housed mice with a locked disk. We therefore used RNA sequencing to perform an unbiased interrogation of DG gene expression in mice exposed to either a voluntary running disk (RUN), a locked disk (LD), or a locked disk plus social enrichment and tunnels [i.e., a running-independent complex environment (CE)]. RNA sequencing revealed that RUN and CE mice showed distinct, non-overlapping patterns of transcriptomic changes versus the LD control. Bio-informatics uncovered that the RUN and CE environments modulate separate transcriptional networks, biological processes, cellular compartments and molecular pathways, with RUN preferentially regulating synaptic and growth-related pathways and CE altering extracellular matrix-related functions. Within the RUN group, high-distance runners also showed selective stress pathway alterations that correlated with a drastic decline in overall transcriptional changes, suggesting that excess running causes a stress-induced suppression of running's genetic effects. Our findings reveal stimulus-dependent transcriptional signatures of EE on the DG, and provide a resource for generating unbiased, data-driven hypotheses for novel mediators of EE-induced cognitive changes.

  15. Conifer R2R3-MYB transcription factors: sequence analyses and gene expression in wood-forming tissues of white spruce (Picea glauca)

    Directory of Open Access Journals (Sweden)

    Grima-Pettenati Jacqueline

    2007-03-01

    Full Text Available Abstract Background: Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms. Results: We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion: Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco.

  16. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Gene discovery and molecular marker development, based on high-throughput transcript sequencing of Paspalum dilatatum Poir.

    Directory of Open Access Journals (Sweden)

    Andrea Giordano

    Full Text Available BACKGROUND: Paspalum dilatatum Poir. (common name dallisgrass) is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. RESULTS: Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences) at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica) genomes as well as against the UniRef 90 protein database allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs). A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. CONCLUSIONS: The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression

  18. A code for transcription initiation in mammalian genomes

    DEFF Research Database (Denmark)

    Frith, Martin C.; Valen, Eivind Dale; Krogh, Anders

    2007-01-01

    that initiation events are clustered on the chromosomes at multiple scales - clusters within clusters - indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each...... of large- and small-scale effects: the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level; it is affected by distal features or events such as enhancers and chromatin remodeling....

  19. Mitochondrial transcription factor A (Tfam) gene sequencing and mitochondrial evaluation in inherited retinal dysplasia in miniature schnauzer dogs.

    Science.gov (United States)

    Bauer, Bianca S; Forsyth, George W; Sandmeyer, Lynne S; Grahn, Bruce H

    2011-04-01

    Mitochondrial transcription factor A (Tfam) has been implicated in the pathogenesis of retinal dysplasia in miniature schnauzer dogs and it has been proposed that affected dogs have altered mitochondrial numbers, size, and morphology. To test these hypotheses the Tfam gene of affected and normal miniature schnauzer dogs with retinal dysplasia was sequenced and lymphocyte mitochondria were quantified, measured, and the morphology was compared in normal and affected dogs using transmission electron microscopy. For Tfam sequencing, retina, retinal pigment epithelium (RPE), and whole blood samples were collected. Total RNA was isolated from the retina and RPE and reverse transcribed to make cDNA. Genomic DNA was extracted from white blood cell pellets obtained from the whole blood samples. The Tfam coding sequence, 5' promoter region, intron1 and the 3' non-coding sequence of normal and affected dogs were amplified using polymerase chain reaction (PCR), cloned and sequenced. For electron microscopy, lymphocytes from affected and normal dogs were photographed and the mitochondria within each cross-section were identified, quantified, and the mitochondrial area (μm²) per lymphocyte cross-section was calculated. Lastly, using a masked technique, mitochondrial morphology was compared between the 2 groups. Sequencing of the miniature schnauzer Tfam gene revealed no functional sequence variation between affected and normal dogs. Lymphocyte and mitochondrial area, mitochondrial quantification, and morphology assessment also revealed no significant difference between the 2 groups. Further investigation into other candidate genes or factors causing retinal dysplasia in the miniature schnauzer is warranted.

  20. Sequencing illustrates the transcriptional response of Legionella pneumophila during infection and identifies seventy novel small non-coding RNAs.

    LENUS (Irish Health Repository)

    Weissenmayer, Barbara A

    2011-01-01

    Second generation sequencing has prompted a number of groups to re-interrogate the transcriptomes of several bacterial and archaeal species. One of the central findings has been the identification of complex networks of small non-coding RNAs that play central roles in transcriptional regulation in all growth conditions and for the pathogen's interaction with and survival within host cells. Legionella pneumophila is a gram-negative facultative intracellular human pathogen with a distinct biphasic lifestyle. One of its primary environmental hosts is the free-living amoeba Acanthamoeba castellanii, and its infection by L. pneumophila mimics that seen in human macrophages. Here we present analysis of strand specific sequencing of the transcriptional response of L. pneumophila during exponential and post-exponential broth growth and during the replicative and transmissive phase of infection inside A. castellanii. We extend previous microarray based studies as well as uncovering evidence of a complex regulatory architecture underpinned by numerous non-coding RNAs. Over seventy new non-coding RNAs could be identified; many of them appear to be strain specific and in configurations not previously reported. We discover a family of non-coding RNAs preferentially expressed during infection conditions and identify a second copy of 6S RNA in L. pneumophila. We show that the newly discovered putative 6S RNA as well as a number of other non-coding RNAs show evidence for antisense transcription. The nature and extent of the non-coding RNAs and their expression patterns suggest that these may well play central roles in the regulation of Legionella spp. specific traits and offer clues as to how L. pneumophila adapts to its intracellular niche. The expression profiles outlined in the study have been deposited into Genbank's Gene Expression Omnibus (GEO) database under the series accession GSE27232.

  1. Post-transcription cleavage generates the 3' end of F17R transcripts in vaccinia virus

    International Nuclear Information System (INIS)

    D'Costa, Susan M.; Antczak, James B.; Pickup, David J.; Condit, Richard C.

    2004-01-01

    Most vaccinia virus intermediate and late mRNAs possess 3' ends that are extremely heterogeneous in sequence. However, late mRNAs encoding the cowpox A-type inclusion protein (ATI), the second largest subunit of the RNA polymerase, and the late telomeric transcripts possess homogeneous 3' ends. In the case of the ATI mRNA, it has been shown that the homogeneous 3' end is generated by a post-transcriptional endoribonucleolytic cleavage event. We have determined that the F17R gene also produces homogeneous transcripts generated by a post-transcriptional cleavage event. Mapping of in vivo mRNA shows that the major 3' end of the F17R transcript maps 1262 nt downstream of the F17R translational start site. In vitro transcripts spanning the in vivo 3' end are cleaved in an in vitro reaction using extracts from virus infected cells, and the site of cleavage is the same both in vivo and in vitro. Cleavage is not observed using extract from cells infected in the presence of hydroxyurea; therefore, the cleavage factor is either virus-coded or virus-induced during the post-replicative phase of virus replication. The cis-acting sequence responsible for cleavage is orientation specific and the factor responsible for cleavage activity has biochemical properties similar to the factor required for cleavage of ATI transcripts. Partially purified cleavage factor generates cleavage products of expected size when either the ATI or F17R substrates are used in vitro, strongly suggesting that cleavage of both transcripts is mediated by the same factor.

  2. Structural Fingerprints of Transcription Factor Binding Site Regions

    Directory of Open Access Journals (Sweden)

    Peter Willett

    2009-03-01

    Full Text Available Fourier transforms are a powerful tool in the prediction of DNA sequence properties, such as the presence/absence of codons. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers. In this work we apply Fourier techniques to the analysis of the structural properties of human chromosomes 21 and 22 and also to three sets of transcription factor binding sites within these chromosomes. We find that, for a given structural property, the structural property power spectra of chromosomes 21 and 22 are strikingly similar. We find common peaks in their power spectra for both Sp1 and p53 transcription factor binding sites. We use the power spectra as a structural fingerprint and perform similarity searching in order to find transcription factor binding site regions. This approach provides a new strategy for searching the genome data for information. Although it is difficult to understand the relationship between specific functional properties and the set of structural parameters in our database, our structural fingerprints nevertheless provide a useful tool for searching for function information in sequence data. The power spectrum fingerprints provide a simple, fast method for comparing a set of functional sequences, in this case transcription factor binding site regions, with the sequences of whole chromosomes. On its own, the power spectrum fingerprint does not find all transcription factor binding sites in a chromosome, but the results presented here show that in combination with other approaches, this technique will improve the chances of identifying functional sequences hidden in genomic data.
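
The power-spectrum fingerprint idea in the record above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-base values in `TOY_PROPERTY` are hypothetical stand-ins for the authors' octamer structural-property database (which is not reproduced here), the transform is a naive discrete Fourier transform, and cosine similarity is one plausible similarity measure rather than the one the authors necessarily used.

```python
import cmath
import math

# Hypothetical per-base property values, for illustration only. The paper
# uses structural properties of DNA octamers taken from its own database.
TOY_PROPERTY = {"A": 0.1, "C": 0.7, "G": 0.9, "T": 0.3}


def property_profile(seq):
    """Map a DNA sequence to a numeric profile, one value per base."""
    return [TOY_PROPERTY[base] for base in seq.upper()]


def power_spectrum(values):
    """Return the power at frequencies k = 1 .. n // 2 (naive DFT).

    The mean is subtracted first so that the zero-frequency term does not
    dominate the fingerprint.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length fingerprints."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two regions of equal length can then be compared with `cosine_similarity(power_spectrum(property_profile(a)), power_spectrum(property_profile(b)))`. A sequence with a strict period of 3 bases concentrates its power at frequency n/3.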

  3. Uncovering transcriptional interactions via an adaptive fuzzy logic approach

    Directory of Open Access Journals (Sweden)

    Chen Chung-Ming

    2009-12-01

    Full Text Available Abstract. Background: To date, only a limited number of transcriptional regulatory interactions have been uncovered. In a pilot study integrating sequence data with microarray data, a position weight matrix (PWM) performed poorly in inferring transcriptional interactions (TIs), which represent physical interactions between transcription factors (TF) and upstream sequences of target genes. Inferring a TI means that the promoter sequence of a target is inferred to match the consensus sequence motifs of a potential TF, and their interaction type such as AT or RT is also predicted. Thus, a robust PWM (rPWM) was developed to search for consensus sequence motifs. In addition to rPWM, one feature extracted from ChIP-chip data was incorporated to identify potential TIs under specific conditions. An interaction type classifier was assembled to predict activation/repression of potential TIs using microarray data. This approach, combining an adaptive (learning) fuzzy inference system and an interaction type classifier to predict transcriptional regulatory networks, was named AdaFuzzy. Results: AdaFuzzy was applied to predict TIs using real genomics data from Saccharomyces cerevisiae. Following one of the latest advances in predicting TIs, constrained probabilistic sparse matrix factorization (cPSMF), and using 19 transcription factors (TFs), we compared AdaFuzzy to four well-known approaches using over-representation analysis and gene set enrichment analysis. AdaFuzzy outperformed these four algorithms. Furthermore, AdaFuzzy was shown to perform comparably to 'ChIP-experimental method' in inferring TIs identified by two sets of large scale ChIP-chip data, respectively. AdaFuzzy was also able to classify all predicted TIs into one or more of the four promoter architectures. The results coincided with known promoter architectures in yeast and provided insights into transcriptional regulatory mechanisms. Conclusion: AdaFuzzy successfully integrates multiple types of
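
For readers unfamiliar with the baseline that the record above starts from, a plain position weight matrix scan can be sketched as follows. This is a generic log-odds PWM with pseudocounts, not the authors' rPWM or the AdaFuzzy system; the function names, the uniform background of 0.25, and the pseudocount of 1 are all assumptions made for the example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Each column maps a base to log2(observed frequency / background).
    The pseudocount keeps unseen bases from scoring minus infinity.
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_match(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    scored = [
        (score_window(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)
```

A promoter would be called a candidate target when its best window score exceeds some threshold. Choosing that threshold is exactly the step where a plain PWM tends to be fragile, which is the motivation the abstract gives for a more robust variant.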

  4. RNA-Sequencing Reveals Unique Transcriptional Signatures of Running and Running-Independent Environmental Enrichment in the Adult Mouse Dentate Gyrus

    Directory of Open Access Journals (Sweden)

    Catherine-Alexandra Grégoire

    2018-04-01

    Full Text Available Environmental enrichment (EE) is a powerful stimulus of brain plasticity and is among the most accessible treatment options for brain disease. In rodents, EE is modeled using multi-factorial environments that include running, social interactions, and/or complex surroundings. Here, we show that running and running-independent EE differentially affect the hippocampal dentate gyrus (DG), a brain region critical for learning and memory. Outbred male CD1 mice housed individually with a voluntary running disk showed improved spatial memory in the radial arm maze compared to individually- or socially-housed mice with a locked disk. We therefore used RNA sequencing to perform an unbiased interrogation of DG gene expression in mice exposed to either a voluntary running disk (RUN), a locked disk (LD), or a locked disk plus social enrichment and tunnels [i.e., a running-independent complex environment (CE)]. RNA sequencing revealed that RUN and CE mice showed distinct, non-overlapping patterns of transcriptomic changes versus the LD control. Bio-informatics uncovered that the RUN and CE environments modulate separate transcriptional networks, biological processes, cellular compartments and molecular pathways, with RUN preferentially regulating synaptic and growth-related pathways and CE altering extracellular matrix-related functions. Within the RUN group, high-distance runners also showed selective stress pathway alterations that correlated with a drastic decline in overall transcriptional changes, suggesting that excess running causes a stress-induced suppression of running’s genetic effects. Our findings reveal stimulus-dependent transcriptional signatures of EE on the DG, and provide a resource for generating unbiased, data-driven hypotheses for novel mediators of EE-induced cognitive changes.

  5. Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists

    Directory of Open Access Journals (Sweden)

    Matheus Sanitá Lima

    2017-11-01

    Full Text Available Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA, coding and noncoding, is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells.
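
The "transcripts covered ≥85% of the genome" figure in the record above is a breadth-of-coverage statistic. A minimal sketch of how such a number could be computed from mapped read intervals follows; the 0-based half-open interval convention and the function names are assumptions for the example, and the authors' actual mapping pipeline is not described in the abstract.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def breadth_of_coverage(alignments, genome_length):
    """Fraction of genome positions covered by at least one alignment.

    Alignments are assumed to lie within [0, genome_length).
    """
    covered = sum(end - start for start, end in merge_intervals(alignments))
    return covered / genome_length
```

For example, two reads spanning positions 0 to 50 and 40 to 90 of a 100-bp genome merge into one 90-bp block, giving a breadth of 0.9, which would pass an 85% threshold.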

  6. A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts

    Directory of Open Access Journals (Sweden)

    Hannah L. Fox

    2017-06-01

    Full Text Available Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production.

  7. Mitotic Transcriptional Activation: Clearance of Actively Engaged Pol II via Transcriptional Elongation Control in Mitosis.

    Science.gov (United States)

    Liang, Kaiwei; Woodfin, Ashley R; Slaughter, Brian D; Unruh, Jay R; Box, Andrew C; Rickels, Ryan A; Gao, Xin; Haug, Jeffrey S; Jaspersen, Sue L; Shilatifard, Ali

    2015-11-05

    Although it is established that some general transcription factors are inactivated at mitosis, many details of mitotic transcription inhibition (MTI) and its underlying mechanisms are largely unknown. We have identified mitotic transcriptional activation (MTA) as a key regulatory step to control transcription in mitosis for genes with transcriptionally engaged RNA polymerase II (Pol II) to activate and transcribe until the end of the gene to clear Pol II from mitotic chromatin, followed by global impairment of transcription reinitiation through MTI. Global nascent RNA sequencing and RNA fluorescence in situ hybridization demonstrate the existence of transcriptionally engaged Pol II in early mitosis. Both genetic and chemical inhibition of P-TEFb in mitosis lead to delays in the progression of cell division. Together, our study reveals a mechanism for MTA and MTI whereby transcriptionally engaged Pol II can progress into productive elongation and finish transcription to allow proper cellular division. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  9. Triple helix-forming oligonucleotide corresponding to the polypyrimidine sequence in the rat alpha 1(I) collagen promoter specifically inhibits factor binding and transcription.

    Science.gov (United States)

    Kovacs, A; Kandala, J C; Weber, K T; Guntaka, R V

    1996-01-19

    Type I and III fibrillar collagens are the major structural proteins of the extracellular matrix found in various organs including the myocardium. Abnormal and progressive accumulation of fibrillar type I collagen in the interstitial spaces compromises organ function and therefore, the study of transcriptional regulation of this gene and specific targeting of its expression is of major interest. Transient transfection of adult cardiac fibroblasts indicates that the polypurine-polypyrimidine sequence of alpha 1(I) collagen promoter between nucleotides -200 and -140 represents an overall positive regulatory element. DNase I footprinting and electrophoretic mobility shift assays suggest that multiple factors bind to different elements of this promoter region. We further demonstrate that the unique polypyrimidine sequence between -172 and -138 of the promoter represents a suitable target for a single-stranded polypurine oligonucleotide (TFO) to form a triple helix DNA structure. Modified electrophoretic mobility shift assays show that this TFO specifically inhibits the protein-DNA interaction within the target region. In vitro transcription assays and transient transfection experiments demonstrate that the transcriptional activity of the promoter is inhibited by this oligonucleotide. We propose that TFOs represent a therapeutic potential to specifically influence the expression of alpha 1(I) collagen gene in various disease states where abnormal type I collagen accumulation is known to occur.

  10. Mapping of gene transcripts by nuclease protection assays and cDNA primer extension

    International Nuclear Information System (INIS)

    Calzone, F.J.; Britten, R.J.; Davidson, E.J.

    1987-01-01

    An important problem often faced in the molecular characterization of genes is the precise mapping of those genomic sequences transcribed into RNA. This requires identification of the genomic site initiating gene transcription, the location of genomic sequences removed from the primary gene transcript during RNA processing, and knowledge of sequences terminating the processed gene transcript. The objective of the protocols described here is the generation of transcription maps utilizing relatively uncharacterized gene fragments. The basic approach is hybridization of a single-stranded DNA probe with cellular RNA, followed by treatment with a single-strand-specific nuclease that does not attack DNA-RNA hybrids, in order to destroy any unreacted probe sequences. Thus the probe sequences included in the hybrid duplexes are protected from nuclease digestion. The sizes of the protected probe fragments determined by gel electrophoresis correspond to the lengths of the hybridized sequence elements.

  11. HAfTs are novel lncRNA transcripts from aflatoxin exposure.

    Directory of Open Access Journals (Sweden)

    B Alex Merrick

    Full Text Available The transcriptome can reveal insights into precancer biology. We recently conducted RNA-Seq analysis on liver RNA from male rats exposed to the carcinogen, aflatoxin B1 (AFB1), for 90 days prior to liver tumor onset. Among >1,000 differentially expressed transcripts, several novel, unannotated Cufflinks-assembled transcripts, or HAfTs (Hepatic Aflatoxin Transcripts) were found. We hypothesized PCR-cloning and RACE (rapid amplification of cDNA ends) could further HAfT identification. Sanger data was obtained for 6 transcripts by PCR and 16 transcripts by 5'- and 3'-RACE. BLAST alignments showed, with two exceptions, HAfT transcripts were lncRNAs, >200nt without apparent long open reading frames. Six rat HAfT transcripts were classified as 'novel' without RefSeq annotation. Sequence alignment and genomic synteny showed each rat lncRNA had a homologous locus in the mouse genome and over half had homologous loci in the human genome, including at least two loci (and possibly three others) that were previously unannotated. While HAfT functions are not yet clear, coregulatory roles may be possible from their adjacent orientation to known coding genes with altered expression that include 8 HAfT-gene pairs. For example, a unique rat HAfT, homologous to Pvt1, was adjacent to known genes controlling cell proliferation. Additionally, PCR and RACE Sanger sequencing showed many alternative splice variants and refinements of exon sequences compared to Cufflinks assembled transcripts and gene prediction algorithms. Presence of multiple splice variants and short tandem repeats found in some HAfTs may be consequential for secondary structure, transcriptional regulation, and function. In summary, we report novel, differentially expressed lncRNAs after exposure to the genotoxicant, AFB1, prior to neoplastic lesions. Complete cloning and sequencing of such transcripts could pave the way for a new set of sensitive and early prediction markers for chemical

  12. A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts.

    Science.gov (United States)

    Fox, Hannah L; Dembowski, Jill A; DeLuca, Neal A

    2017-06-13

    Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production. IMPORTANCE HSV-1 interacts with many cellular proteins throughout productive infection. Here, we demonstrate the interaction of a viral protein, ICP22, with a subset of cellular proteins known to be involved in transcription elongation. We determined that ICP22 is required to recruit the FACT complex and other transcription

  13. Targeted reduction of highly abundant transcripts using pseudo-random primers.

    Science.gov (United States)

    Arnaud, Ophélie; Kato, Sachi; Poulain, Stéphane; Plessy, Charles

    2016-04-01

    Transcriptome studies based on quantitative sequencing can estimate levels of gene expression by measuring target RNA abundance in sequencing libraries. Sequencing costs are proportional to the total number of sequenced reads, and in order to cover rare RNAs, considerable quantities of abundant and identical reads are needed. This major limitation can be addressed by depleting a proportion of the most abundant sequences from the library. However, such depletion strategies involve either extra handling of the input RNA sample or use of a large number of reverse transcription primers, termed not-so-random (NSR) primers, which are costly to synthesize. Taking advantage of the high tolerance of reverse transcriptase to mis-prime, we found that it is possible to use as few as 40 pseudo-random (PS) reverse transcription primers to decrease the rate of undesirable abundant sequences within a library without affecting the overall transcriptome diversity. PS primers are simple to design and can be used to deplete several undesirable RNAs simultaneously, thus creating a flexible tool for enriching transcriptome libraries for rare transcript sequences.

  14. Characteristics of MHC class I genes in house sparrows Passer domesticus as revealed by long cDNA transcripts and amplicon sequencing.

    Science.gov (United States)

    Karlsson, Maria; Westerdahl, Helena

    2013-08-01

    In birds the major histocompatibility complex (MHC) organization differs both among and within orders; chickens Gallus gallus of the order Galliformes have a simple arrangement, while many songbirds of the order Passeriformes have a more complex arrangement with larger numbers of MHC class I and II genes. Chicken MHC genes are found at two independent loci, classical MHC-B and non-classical MHC-Y, whereas non-classical MHC genes are yet to be verified in passerines. Here we characterize MHC class I transcripts (α1 to α3 domain) and perform amplicon sequencing using a next-generation sequencing technique on exon 3 from house sparrow Passer domesticus (a passerine) families. Then we use phylogenetic, selection, and segregation analyses to gain a better understanding of the MHC class I organization. Trees based on the α1 and α2 domain revealed a distinct cluster with short terminal branches for transcripts with a 6-bp deletion. Interestingly, this cluster was not seen in the tree based on the α3 domain. 21 exon 3 sequences were verified in a single individual and the average numbers within an individual were nine and five for sequences with and without a 6-bp deletion, respectively. All individuals had exon 3 sequences with and without a 6-bp deletion. The sequences with a 6-bp deletion have many characteristics in common with non-classical MHC, e.g., highly conserved amino acid positions were substituted compared with the other alleles, low nucleotide diversity and just a single site was subject to positive selection. However, these alleles also have characteristics that suggest they could be classical, e.g., complete linkage and absence of a distinct cluster in a tree based on the α3 domain. Thus, we cannot determine for certain whether or not the alleles with a 6-bp deletion are non-classical based on our present data. Further analyses on segregation patterns of these alleles in combination with dating the 6-bp deletion through MHC characterization across the

  15. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data.

    Science.gov (United States)

    Jorjani, Hadi; Zavolan, Mihaela

    2014-04-01

    Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php

  16. Zipper plot: visualizing transcriptional activity of genomic regions.

    Science.gov (United States)

    Avila Cobos, Francisco; Anckaert, Jasper; Volders, Pieter-Jan; Everaert, Celine; Rombaut, Dries; Vandesompele, Jo; De Preter, Katleen; Mestdagh, Pieter

    2017-05-02

    Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5'-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
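
The record above states that the method needs only three tab-separated fields per site: chromosome, genomic coordinate of the TSS, and strand. The Zipper plot itself is implemented in R and offered as a webtool; the sketch below is only a hypothetical pre-submission check written in Python, and details such as skipping `#` comment lines or requiring integer coordinates are assumptions, not documented behaviour of the tool.

```python
import csv
import io

VALID_STRANDS = {"+", "-"}


def read_tss_table(handle):
    """Parse and validate rows of (chromosome, TSS coordinate, strand).

    Blank lines and lines starting with '#' are skipped (an assumption).
    Raises ValueError on a malformed row so problems surface early.
    """
    records = []
    reader = csv.reader(handle, delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        chrom, coord, strand = row
        if strand not in VALID_STRANDS:
            raise ValueError(f"line {line_no}: strand must be '+' or '-'")
        # int() raises ValueError itself if the coordinate is not numeric.
        records.append((chrom, int(coord), strand))
    return records
```

Usage: `read_tss_table(io.StringIO("chr1\t1000\t+\n"))` yields a list of `(chromosome, coordinate, strand)` tuples that can be written back out unchanged once every row has passed the checks.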

  17. Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists.

    Science.gov (United States)

    Sanitá Lima, Matheus; Smith, David Roy

    2017-11-06

    Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA, coding and noncoding, is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells. Copyright © 2017 Sanitá Lima and Smith.

  18. Draft genome sequence and transcriptional analysis of Rosellinia necatrix infected with a virulent mycovirus.

    Science.gov (United States)

    Shimizu, Takeo; Kanematsu, Satoko; Yaegashi, Hajime

    2018-04-24

    Understanding the molecular mechanisms of pathogenesis is useful in developing effective control methods for fungal diseases. The white root rot fungus Rosellinia necatrix is a soil-borne pathogen that causes serious economic losses in various crops, including fruit trees, worldwide. Here, using next-generation sequencing techniques, we first produced a 44-Mb draft genome sequence of R. necatrix strain W97, an isolate from Japan, in which 12,444 protein-coding genes were predicted. To survey differentially expressed genes (DEGs) associated with the pathogenesis of the fungus, the hypovirulent W97 strain infected with Rosellinia necatrix megabirnavirus 1 (RnMBV1) was used for a comprehensive transcriptome analysis. In total, 545 and 615 genes are up- and down-regulated, respectively, in R. necatrix infected with RnMBV1. Gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of the DEGs suggested that primary and secondary metabolism would be greatly disturbed in R. necatrix infected with RnMBV1. The genes encoding transcriptional regulators, plant cell wall-degrading enzymes, and toxin production, such as cytochalasin E, were also found in the DEGs. The genetic resources provided in this study will accelerate the discovery of genes associated with pathogenesis and other biological characteristics of R. necatrix, thus contributing to disease control.

  19. The Eimeria Transcript DB: an integrated resource for annotated transcripts of protozoan parasites of the genus Eimeria

    Science.gov (United States)

    Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur

    2013-01-01

    Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718

  20. Application of Six Thinking Hats with the Theme „Profession of Sociologist”. Transcript of the Sequence of Green Hat

    Directory of Open Access Journals (Sweden)

    Gheorghe Onuţ

    2009-12-01

    Full Text Available The study is a transcript of the green hat sequence from an application of the creative technique Six Thinking Hats (Edward de Bono’s creation) that I carried out at the workshop on the theme „Profession of Sociologist” at the international colloquium of social sciences ACUM 2008. The ACUM colloquium is the most important of the scientific events organized by the Faculty of Law and Sociology of „Transilvania” University of Braşov.

  1. Mapping the transcription termination region of the mouse immunoglobulin kappa gene

    International Nuclear Information System (INIS)

    Xu, M.; Garrard, W.T.

    1986-01-01

    To define the transcription termination region of the mouse immunoglobulin kappa gene, they have subcloned single copy DNA sequences corresponding to both the template and the non-template strands of this locus. In vitro nuclear transcription with isolated MPC-11 nuclei was performed and the resulting 32P-labeled RNA was hybridized to slot-blotted, single-stranded M13 probes covering regions within and flanking the kappa gene. The hybridization pattern for the template strand reveals that transcription terminates within the region between 1.1 and 2.3 kb downstream from the poly(A) site. Ten different short sequences (8-13 bp) reside within 460 bp of this region that exhibit homology with sequences found in the termination regions of mouse β-globin and chicken ovalbumin genes. Transcription of the non-template strand occurs on either side of this termination region. They note that no transcription is detectable on the non-template strand downstream of the enhancer, indicating that if RNA polymerase II enters at this site, it does not initiate transcription during transit to the promoter region. They conclude that transcription of the kappa gene passes the poly(A) addition site and terminates within 2.3 kb downstream.

  2. Extensive polycistronism and antisense transcription in the mammalian Hox clusters.

    Directory of Open Access Journals (Sweden)

    Gaëll Mainguy

    Full Text Available The Hox clusters play a crucial role in body patterning during animal development. They encode both Hox transcription factor and micro-RNA genes that are activated in a precise temporal and spatial sequence that follows their chromosomal order. These remarkable collinear properties confer functional unit status for Hox clusters. We developed the TranscriptView platform to establish high resolution transcriptional profiling and report here that transcription in the Hox clusters is far more complex than previously described in both human and mouse. Unannotated transcripts can represent up to 60% of the total transcriptional output of a cluster. In particular, we identified 14 non-coding Transcriptional Units antisense to Hox genes, 10 of which (70%) have a detectable mouse homolog. Most of these Transcriptional Units in both human and mouse present conserved sizeable sequences (>40 bp) overlapping Hox transcripts, suggesting that these Hox antisense transcripts are functional. Hox clusters also display at least seven polycistronic clusters, i.e., different genes being co-transcribed on long isoforms (up to 30 kb). This work provides a reevaluated framework for understanding Hox gene function and dys-function. Such extensive transcriptions may provide a structural explanation for Hox clustering.

  3. Discovery of novel transcripts of the human tissue kallikrein (KLK1) and kallikrein-related peptidase 2 (KLK2) in human cancer cells, exploiting Next-Generation Sequencing technology.

    Science.gov (United States)

    Adamopoulos, Panagiotis G; Kontos, Christos K; Scorilas, Andreas

    2018-03-31

    Tissue kallikrein, kallikrein-related peptidases (KLKs), and plasma kallikrein form the largest group of serine proteases in the human genome, sharing many structural and functional properties. Several KLK transcripts have been found aberrantly expressed in numerous human malignancies, confirming their prognostic or/and diagnostic values. However, the process of alternative splicing can now be studied in-depth due to the development of Next-Generation Sequencing (NGS). In the present study, we used NGS to discover novel transcripts of the KLK1 and KLK2 genes, after nested touchdown PCR. Bioinformatics analysis and PCR experiments revealed a total of eleven novel KLK transcripts (two KLK1 and nine KLK2 transcripts). In addition, the expression profiles of each novel transcript were investigated with nested PCR experiments using variant-specific primers. Since KLKs are implicated in human malignancies, qualifying as potential biomarkers, the quantification of the presented novel transcripts in human samples may have clinical applications in different types of cancer. Copyright © 2018. Published by Elsevier Inc.

  4. DNA Binding by the Ribosomal DNA Transcription Factor Rrn3 Is Essential for Ribosomal DNA Transcription

    Science.gov (United States)

    Stepanchick, Ann; Zhi, Huijun; Cavanaugh, Alice H.; Rothblum, Katrina; Schneider, David A.; Rothblum, Lawrence I.

    2013-01-01

    The human homologue of yeast Rrn3 is an RNA polymerase I-associated transcription factor that is essential for ribosomal DNA (rDNA) transcription. The generally accepted model is that Rrn3 functions as a bridge between RNA polymerase I and the transcription factors bound to the committed template. In this model Rrn3 would mediate an interaction between the mammalian Rrn3-polymerase I complex and SL1, the rDNA transcription factor that binds to the core promoter element of the rDNA. In the course of studying the role of Rrn3 in recruitment, we found that Rrn3 was in fact a DNA-binding protein. Analysis of the sequence of Rrn3 identified a domain with sequence similarity to the DNA binding domain of heat shock transcription factor 2. Randomization, or deletion, of the amino acids in this region in Rrn3, amino acids 382–400, abrogated its ability to bind DNA, indicating that this domain was an important contributor to DNA binding by Rrn3. Control experiments demonstrated that these mutant Rrn3 constructs were capable of interacting with both rpa43 and SL1, two other activities demonstrated to be essential for Rrn3 function. However, neither of these Rrn3 mutants was capable of functioning in transcription in vitro. Moreover, although wild-type human Rrn3 complemented a yeast rrn3-ts mutant, the DNA-binding site mutant did not. These results demonstrate that DNA binding by Rrn3 is essential for transcription by RNA polymerase I. PMID:23393135

  5. DNA binding by the ribosomal DNA transcription factor rrn3 is essential for ribosomal DNA transcription.

    Science.gov (United States)

    Stepanchick, Ann; Zhi, Huijun; Cavanaugh, Alice H; Rothblum, Katrina; Schneider, David A; Rothblum, Lawrence I

    2013-03-29

    The human homologue of yeast Rrn3 is an RNA polymerase I-associated transcription factor that is essential for ribosomal DNA (rDNA) transcription. The generally accepted model is that Rrn3 functions as a bridge between RNA polymerase I and the transcription factors bound to the committed template. In this model Rrn3 would mediate an interaction between the mammalian Rrn3-polymerase I complex and SL1, the rDNA transcription factor that binds to the core promoter element of the rDNA. In the course of studying the role of Rrn3 in recruitment, we found that Rrn3 was in fact a DNA-binding protein. Analysis of the sequence of Rrn3 identified a domain with sequence similarity to the DNA binding domain of heat shock transcription factor 2. Randomization, or deletion, of the amino acids in this region in Rrn3, amino acids 382-400, abrogated its ability to bind DNA, indicating that this domain was an important contributor to DNA binding by Rrn3. Control experiments demonstrated that these mutant Rrn3 constructs were capable of interacting with both rpa43 and SL1, two other activities demonstrated to be essential for Rrn3 function. However, neither of these Rrn3 mutants was capable of functioning in transcription in vitro. Moreover, although wild-type human Rrn3 complemented a yeast rrn3-ts mutant, the DNA-binding site mutant did not. These results demonstrate that DNA binding by Rrn3 is essential for transcription by RNA polymerase I.

  6. Identification and characterization of transcript polymorphisms in soybean lines varying in oil composition and content.

    Science.gov (United States)

    Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles

    2014-04-23

    Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.

  7. Widespread anti-sense transcription in apple is correlated with siRNA production and indicates a large potential for transcriptional and/or post-transcriptional control.

    Science.gov (United States)

    Celton, Jean-Marc; Gaillard, Sylvain; Bruneau, Maryline; Pelletier, Sandra; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Navarro, Lionel; Laurens, François; Renou, Jean-Pierre

    2014-07-01

    Characterizing the transcriptome of eukaryotic organisms is essential for studying gene regulation and its impact on phenotype. The realization that anti-sense (AS) and noncoding RNA transcription is pervasive in many genomes has emphasized our limited understanding of gene transcription and post-transcriptional regulation. Numerous mechanisms including convergent transcription, anti-correlated expression of sense and AS transcripts, and RNAi remain ill-defined. Here, we have combined microarray analysis and high-throughput sequencing of small RNAs (sRNAs) to unravel the complexity of transcriptional and potential post-transcriptional regulation in eight organs of apple (Malus × domestica). The percentage of AS transcript expression is higher than that identified in annual plants such as rice and Arabidopsis thaliana. Furthermore, we show that a majority of AS transcripts are transcribed beyond 3'UTR regions, and may cover a significant portion of the predicted sense transcripts. Finally we demonstrate at a genome-wide scale that anti-sense transcript expression is correlated with the presence of both short (21-23 nt) and long (> 30 nt) siRNAs, and that the sRNA coverage depth varies with the level of AS transcript expression. Our study provides a new insight on the functional role of anti-sense transcripts at the genome-wide level, and a new basis for the understanding of sRNA biogenesis in plants. © 2014 INRA. New Phytologist © 2014 New Phytologist Trust.

  8. Repetitive Elements in Mycoplasma hyopneumoniae Transcriptional Regulation.

    Directory of Open Access Journals (Sweden)

    Amanda Malvessi Cattani

    Full Text Available Transcriptional regulation, a multiple-step process, is still poorly understood in the important pig pathogen Mycoplasma hyopneumoniae. Basic motifs like promoters and terminators have already been described, but no other cis-regulatory elements have been found. DNA repeat sequences have been shown to be an interesting potential source of cis-regulatory elements. In this work, a genome-wide search for tandem and palindromic repetitive elements was performed in the intergenic regions of all coding sequences from M. hyopneumoniae strain 7448. Computational analysis demonstrated the presence of 144 tandem repeats and 1,171 palindromic elements. The DNA repeat sequences were distributed within the 5' upstream regions of 86% of transcriptional units of M. hyopneumoniae strain 7448. Comparative analysis between distinct repetitive sequences found in related mycoplasma genomes demonstrated different percentages of conservation among pathogenic and nonpathogenic strains. qPCR assays revealed differential expression among genes showing variable numbers of repetitive elements. In addition, repeats found in 206 genes already described to be differentially regulated under different culture conditions of M. hyopneumoniae strain 232 showed almost 80% conservation in relation to M. hyopneumoniae strain 7448 repeats. Altogether, these findings suggest a potential regulatory role of tandem and palindromic DNA repeats in the M. hyopneumoniae transcriptional profile.

  9. Repetitive Elements in Mycoplasma hyopneumoniae Transcriptional Regulation.

    Science.gov (United States)

    Cattani, Amanda Malvessi; Siqueira, Franciele Maboni; Guedes, Rafael Lucas Muniz; Schrank, Irene Silveira

    2016-01-01

    Transcriptional regulation, a multiple-step process, is still poorly understood in the important pig pathogen Mycoplasma hyopneumoniae. Basic motifs like promoters and terminators have already been described, but no other cis-regulatory elements have been found. DNA repeat sequences have been shown to be an interesting potential source of cis-regulatory elements. In this work, a genome-wide search for tandem and palindromic repetitive elements was performed in the intergenic regions of all coding sequences from M. hyopneumoniae strain 7448. Computational analysis demonstrated the presence of 144 tandem repeats and 1,171 palindromic elements. The DNA repeat sequences were distributed within the 5' upstream regions of 86% of transcriptional units of M. hyopneumoniae strain 7448. Comparative analysis between distinct repetitive sequences found in related mycoplasma genomes demonstrated different percentages of conservation among pathogenic and nonpathogenic strains. qPCR assays revealed differential expression among genes showing variable numbers of repetitive elements. In addition, repeats found in 206 genes already described to be differentially regulated under different culture conditions of M. hyopneumoniae strain 232 showed almost 80% conservation in relation to M. hyopneumoniae strain 7448 repeats. Altogether, these findings suggest a potential regulatory role of tandem and palindromic DNA repeats in the M. hyopneumoniae transcriptional profile.

  10. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  11. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Full Text Available Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  12. Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites

    KAUST Repository

    Wong, Ka-Chun; Peng, Chengbin; Li, Yue

    2015-01-01

    With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating its regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/~wkc/FullSignalRanker/ © 2015 IEEE.

  13. Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites

    KAUST Repository

    Wong, Ka-Chun

    2015-04-20

    With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating its regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/~wkc/FullSignalRanker/ © 2015 IEEE.

  14. Transcription-based model for the induction of chromosomal exchange events by ionising radiation

    International Nuclear Information System (INIS)

    Radford, I.A.

    2003-01-01

    The mechanistic basis for chromosomal aberration formation, following exposure of mammalian cells to ionising radiation, has long been debated. Although chromosomal aberrations are probably initiated by DNA double-strand breaks (DSB), little is understood about the mechanisms that generate and modulate DNA rearrangement. Based on results from our laboratory and data from the literature, a novel model of chromosomal aberration formation has been suggested (Radford 2002). The basic postulates of this model are that: (1) DSB, primarily those involving multiple individual damage sites (i.e. complex DSB), are the critical initiating lesion; (2) only those DSB occurring in transcription units that are associated with transcription 'factories' (complexes containing multiple transcription units) induce chromosomal exchange events; (3) such DSB are brought into contact with a DNA topoisomerase I molecule through RNA polymerase II catalysed transcription and give rise to trapped DNA-topo I cleavage complexes; and (4) trapped complexes interact with another topo I molecule on a temporarily inactive transcription unit at the same transcription factory leading to DNA cleavage and subsequent strand exchange between the cleavage complexes. We have developed a method using inverse PCR that allows the detection and sequencing of putative ionising radiation-induced DNA rearrangements involving different regions of the human genome (Forrester and Radford 1998). The sequences detected by inverse PCR can provide a test of the prediction of the transcription-based model that ionising radiation-induced DNA rearrangements occur between sequences in active transcription units. Accordingly, reverse transcriptase PCR was used to determine if sequences involved in rearrangements were transcribed in the test cells. Consistent with the transcription-based model, nearly all of the sequences examined gave a positive result to reverse transcriptase PCR (Forrester and Radford unpublished)

  15. A directed approach for the identification of transcripts harbouring the spliced leader sequence and the effect of trans-splicing knockdown in Schistosoma mansoni

    Directory of Open Access Journals (Sweden)

    Marina de Moraes Mourao

    2013-09-01

    Full Text Available Schistosomiasis is a major neglected tropical disease caused by trematodes from the genus Schistosoma. Because schistosomes exhibit a complex life cycle and numerous mechanisms for regulating gene expression, it is believed that spliced leader (SL) trans-splicing could play an important role in the biology of these parasites. The purpose of this study was to investigate the function of trans-splicing in Schistosoma mansoni through analysis of genes that may be regulated by this mechanism and via silencing SL-containing transcripts through RNA interference. Here, we report our analysis of SL transcript-enriched cDNA libraries from different S. mansoni life stages. Our results show that the trans-splicing mechanism is apparently not associated with specific genes, subcellular localisations or life stages. In cross-species comparisons, even though the sets of genes that are subject to SL trans-splicing regulation appear to differ between organisms, several commonly shared orthologues were observed. Knockdown of trans-spliced transcripts in sporocysts resulted in a systemic reduction of the expression levels of all tested trans-spliced transcripts; however, the only phenotypic effect observed was diminished larval size. Further studies involving the findings from this work will provide new insights into the role of trans-splicing in the biology of S. mansoni and other organisms. All Expressed Sequence Tags generated in this study were submitted to dbEST as five different libraries. The accessions for each library and for the individual sequences are as follows: (i) adult worms of mixed sexes (LIBEST_027999: JZ139310 - JZ139779), (ii) female adult worms (LIBEST_028000: JZ139780 - JZ140379), (iii) male adult worms (LIBEST_028001: JZ140380 - JZ141002), (iv) eggs (LIBEST_028002: JZ141003 - JZ141497) and (v) schistosomula (LIBEST_028003: JZ141498 - JZ141974).

  16. Dual Regulation of Bacillus subtilis kinB Gene Encoding a Sporulation Trigger by SinR through Transcription Repression and Positive Stringent Transcription Control.

    Science.gov (United States)

    Fujita, Yasutaro; Ogura, Mitsuo; Nii, Satomi; Hirooka, Kazutake

    2017-01-01

    It is known that transcription of kinB encoding a trigger for Bacillus subtilis sporulation is under repression by SinR, a master repressor of biofilm formation, and under positive stringent transcription control depending on the adenine species at the transcription initiation nucleotide (nt). Deletion and base substitution analyses of the kinB promoter (PkinB) region using lacZ fusions indicated that either a 5-nt deletion (Δ5, nt -61/-57, +1 is the transcription initiation nt) or the substitution of G at nt -45 with A (G-45A) relieved kinB repression. Thus, we found a pair of SinR-binding consensus sequences (GTTCTYT; Y is T or C) in an inverted orientation (SinR-1) between nt -57/-42, which is most likely a SinR-binding site for kinB repression. This relief from SinR repression likely requires SinI, an antagonist of SinR. Surprisingly, we found that SinR is essential for positive stringent transcription control of PkinB. Electrophoretic mobility shift assay (EMSA) analysis indicated that SinR bound not only to SinR-1 but also to SinR-2 (nt -29/-8) consisting of another pair of SinR consensus sequences in a tandem repeat arrangement; the two sequences partially overlap the '-35' and '-10' regions of PkinB. Introduction of base substitutions (T-27C C-26T) in the upstream consensus sequence of SinR-2 affected positive stringent transcription control of PkinB, suggesting that SinR binding to SinR-2 likely causes this positive control. EMSA also implied that RNA polymerase and SinR are possibly bound together to SinR-2 to form a transcription initiation complex for kinB transcription. Thus, it was suggested in this work that derepression of kinB from SinR repression by SinI induced by Spo0A∼P and occurrence of SinR-dependent positive stringent transcription control of kinB might induce effective sporulation cooperatively, implying an intimate interplay by stringent response, sporulation, and biofilm formation.
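
The record above quotes a degenerate consensus, GTTCTYT with Y standing for T or C, found as pairs in inverted or tandem orientation. The sketch below illustrates how such an IUPAC consensus could be scanned for on both strands of a sequence. The function names and the test sequence are made up for illustration; the sequence is not the kinB promoter, and this is not the authors' method.

```python
import re

CONSENSUS = "GTTCTYT"  # as quoted in the abstract; Y = T or C
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus(seq):
    """Return sorted (start, strand, match) hits, with start 0-based on the forward strand."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in CONSENSUS) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + len(CONSENSUS))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical sequence with one forward-strand and one reverse-strand site.
example = "AAGTTCTCTCCAAAGAACGG"
for hit in find_consensus(example):
    print(hit)
```

A hit on "+" followed by a hit on "-" corresponds to an inverted pair, while two hits on the same strand correspond to a tandem arrangement.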

  17. Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein

    KAUST Repository

    Mahfouz, Magdy M.

    2011-12-14

    Transcriptional activator-like effectors (TALEs) are proteins secreted by Xanthomonas bacteria when they infect plants. TALEs contain a modular DNA binding domain that can be easily engineered to bind any sequence of interest, and have been used to provide user-selected DNA-binding modules to generate chimeric nucleases and transcriptional activators in mammalian cells and plants. Here we report the use of TALEs to generate chimeric sequence-specific transcriptional repressors. The dHax3 TALE was used as a scaffold to provide a DNA-binding module fused to the EAR-repression domain (SRDX) to generate a chimeric repressor that targets the RD29A promoter. The dHax3.SRDX protein efficiently repressed the transcription of the RD29A

  18. Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein

    KAUST Repository

    Mahfouz, Magdy M.; Li, Lixin; Piatek, Marek J.; Fang, Xiaoyun; Mansour, Hicham; Bangarusamy, Dhinoth K.; Zhu, Jian-Kang

    2011-01-01

    Transcriptional activator-like effectors (TALEs) are proteins secreted by Xanthomonas bacteria when they infect plants. TALEs contain a modular DNA binding domain that can be easily engineered to bind any sequence of interest, and have been used to provide user-selected DNA-binding modules to generate chimeric nucleases and transcriptional activators in mammalian cells and plants. Here we report the use of TALEs to generate chimeric sequence-specific transcriptional repressors. The dHax3 TALE was used as a scaffold to provide a DNA-binding module fused to the EAR-repression domain (SRDX) to generate a chimeric repressor that targets the RD29A promoter. The dHax3.SRDX protein efficiently repressed the transcription of the RD29A

  19. Nuclear RNA sequencing of the mouse erythroid cell transcriptome.

    Directory of Open Access Journals (Sweden)

    Jennifer A Mitchell

    Full Text Available In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

  20. Functional analysis of limb transcriptional enhancers in the mouse.

    Science.gov (United States)

    Nolte, Mark J; Wang, Ying; Deng, Jian Min; Swinton, Paul G; Wei, Caimiao; Guindani, Michele; Schwartz, Robert J; Behringer, Richard R

    2014-01-01

    Transcriptional enhancers are genomic sequences bound by transcription factors that act together with basal transcriptional machinery to regulate gene transcription. Several high-throughput methods have generated large datasets of tissue-specific enhancer sequences with putative roles in developmental processes. However, few enhancers have been deleted from the genome to determine their roles in development. To understand the roles of two enhancers active in the mouse embryonic limb bud we deleted them from the genome. Although the genes regulated by these enhancers are unknown, they were selected because they were identified in a screen for putative limb bud-specific enhancers associated with p300, an acetyltransferase that participates in protein complexes that promote active transcription, and because the orthologous human enhancers (H1442 and H280) drive distinct lacZ expression patterns in limb buds of embryonic day (E) 11.5 transgenic mice. We show that the orthologous mouse sequences, M1442 and M280, regulate dynamic expression in the developing limb. Although significant transcriptional differences in enhancer-proximal genes in embryonic limb buds accompany the deletion of M1442 and M280 no gross limb malformations during embryonic development were observed, demonstrating that M1442 and M280 are not required for mouse limb development. However, M280 is required for the development and/or maintenance of body size; M280 mice are significantly smaller than controls. M280 also harbors an "ultraconserved" sequence that is identical between human, rat, and mouse. This is the first report of a phenotype resulting from the deletion of an ultraconserved element. These studies highlight the importance of determining enhancer regulatory function by experiments that manipulate them in situ and suggest that some of an enhancer's regulatory capacities may be developmentally tolerated rather than developmentally required. © 2014 Wiley Periodicals, Inc.

  1. Transcriptional and phylogenetic analysis of five complete ambystomatid salamander mitochondrial genomes.

    Science.gov (United States)

    Samuels, Amy K; Weisrock, David W; Smith, Jeramiah J; France, Katherine J; Walker, John A; Putta, Srikrishna; Voss, S Randal

    2005-04-11

    We report on a study that extended mitochondrial transcript information from a recent EST project to obtain complete mitochondrial genome sequence for 5 tiger salamander complex species (Ambystoma mexicanum, A. t. tigrinum, A. andersoni, A. californiense, and A. dumerilii). We describe, for the first time, aspects of mitochondrial transcription in a representative amphibian, and then use complete mitochondrial sequence data to examine salamander phylogeny at both deep and shallow levels of evolutionary divergence. The available mitochondrial ESTs for A. mexicanum (N=2481) and A. t. tigrinum (N=1205) provided 92% and 87% coverage of the mitochondrial genome, respectively. Complete mitochondrial sequences for all species were rapidly obtained by using long distance PCR and DNA sequencing. A number of genome structural characteristics (base pair length, base composition, gene number, gene boundaries, codon usage) were highly similar among all species and to other distantly related salamanders. Overall, mitochondrial transcription in Ambystoma approximated the pattern observed in other vertebrates. We inferred from the mapping of ESTs onto mtDNA that transcription occurs from both heavy and light strand promoters and continues around the entire length of the mtDNA, followed by post-transcriptional processing. However, the observation of many short transcripts corresponding to rRNA genes indicates that transcription may often terminate prematurely to bias transcription of rRNA genes; indeed an rRNA transcription termination signal sequence was observed immediately following the 16S rRNA gene. Phylogenetic analyses of salamander family relationships consistently grouped Ambystomatidae in a clade containing Cryptobranchidae and Hynobiidae, to the exclusion of Salamandridae. This robust result suggests a novel alternative hypothesis because previous studies have consistently identified Ambystomatidae and Salamandridae as closely related taxa. 
Phylogenetic analyses of tiger

  2. Transcription blockage by stable H-DNA analogs in vitro.

    Science.gov (United States)

    Pandey, Shristi; Ogloblina, Anna M; Belotserkovskii, Boris P; Dolinnaya, Nina G; Yakubovskaya, Marianna G; Mirkin, Sergei M; Hanawalt, Philip C

    2015-08-18

    DNA sequences that can form unusual secondary structures are implicated in regulating gene expression and causing genomic instability. H-palindromes are an important class of such DNA sequences that can form an intramolecular triplex structure, H-DNA. Within an H-palindrome, the H-DNA and canonical B-DNA are in a dynamic equilibrium that shifts toward H-DNA with increased negative supercoiling. The interplay between H- and B-DNA and the fact that the process of transcription affects supercoiling makes it difficult to elucidate the effects of H-DNA upon transcription. We constructed a stable structural analog of H-DNA that cannot flip into B-DNA, and studied the effects of this structure on transcription by T7 RNA polymerase in vitro. We found multiple transcription blockage sites adjacent to and within sequences engaged in this triplex structure. Triplex-mediated transcription blockage varied significantly with changes in ambient conditions: it was exacerbated in the presence of Mn(2+) or by increased concentrations of K(+) and Li(+). Analysis of the detailed pattern of the blockage suggests that RNA polymerase is sterically hindered by H-DNA and has difficulties in unwinding triplex DNA. The implications of these findings for the biological roles of triple-stranded DNA structures are discussed. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. RNA Pol II promotes transcription of centromeric satellite DNA in beetles.

    Directory of Open Access Journals (Sweden)

    Zeljka Pezer

    Full Text Available Transcripts of centromeric satellite DNAs are known to play a role in heterochromatin formation as well as in establishment of the kinetochore. However, little is known about basic mechanisms of satellite DNA expression within constitutive heterochromatin and its regulation. Here we present a comprehensive analysis of transcription of the abundant centromeric satellite DNA PRAT from the beetle Palorus ratzeburgii (Coleoptera). This satellite is characterized by preservation and extreme sequence conservation among evolutionarily distant insect species. PRAT is expressed at similar levels in all three developmental stages: larvae, pupae and adults. Transcripts are abundant, comprising 0.033% of total RNA, and are heterogeneous in size, ranging from 0.5 kb up to more than 5 kb. Transcription proceeds from both strands but with 10-fold different expression intensity, and transcripts are not processed into siRNAs. Most of the transcripts (80%) are not polyadenylated and remain in the nucleus, while a small portion is exported to the cytoplasm. Multiple, irregularly distributed transcription initiation sites as well as termination sites have been mapped within the PRAT sequence using primer extension and RLM-RACE. The presence of a cap structure as well as poly(A) tails in a portion of the transcripts indicates RNA polymerase II-dependent transcription, and a putative polymerase II promoter site overlaps the most conserved part of the PRAT sequence. The treatment of larvae with alpha-amanitin decreases the level of PRAT transcripts at concentrations that selectively inhibit pol II activity. In conclusion, stable, RNA polymerase II-dependent transcripts of abundant centromeric satellite DNA, not regulated by RNAi, have been identified and characterized. This study offers a basic understanding of expression of highly abundant heterochromatic DNA which in beetle species constitutes up to 50% of the genome.

  4. Characterization of a novel radiation-inducible transcript, uscA, and analysis of its transcriptional regulation

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Sang Yong; Kim, Dong Ho; Joe, Min Ho

    2010-03-15

    Transcription from the uscA promoter (P uscA) occurred only under aerobic conditions, and a dose of 2 Gy maximally activated transcription from P uscA. However, various environmental stresses, including physical shocks (pH, temperature, osmotic shock), DNA-damaging agents (UV and MMC) and oxidative stress agents (paraquat, menadione, and H2O2), did not activate transcription from P uscA. Transcription of uscA was initiated 170 bp upstream of the cyoA start codon and ended around the ampG stop codon. The size of uscA, determined by reverse transcription assay, was approximately 250 bp. Deletion analysis of the uscA promoter demonstrates that the radiation inducibility of P uscA is mediated by sequences between -20 and +111 relative to +1 of P uscA, and that radiation activates P uscA by permitting expression that is repressed under non-irradiated conditions.

  5. Characterization of a novel radiation-inducible transcript, uscA, and analysis of its transcriptional regulation

    International Nuclear Information System (INIS)

    Lim, Sang Yong; Kim, Dong Ho; Joe, Min Ho

    2010-03-01

    Transcription from the uscA promoter (P uscA) occurred only under aerobic conditions, and a dose of 2 Gy maximally activated transcription from P uscA. However, various environmental stresses, including physical shocks (pH, temperature, osmotic shock), DNA-damaging agents (UV and MMC) and oxidative stress agents (paraquat, menadione, and H2O2), did not activate transcription from P uscA. Transcription of uscA was initiated 170 bp upstream of the cyoA start codon and ended around the ampG stop codon. The size of uscA, determined by reverse transcription assay, was approximately 250 bp. Deletion analysis of the uscA promoter demonstrates that the radiation inducibility of P uscA is mediated by sequences between -20 and +111 relative to +1 of P uscA, and that radiation activates P uscA by permitting expression that is repressed under non-irradiated conditions.

  6. Molecular population dynamics of DNA structures in a bcl-2 promoter sequence is regulated by small molecules and the transcription factor hnRNP LL.

    Science.gov (United States)

    Cui, Yunxi; Koirala, Deepak; Kang, HyunJin; Dhakal, Soma; Yangyuoru, Philip; Hurley, Laurence H; Mao, Hanbin

    2014-05-01

    Minute differences in the free energy change of unfolding among structures in an oligonucleotide sequence can lead to a complex population equilibrium, which is rather challenging for ensemble techniques to decipher. Herein, we introduce a new method, molecular population dynamics (MPD), to describe the intricate equilibrium among non-B deoxyribonucleic acid (DNA) structures. Using mechanical unfolding in laser tweezers, we identified six DNA species in a cytosine (C)-rich bcl-2 promoter sequence. Population patterns of these species with and without a small molecule (IMC-76 or IMC-48) or the transcription factor hnRNP LL are compared to reveal the MPD of different species. With a pattern recognition algorithm, we found that IMC-48 and hnRNP LL share 80% similarity in stabilizing i-motifs with 60 s incubation. In contrast, IMC-76 demonstrates an opposite behavior, preferring flexible DNA hairpins. With 120-180 s incubation, IMC-48 and hnRNP LL destabilize i-motifs, which has been previously proposed to activate bcl-2 transcription. These results provide strong support, from the population equilibrium perspective, that small molecules and hnRNP LL can modulate bcl-2 transcription through interaction with i-motifs. The excellent agreement with biochemical results firmly validates the MPD analyses, which, we expect, can be widely applicable to investigate complex equilibrium of biomacromolecules. © 2014 The Author(s). Published by Oxford University Press [on behalf of Nucleic Acids Research].

  7. Possible interaction between B1 retrotransposon-containing sequences and β(major) globin gene transcriptional activation during MEL cell erythroid differentiation.

    Science.gov (United States)

    Vizirianakis, Ioannis S; Tezias, Sotirios S; Amanatiadou, Elsa P; Tsiftsoglou, Asterios S

    2012-01-01

    Repetitive sequences make up >50% of mammalian genomic DNA and, among these, SINEs (short interspersed nuclear elements), e.g. B1 elements, account for 8% of the mouse genome. In an effort to delineate the molecular mechanism(s) involved in the blockade of the in vitro differentiation program of MEL (murine erythroleukaemia) cells by treatment with methylation inhibitors, we detected a DNA region of 559 bp in chromosome 7 located downstream of the 3'-end of the β(major) globin gene (designated B1-559) with unique characteristics. We have fully characterized this B1-559 region that includes a B1 element, several repeats of ATG initiation codons and consensus DNA-binding sites for erythroid-specific transcription factors NF-E2 (nuclear factor-erythroid-derived 2), GATA-1 and EKLF (erythroid Krüppel-like factor). Fragments derived from B1-559 incubated with nuclear extracts form protein complexes in both undifferentiated and differentiated MEL cells. Transient reporter-gene experiments in MEL and human erythroleukaemia K-562 cells with recombinant constructs containing B1-559 fragments linked to HS-2 (hypersensitive site-2) sequences of human β-globin gene LCR (locus control region) indicated potential cooperation upon erythropoiesis and globin gene expression. The possible interaction between the B1-559 region and β(major) globin gene transcriptional activation upon execution of erythroid MEL cell differentiation programme is discussed. © The Author(s) Journal compilation © 2012 Portland Press Limited

  8. DNA template dependent accuracy variation of nucleotide selection in transcription.

    Directory of Open Access Journals (Sweden)

    Harriet Mellenius

    Full Text Available It has been commonly assumed that the effects of erroneous transcription of DNA genes into messenger RNAs on peptide sequence errors are masked by much more frequent errors of mRNA translation to protein. We present a theoretical model of transcriptional accuracy. It uses experimentally estimated standard free energies of double-stranded DNA and RNA/DNA hybrids and predicts a DNA template dependent transcriptional accuracy variation spanning several orders of magnitude. The model also identifies high-error as well as high-accuracy transcription motifs. The source of the large accuracy span is the context dependent variation of the stacking free energy of pairs of correct and incorrect base pairs in the ever moving transcription bubble. Our model predictions have direct experimental support from recent single molecule based identifications of transcriptional errors in the C. elegans transcriptome. Our conclusions challenge the general view that amino acid substitution errors in proteins are mainly caused by translational errors. They suggest instead that transcriptional error hotspots are the dominating source of peptide sequence errors in some DNA template contexts, while mRNA translation is the major cause of protein errors in other contexts.
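    To see why modest, context-dependent differences in free energy can translate into error rates spanning orders of magnitude, it helps to look at the Boltzmann factor. The sketch below is not the model from this record; it assumes a simple equilibrium discrimination in which the frequency of an incorrect nucleotide relative to the correct one is exp(-ΔΔG/RT). The function name, the temperature and the example ΔΔG values are illustrative assumptions.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987e-3


def error_frequency(ddg_kcal_per_mol, temp_k=310.15):
    """Relative frequency of an incorrect vs. correct nucleotide.

    Assumes pure equilibrium discrimination: the ratio is the Boltzmann
    factor exp(-ddG / (R*T)), where ddG is the free-energy penalty of the
    incorrect base pair in its sequence context (an assumed input here).
    """
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temp_k))


# Illustrative penalties only; these are not values from the study.
for ddg in (2.0, 4.0, 6.0):
    print(f"ddG = {ddg:.1f} kcal/mol -> relative error {error_frequency(ddg):.2e}")
```

    Because the dependence is exponential, doubling the free-energy penalty squares the relative error frequency, which is how small context effects can produce a wide accuracy range under this assumption.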

  9. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  10. E-cadherin is transcriptionally activated via suppression of ZEB1 transcriptional repressor by small RNA-mediated gene silencing.

    Directory of Open Access Journals (Sweden)

    Minami Mazda

    Full Text Available RNA activation has been reported to be induced by small interfering RNAs (siRNAs) that act on the promoters of several genes, including E-cadherin. In this study, we present an alternative mechanism of E-cadherin activation in human PC-3 cells by siRNAs previously reported to possess perfectly complementary sequences to the E-cadherin promoter. We found that activation of E-cadherin can also be induced via suppression of ZEB1, which is a transcriptional repressor of E-cadherin, by a seed-dependent silencing mechanism of these siRNAs. The functional seed-complementary sites of the siRNAs were found in the coding region in addition to the 3' untranslated region of ZEB1 mRNA. Promoter analyses indicated that E-boxes, which are ZEB1-binding sites, in the upstream promoter region are indispensable for E-cadherin transcription by the siRNAs. Thus, the results caution against ignoring siRNA seed-dependent silencing effects in genome-wide transcriptional regulation. In addition, members of the miR-302/372/373/520 family, which have the same seed sequences as one of the siRNAs with perfect complementarity to the E-cadherin promoter, are also found to activate E-cadherin transcription. Thus, E-cadherin could be upregulated by the suppression of the ZEB1 transcriptional repressor by miRNAs in vivo.

  11. Large-scale transcriptome data reveals transcriptional activity of fission yeast LTR retrotransposons

    DEFF Research Database (Denmark)

    Mourier, Tobias; Willerslev, Eske

    2010-01-01

    of transcriptional activity are observed from both strands of solitary LTR sequences. Transcriptome data collected during meiosis suggests that transcription of solitary LTRs is correlated with the transcription of nearby protein-coding genes. CONCLUSIONS: Presumably, the host organism negatively regulates...

  12. Understanding gene sequence variation in the context of transcription regulation in yeast.

    Directory of Open Access Journals (Sweden)

    Irit Gat-Viks

    2010-01-01

    Full Text Available DNA sequence polymorphism in a regulatory protein can have a widespread transcriptional effect. Here we present a computational approach for analyzing modules of genes with a common regulation that are affected by specific DNA polymorphisms. We identify such regulatory-linkage modules by integrating genotypic and expression data for individuals in a segregating population with complementary expression data of strains mutated in a variety of regulatory proteins. Our procedure searches simultaneously for groups of co-expressed genes, for their common underlying linkage interval, and for their shared regulatory proteins. We applied the method to a cross between laboratory and wild strains of S. cerevisiae, demonstrating its ability to correctly suggest modules and to outperform extant approaches. Our results suggest that middle sporulation genes are under the control of polymorphism in the sporulation-specific tertiary complex Sum1p/Rfm1p/Hst1p. In another example, our analysis reveals novel inter-relations between Swi3 and two mitochondrial inner membrane proteins underlying variation in a module of aerobic cellular respiration genes. Overall, our findings demonstrate that this approach provides a useful framework for the systematic mapping of quantitative trait loci and their role in gene expression variation.

  13. Genome-wide transcription analyses in rice using tiling microarrays

    DEFF Research Database (Denmark)

    Li, Lei; Wang, Xiangfeng; Stolc, Viktor

    2006-01-01

    Sequencing and computational annotation revealed several features, including high gene numbers, unusual composition of the predicted genes and a large number of genes lacking homology to known genes, that distinguish the rice (Oryza sativa) genome from that of other fully sequenced model species... We report here a full-genome transcription analysis of the indica rice subspecies using high-density oligonucleotide tiling microarrays. Our results provided expression data support for the existence of 35,970 (81.9%) annotated gene models and identified 5,464 unique transcribed intergenic regions... that share similar compositional properties with the annotated exons and have significant homology to other plant proteins. Elucidating and mapping of all transcribed regions revealed an association between global transcription and cytological chromosome features, and an overall similarity of transcriptional...

  14. Human α2-HS-glycoprotein: the A and B chains with a connecting sequence are encoded by a single mRNA transcript

    International Nuclear Information System (INIS)

    Lee, C.C.; Bowman, B.H.; Yang, F.

    1987-01-01

    The α2-HS-glycoprotein (AHSG) is a plasma protein reported to play roles in bone mineralization and in the immune response. It is composed of two subunits, the A and B chains. Recombinant plasmids containing human cDNA AHSG have been isolated by screening an adult human liver library with a mixed oligonucleotide probe. The cDNA clones containing AHSG inserts span approximately 1.5 kilobase pairs and include the entire AHSG coding sequence, demonstrating that the A and B chains are encoded by a single mRNA transcript. The cDNA sequence predicts an 18-amino-acid signal peptide, followed by the A-chain sequence of AHSG. A heretofore unseen connecting sequence of 40 amino acids was deduced between the A- and B-chain sequences. The connecting sequence demonstrates the unique amino acid doublets and collagen triplets found in the A and B chains; it is not homologous with other reported amino acid sequences. The connecting sequence may be cleaved in a posttranslational step by limited proteolysis before mature AHSG is released into the circulation or may vary in its presence because of alternative processing. The AHSG cDNA was utilized for mapping the AHSG gene to the 3q21→qter region of human chromosome 3. The availability of the AHSG cDNA clone will facilitate the analysis of its genetic control and gene expression during development and bone formation.

  15. Nsite, NsiteH and NsiteM Computer Tools for Studying Transcription Regulatory Elements

    KAUST Repository

    Shahmuradov, Ilham

    2015-07-02

    Summary: Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively.

  16. Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively.

    Science.gov (United States)

    Clifford, Jacob; Adami, Christoph

    2015-09-02

    Transcription factor binding to the surface of DNA regulatory regions is one of the primary causes of regulating gene expression levels. A probabilistic approach to model protein-DNA interactions at the sequence level is through position weight matrices (PWMs) that estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
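    The positional-independence assumption behind a position weight matrix can be made concrete in a few lines. The sketch below builds a plain (unconditional) PWM from aligned sites, scores a candidate sequence as a sum of per-position log-odds, and computes the information content in bits. It is a generic illustration, not the conditional-PWM method of this record; the pseudocount, the uniform background and the toy sites are assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise base probabilities from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def log_odds(pwm, seq, background=0.25):
    """Log2 odds of seq under the PWM vs. a uniform background.

    Positional independence means the joint probability is a product of
    column probabilities, so the log-odds score is a simple sum.
    """
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))


def information_content(pwm):
    """Total information in bits: sum over columns of 2 - entropy."""
    return sum(2.0 + sum(p * math.log2(p) for p in col.values()) for col in pwm)


# Hypothetical aligned binding sites, for illustration only.
sites = ["ACGT", "ACGT", "ACGA", "ACGT"]
pwm = build_pwm(sites)
print(log_odds(pwm, "ACGT"), log_odds(pwm, "TTTT"))
print(information_content(pwm))
```

    A conditional variant in the spirit of the abstract would build one such matrix per flanking-sequence class (for example, sites with and without a nearby partner site) and compare the resulting distributions.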

  17. Determination of specificity influencing residues for key transcription factor families

    DEFF Research Database (Denmark)

    Patel, Ronak Y.; Garde, Christian; Stormo, Gary D.

    2015-01-01

    Transcription factors (TFs) are major modulators of transcription and subsequent cellular processes. The binding of TFs to specific regulatory elements is governed by their specificity. Considering the gap between known TFs sequence and specificity, specificity prediction frameworks are highly de...

  18. TFIIS-Dependent Non-coding Transcription Regulates Developmental Genome Rearrangements.

    Directory of Open Access Journals (Sweden)

    Kamila Maliszewska-Olejniczak

    2015-07-01

    Full Text Available Because of their nuclear dimorphism, ciliates provide a unique opportunity to study the role of non-coding RNAs (ncRNAs) in the communication between germline and somatic lineages. In these unicellular eukaryotes, a new somatic nucleus develops at each sexual cycle from a copy of the zygotic (germline) nucleus, while the old somatic nucleus degenerates. In the ciliate Paramecium tetraurelia, the genome is massively rearranged during this process through the reproducible elimination of repeated sequences and the precise excision of over 45,000 short, single-copy Internal Eliminated Sequences (IESs). Different types of ncRNAs resulting from genome-wide transcription were shown to be involved in the epigenetic regulation of genome rearrangements. To understand how ncRNAs are produced from the entire genome, we have focused on a homolog of the TFIIS elongation factor, which regulates RNA polymerase II transcriptional pausing. Six TFIIS-paralogs, representing four distinct families, can be found in the P. tetraurelia genome. Using RNA interference, we showed that TFIIS4, which encodes a development-specific TFIIS protein, is essential for the formation of a functional somatic genome. Molecular analyses and high-throughput DNA sequencing upon TFIIS4 RNAi demonstrated that TFIIS4 is involved in all kinds of genome rearrangements, including excision of ~48% of IESs. Localization of a GFP-TFIIS4 fusion revealed that TFIIS4 appears specifically in the new somatic nucleus at an early developmental stage, before IES excision. RT-PCR experiments showed that TFIIS4 is necessary for the synthesis of IES-containing non-coding transcripts. We propose that these IES+ transcripts originate from the developing somatic nucleus and serve as pairing substrates for germline-specific short RNAs that target elimination of their homologous sequences. Our study, therefore, connects the onset of zygotic non-coding transcription to the control of genome plasticity in Paramecium.

  19. Directional gene expression and antisense transcripts in sexual and asexual stages of Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    López-Barragán María J

    2011-11-01

    Full Text Available Abstract Background It has been shown that nearly a quarter of the initial predicted gene models in the Plasmodium falciparum genome contain errors. Although there have been efforts to obtain complete cDNA sequences to correct the errors, the coverage of cDNA sequences on the predicted genes is still incomplete, and many gene models for those expressed in sexual or mosquito stages have not been validated. Antisense transcripts have widely been reported in P. falciparum; however, the extent and pattern of antisense transcripts in different developmental stages remain largely unknown. Results We have sequenced seven bidirectional libraries from ring, early and late trophozoite, schizont, gametocyte II, gametocyte V, and ookinete, and four strand-specific libraries from late trophozoite, schizont, gametocyte II, and gametocyte V of the 3D7 parasites. Alignment of the cDNA sequences to the 3D7 reference genome revealed stage-specific antisense transcripts and novel intron-exon splicing junctions. Sequencing of strand-specific cDNA libraries suggested that more genes are expressed in one direction in gametocyte than in schizont. Alternatively spliced genes, antisense transcripts, and stage-specific expressed genes were also characterized. Conclusions It is necessary to continue to sequence cDNA from different developmental stages, particularly those of non-erythrocytic stages. The presence of antisense transcripts in some gametocyte and ookinete genes suggests that these antisense RNAs may play an important role in gene expression regulation and parasite development. Future gene expression studies should make use of directional cDNA libraries. Antisense transcripts may partly explain the observed discrepancy between levels of mRNA and protein expression.

  20. Targeted genome regulation via synthetic programmable transcriptional regulators

    KAUST Repository

    Piatek, Agnieszka Anna

    2016-04-19

    Regulation of gene transcription controls cellular functions and coordinates responses to developmental, physiological and environmental cues. Precise and efficient molecular tools are needed to characterize the functions of single and multiple genes in linear and interacting pathways in a native context. Modular DNA-binding domains from zinc fingers (ZFs) and transcriptional activator-like proteins (TALE) are amenable to bioengineering to bind DNA target sequences of interest. As a result, ZF and TALE proteins were used to develop synthetic programmable transcription factors. However, these systems are limited by the requirement to re-engineer proteins for each new target sequence. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated 9 (Cas9) genome editing tool was recently repurposed for targeted transcriptional regulation by inactivation of the nuclease activity of Cas9. Due to the facile engineering, simplicity, precision and amenability to library construction, the CRISPR/Cas9 system is poised to revolutionize the functional genomics field across diverse eukaryotic species. In this review, we discuss the development of synthetic customizable transcriptional regulators and provide insights into their current and potential applications, with special emphasis on plant systems, in characterization of gene functions, elucidation of molecular mechanisms and their biotechnological applications. © 2016 Informa UK Limited, trading as Taylor & Francis Group

  1. Transcription analysis of the Streptomyces coelicolor A3(2) rrnA operon

    DEFF Research Database (Denmark)

    van Wezel, G P; Krab, I M; Douthwaite, S

    1994-01-01

    Transcription start sites and processing sites of the Streptomyces coelicolor A3(2) rrnA operon have been investigated by a combination of in vivo and in vitro transcription analyses. The data from these approaches are consistent with the existence of four in vivo transcription sites, corresponding...... to the promoters P1-P4. The transcription start sites are located at -597, -416, -334 and -254 relative to the start of the 16S rRNA gene. Two putative processing sites were identified, one of which is similar to a sequence reported earlier in S. coelicolor and other eubacteria. The P1 promoter is likely...... common to P2, P3 and P4 is not similar to any other known consensus promoter sequence. In fast-growing mycelium, P2 appears to be the most frequently used promoter. Transcription from all of the rrnA promoters decreased during the transition from exponential to stationary phase, although transcription...

  2. Intergenic and repeat transcription in human, chimpanzee and macaque brains measured by RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Augix Guohua Xu

    Full Text Available Transcription is the first step connecting genetic information with an organism's phenotype. While expression of annotated genes in the human brain has been characterized extensively, our knowledge about the scope and the conservation of transcripts located outside of the known genes' boundaries is limited. Here, we use high-throughput transcriptome sequencing (RNA-Seq) to characterize the total non-ribosomal transcriptome of human, chimpanzee, and rhesus macaque brain. In all species, only 20-28% of non-ribosomal transcripts correspond to annotated exons and 20-23% to introns. By contrast, transcripts originating within intronic and intergenic repetitive sequences constitute 40-48% of the total brain transcriptome. Notably, some repeat families show elevated transcription. In non-repetitive intergenic regions, we identify and characterize 1,093 distinct regions highly expressed in the human brain. These regions are conserved at the RNA expression level across primates studied and at the DNA sequence level across mammals. A large proportion of these transcripts (20%) represents 3'UTR extensions of known genes and may play roles in alternative microRNA-directed regulation. Finally, we show that while transcriptome divergence between species increases with evolutionary time, intergenic transcripts show more expression differences among species and exons show less. Our results show that many yet uncharacterized evolutionary conserved transcripts exist in the human brain. Some of these transcripts may play roles in transcriptional regulation and contribute to evolution of human-specific phenotypic traits.

  3. In vitro fluorescence studies of transcription factor IIB-DNA interaction.

    Science.gov (United States)

    Górecki, Andrzej; Figiel, Małgorzata; Dziedzicka-Wasylewska, Marta

    2015-01-01

    General transcription factor TFIIB is one of the basal constituents of the preinitiation complex of eukaryotic RNA polymerase II, acting as a bridge between the preinitiation complex and the polymerase, and binding promoter DNA in an asymmetric manner, thereby defining the direction of the transcription. Methods of fluorescence spectroscopy together with circular dichroism spectroscopy were used to observe conformational changes in the structure of recombinant human TFIIB after binding to a specific DNA sequence. To facilitate the exploration of the structural changes, several site-directed mutations have been introduced altering the fluorescence properties of the protein. Our observations showed that binding of specific DNA sequences changed the protein structure and dynamics, and TFIIB may exist in two conformational states, which can be described by a different microenvironment of W52. Fluorescence studies using both intrinsic and exogenous fluorophores showed that these changes significantly depended on the recognition sequence and concerned various regions of the protein, including those interacting with other transcription factors and RNA polymerase II. DNA binding can cause rearrangements in regions of proteins interacting with the polymerase in a manner dependent on the recognized sequences, and therefore, influence the gene expression.

  4. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis.

    Directory of Open Access Journals (Sweden)

    Linchuan Fang

    Full Text Available Rhododendron spp. is an important ornamental species that is widely cultivated for landscape worldwide. Heat stress is a major obstacle for its cultivation in south China. Previous studies on rhododendron principally focused on its physiological and biochemical processes, which are involved in a series of stress tolerance. However, molecular or genetic properties of rhododendron's response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and a cultivar with highest heat tolerance, "Yanzhimi" (R. obtusum) was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the proteins or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known cluster of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding.

  5. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis

    Science.gov (United States)

    Tong, Jun; Dong, Yanfang; Xu, Dongyun; Mao, Jing; Zhou, Yuan

    2017-01-01

    Rhododendron spp. is an important ornamental species that is widely cultivated for landscape worldwide. Heat stress is a major obstacle for its cultivation in south China. Previous studies on rhododendron principally focused on its physiological and biochemical processes, which are involved in a series of stress tolerance. However, molecular or genetic properties of rhododendron’s response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and a cultivar with highest heat tolerance, “Yanzhimi” (R. obtusum) was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the proteins or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known cluster of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding. PMID:29059200

  6. Comprehensive Interrogation of Natural TALE DNA Binding Modules and Transcriptional Repressor Domains

    Science.gov (United States)

    Cong, Le; Zhou, Ruhong; Kuo, Yu-chi; Cunniff, Margaret; Zhang, Feng

    2012-01-01

    Transcription activator-like effectors (TALE) are sequence-specific DNA binding proteins that harbor modular, repetitive DNA binding domains. TALEs have enabled the creation of customizable designer transcriptional factors and sequence-specific nucleases for genome engineering. Here we report two improvements of the TALE toolbox for achieving efficient activation and repression of endogenous gene expression in mammalian cells. We show that the naturally occurring repeat variable diresidue (RVD) Asn-His (NH) has high biological activity and specificity for guanine, a highly prevalent base in mammalian genomes. We also report an effective TALE transcriptional repressor architecture for targeted inhibition of transcription in mammalian cells. These findings will improve the precision and effectiveness of genome engineering that can be achieved using TALEs. PMID:22828628

  7. Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

    Science.gov (United States)

    Alkhateeb, Abedalrhman; Rueda, Luis

    2017-08-01

    Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
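
    The two-pass idea described in this abstract (score each read by its number of unique k-mers, then drop reads whose z-score falls below a threshold) can be illustrated with a minimal sketch. This is not the Zseq implementation: the function names, the default k, the default threshold, and the rule of simply skipping k-mers that contain a non-ACGT base are all assumptions made for illustration, and the GC-content adjustment mentioned in the abstract is left out.

```python
from statistics import mean, pstdev

VALID_BASES = set("ACGT")


def unique_kmer_score(seq, k=4):
    """Count distinct k-mers in seq, ignoring k-mers with an ambiguous base.

    Hypothetical scoring rule for illustration; the published method may
    weight ambiguous bases and GC-rich k-mers differently.
    """
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if set(kmer) <= VALID_BASES)


def zscore_filter(reads, k=4, threshold=0.0):
    """Keep reads whose complexity z-score is at least `threshold`.

    First pass: score every read. Second pass: convert scores to z-scores
    using the population mean and standard deviation, then filter.
    """
    scores = [unique_kmer_score(read, k) for read in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All reads are equally complex, so there is nothing to filter on.
        return list(reads)
    return [read for read, s in zip(reads, scores) if (s - mu) / sigma >= threshold]


if __name__ == "__main__":
    example_reads = [
        "ACGTTGCAAGCTTAGC",  # varied sequence
        "AAAAAAAAAAAAAAAA",  # homopolymer, low complexity
        "ACGTNNNNNNNNACGT",  # mostly ambiguous bases
        "GATTACAGGCTTAACC",  # varied sequence
    ]
    for read in zscore_filter(example_reads):
        print(read)
```

    In this toy input the homopolymer read and the read dominated by N bases each contribute a single valid k-mer, so they score well below the mean and are removed, while the two varied reads are kept. Both passes are linear in the total number of bases, which is consistent with the abstract's description of Zseq as a linear method.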

  8. Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

    Science.gov (United States)

    Lam, Kathy N; Charles, Trevor C

    2015-01-01

    Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite

  9. Transcriptional analysis of the HeT-A retrotransposon in mutant and wild type stocks reveals high sequence variability at Drosophila telomeres and other unusual features

    Directory of Open Access Journals (Sweden)

    Piñeyro David

    2011-11-01

    Full Text Available Abstract Background Telomere replication in Drosophila depends on the transposition of a domesticated retroelement, the HeT-A retrotransposon. The sequence of the HeT-A retrotransposon changes rapidly resulting in differentiated subfamilies. This pattern of sequence change contrasts with the essential function with which the HeT-A is entrusted and brings about questions concerning the extent of sequence variability, the telomere contribution of different subfamilies, and whether wild type and mutant Drosophila stocks show different HeT-A scenarios. Results A detailed study on the variability of HeT-A reveals that both the level of variability and the number of subfamilies are higher than previously reported. Comparisons between GIII, a strain with longer telomeres, and its parental strain Oregon-R indicate that both strains have the same set of HeT-A subfamilies. Finally, the presence of a highly conserved splicing pattern only in its antisense transcripts indicates a putative regulatory, functional or structural role for the HeT-A RNA. Interestingly, our results also suggest that most HeT-A copies are actively expressed regardless of which telomere and where in the telomere they are located. Conclusions Our study demonstrates how the HeT-A sequence changes much faster than previously reported resulting in at least nine different subfamilies most of which could actively contribute to telomere extension in Drosophila. Interestingly, the only significant difference observed between Oregon-R and GIII resides in the nature and proportion of the antisense transcripts, suggesting a possible mechanism that would in part explain the longer telomeres of the GIII stock.

  10. RNA sequencing analysis of transcriptional change in the freshwater mussel Elliptio complanata after environmentally relevant sodium chloride exposure.

    Science.gov (United States)

    Robertson, Laura S; Galbraith, Heather S; Iwanowicz, Deborah; Blakeslee, Carrie J; Cornman, R Scott

    2017-09-01

    To identify potential biomarkers of salt stress in a freshwater sentinel species, we examined transcriptional responses of the common mussel Elliptio complanata to controlled sodium chloride (NaCl) exposures. Ribonucleic acid sequencing (RNA-Seq) of mantle tissue identified 481 transcripts differentially expressed in adult mussels exposed to 2 ppt NaCl (1.2 ppt chloride) for 7 d, of which 290 had nonoverlapping intervals. Differentially expressed gene categories included ion and transmembrane transport, oxidoreductase activity, maintenance of protein folding, and amino acid metabolism. The rate-limiting enzyme for synthesis of taurine, an amino acid frequently linked to osmotic stress in aquatic species, was upregulated, as was the transmembrane ion pump sodium/potassium adenosine 5'-triphosphatase. These patterns confirm a primary transcriptional response to the experimental dose, albeit likely overlapping with nonspecific secondary stress responses. Substantial involvement of the heat shock protein 70 chaperone family and the water-transporting aquaporin family was not detected, however, in contrast to some studies in other bivalves. A subset of the most significantly regulated genes was confirmed by quantitative polymerase chain reaction in an independent sample. Cluster analysis showed separation of mussels exposed to 2 ppt NaCl from control mussels in multivariate space, but mussels exposed to 1 ppt NaCl were largely indistinguishable from controls. Transcriptome-scale analysis of salt exposure under laboratory conditions efficiently identified candidate biomarkers for further functional analysis and field validation. Environ Toxicol Chem 2017;36:2352-2366. © Published 2017 Wiley Periodicals Inc. on behalf of SETAC. This article is a US government work and, as such, is in the public domain in the United States of America. © 2017 SETAC.

  11. The presence of five nifH-like sequences in Clostridium pasteurianum: sequence divergence and transcription properties.

    OpenAIRE

    Wang, S Z; Chen, J S; Johnson, J L

    1988-01-01

    The nifH gene encodes the iron protein (component II) of the nitrogenase complex. We have previously shown the presence in Clostridium pasteurianum of two nifH-like sequences in addition to the nifH1 gene which codes for a protein identical to the isolated iron protein. In the present study, we report that there are at least five nifH-like sequences in C. pasteurianum. DNA sequencing data indicate that the six nifH (nifH1) and nifH-like (nifH2, nifH3, nifH4, nifH5 and nifH6) sequences are not...

  12. Distributed biotin-streptavidin transcription roadblocks for mapping cotranscriptional RNA folding.

    Science.gov (United States)

    Strobel, Eric J; Watters, Kyle E; Nedialkov, Yuri; Artsimovitch, Irina; Lucks, Julius B

    2017-07-07

    RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRI E111Q roadblocks. While effective, the distribution of elongation complexes using EcoRI E111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin-streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin-SAv roadblocks. We then show that randomly distributed biotin-SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRI E111Q. Lastly, we find that EcoRI E111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin-SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    Science.gov (United States)

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  14. Large-scale analysis of antisense transcription in wheat using the Affymetrix GeneChip Wheat Genome Array

    Directory of Open Access Journals (Sweden)

    Settles Matthew L

    2009-05-01

    Full Text Available Abstract Background Natural antisense transcripts (NATs) are transcripts of the opposite DNA strand to the sense-strand either at the same locus (cis-encoded) or a different locus (trans-encoded). They can affect gene expression at multiple stages including transcription, RNA processing and transport, and translation. NATs give rise to sense-antisense transcript pairs and the number of these identified has escalated greatly with the availability of DNA sequencing resources and public databases. Traditionally, NATs were identified by the alignment of full-length cDNAs or expressed sequence tags to genome sequences, but an alternative method for large-scale detection of sense-antisense transcript pairs involves the use of microarrays. In this study we developed a novel protocol to assay sense- and antisense-strand transcription on the 55 K Affymetrix GeneChip Wheat Genome Array, which is a 3' in vitro transcription (3'IVT) expression array. We selected five different tissue types for assay to enable maximum discovery, and used the 'Chinese Spring' wheat genotype because most of the wheat GeneChip probe sequences were based on its genomic sequence. This study is the first report of using a 3'IVT expression array to discover the expression of natural sense-antisense transcript pairs, and may be considered as proof-of-concept. Results By using alternative target preparation schemes, both the sense- and antisense-strand derived transcripts were labeled and hybridized to the Wheat GeneChip. Quality assurance verified that successful hybridization did occur in the antisense-strand assay. A stringent threshold for positive hybridization was applied, which resulted in the identification of 110 sense-antisense transcript pairs, as well as 80 potentially antisense-specific transcripts. Strand-specific RT-PCR validated the microarray observations, and showed that antisense transcription is likely to be tissue specific. For the annotated sense

  15. Identification and Classification of New Transcripts in Dorper and Small-Tailed Han Sheep Skeletal Muscle Transcriptomes.

    Directory of Open Access Journals (Sweden)

    Tianle Chao

    Full Text Available High-throughput mRNA sequencing enables the discovery of new transcripts and additional parts of incompletely annotated transcripts. Compared with the human and cow genomes, the reference annotation level of the sheep genome is still low. An investigation of new transcripts in sheep skeletal muscle will improve our understanding of muscle development. Therefore, applying high-throughput sequencing, two cDNA libraries from the biceps brachii of small-tailed Han sheep and Dorper sheep were constructed, and whole-transcriptome analysis was performed to determine the unknown transcript catalogue of this tissue. In this study, 40,129 transcripts were finally mapped to the sheep genome. Among them, 3,467 transcripts were determined to be unannotated in the current reference sheep genome and were defined as new transcripts. Based on protein-coding capacity prediction and comparative analysis of sequence similarity, 246 transcripts were classified as portions of unannotated genes or incompletely annotated genes. Another 1,520 transcripts were predicted with high confidence to be long non-coding RNAs. Our analysis also revealed 334 new transcripts that displayed specific expression in ruminants and uncovered a number of new transcripts without intergenus homology but with specific expression in sheep skeletal muscle. The results confirmed a complex transcript pattern of coding and non-coding RNA in sheep skeletal muscle. This study provided important information concerning the sheep genome and transcriptome annotation, which could provide a basis for further study.

  16. E6-associated transcription patterns in human papilloma virus 16-positive cervical tissues.

    Science.gov (United States)

    Lin, Kezhi; Lu, Xulian; Chen, Jun; Zou, Ruanmin; Zhang, Lifang; Xue, Xiangyang

    2015-01-01

    The change in transcription pattern induced by post-transcriptional RNA splicing is an important mechanism in the regulation of the early gene expression of human papilloma virus (HPV). The present study was conducted to establish a method to specifically amplify HPV-16 E6-associated transcripts. The E6-related transcripts from 63 HPV-16-positive cervical tumor tissue samples were amplified, consisting of eight cases of low-risk intraepithelial lesions, 38 cases of high-risk intraepithelial lesions and 17 cases of cervical cancer (CxCa). The appropriate amplified segments were recovered following agarose gel electrophoresis, and subjected to further sequencing and sequence alignment analysis. Six groups of E6 transcription patterns were identified from HPV-16-positive cervical tumor tissue, including five newly-discovered transcripts. Different HPV-16 E6-associated transcription patterns were detected during the development of CxCa. Over the course of the progression of the low-grade squamous intraepithelial lesions to CxCa, the specific HPV-16 E6-associated transcription patterns and the dominant transcripts were all different. As indicated by this study, the transcription pattern of the E6 early gene of HPV-16 was closely associated with the stages of cervical carcinogenesis, and may also be involved in the development of CxCa.

  17. Alterations in transcription factor binding in radioresistant human melanoma cells after ionizing radiation

    International Nuclear Information System (INIS)

    Sahijdak, W.M.; Yang, Chin-Rang; Zuckerman, J.S.; Meyers, M.; Boothman, D.A.

    1994-01-01

    We analyzed alterations in transcription factor binding to specific, known promoter DNA consensus sequences between irradiated and unirradiated radioresistant human melanoma (U1-Mel) cells. The goal of this study was to begin to investigate which transcription factors and DNA-binding sites are responsible for the induction of specific transcripts and proteins after ionizing radiation. Transcription factor binding was observed using DNA band-shift assays and oligonucleotide competition analyses. Confluence-arrested U1-Mel cells were irradiated (4.5 Gy) and harvested at 4 h. Double-stranded oligonucleotides containing known DNA-binding consensus sites for specific transcription factors were used. Increased DNA binding activity after ionizing radiation was noted with oligonucleotides containing the CREB, NF-kB and Sp1 consensus sites. No changes in protein binding to AP-1, AP-2, AP-3, or CTF/NF1, GRE or Oct-1 consensus sequences were noted. X-ray activation of select transcription factors, which bind certain consensus sites in promoters, may cause specific induction or repression of gene transcription. 22 refs., 2 figs

  18. Statistical approaches to use a model organism for regulatory sequences annotation of newly sequenced species.

    Directory of Open Access Journals (Sweden)

    Pietro Liò

    Full Text Available A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly-sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from a closely related model organism. Here we present a general algorithm that identifies transcription factor binding sites in one newly sequenced species by performing Bayesian regression on the annotated species. First we set the rationale of our method by applying it within the same species, then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case when a certain number of experiments, from several species close to the species on which to make inference, are available. In order to show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compared the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate an effective utility in annotating new species sequences using all the available replicas from model species. The third case, where the gene network size varies among species, shows that the combination of information is less powerful but is still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.

  19. Novel expressed sequences identified in a model of androgen independent prostate cancer

    Directory of Open Access Journals (Sweden)

    Jones Steven JM

    2007-01-01

    Full Text Available Abstract Background Prostate cancer is the most frequently diagnosed cancer in American men, and few effective treatment options are available to patients who develop hormone-refractory prostate cancer. The molecular changes that occur to allow prostate cells to proliferate in the absence of androgens are not fully understood. Results Subtractive hybridization experiments performed with samples from an in vivo model of hormonal progression identified 25 expressed sequences representing novel human transcripts. Intriguingly, these 25 sequences have small open-reading frames and are not highly conserved through evolution, suggesting many of these novel expressed sequences may be derived from untranslated regions of novel transcripts or from non-coding transcripts. Examination of a large metalibrary of human Serial Analysis of Gene Expression (SAGE) tags demonstrated that only three of these novel sequences had been previously detected. RT-PCR experiments confirmed that the 6 sequences tested were expressed in specific human tissues, as well as in clinical samples of prostate cancer. Further RT-PCR experiments for five of these fragments indicated they originated from large untranslated regions of unannotated transcripts. Conclusion This study underlines the value of using complementary techniques in the annotation of the human genome. The tissue-specific expression of 4 of the 6 clones tested indicates the expression of these novel transcripts is tightly regulated, and future work will determine the possible role(s) these novel transcripts may play in the progression of prostate cancer.

  20. Co-transcriptional formation of DNA:RNA hybrid G-quadruplex and potential function as constitutional cis element for transcription control.

    Science.gov (United States)

    Zheng, Ke-wei; Xiao, Shan; Liu, Jia-quan; Zhang, Jia-yu; Hao, Yu-hua; Tan, Zheng

    2013-05-01

    G-quadruplex formation in genomic DNA is considered to regulate transcription. Previous investigations almost exclusively focused on intramolecular G-quadruplexes formed by DNA carrying four or more G-tracts, and structure formation has rarely been studied in physiologically relevant processes. Here, we report an almost entirely neglected, but actually much more prevalent form of G-quadruplexes, DNA:RNA hybrid G-quadruplexes (HQ), that form during transcription. HQ formation requires as few as two G-tracts instead of four on a non-template DNA strand. Potential HQ sequences (PHQS) are present in >97% of human genes, with an average of 73 PHQSs per gene. HQ modulates transcription under both in vitro and in vivo conditions. Transcriptomal analysis of human tissues implies that maximal gene expression may be limited by the number of PHQS in genes. These features suggest that HQs may play fundamental roles in transcription regulation and other transcription-mediated processes.
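
    The two-G-tract requirement described in this abstract can be illustrated with a small scan of a non-template strand. This is only a sketch: the abstract does not state how a G-tract or a PHQS is defined, so the minimum run length of three guanines (`min_run`), the function names, and the example sequence are all assumptions made for illustration, not the authors' criteria.

```python
import re


def find_g_tracts(non_template: str, min_run: int = 3):
    """Return (start, end) spans of runs of at least `min_run` consecutive G's.

    `min_run=3` is an assumed threshold; the abstract does not give one.
    """
    pattern = re.compile(r"G{%d,}" % min_run)
    return [m.span() for m in pattern.finditer(non_template.upper())]


def may_form_hq(non_template: str, min_run: int = 3, min_tracts: int = 2) -> bool:
    """True if the strand carries at least `min_tracts` G-tracts.

    Two tracts is the minimum the abstract reports for hybrid G-quadruplexes;
    spacing between tracts is ignored here (a simplification).
    """
    return len(find_g_tracts(non_template, min_run)) >= min_tracts


if __name__ == "__main__":
    example = "ATGGGCTAGGGGTTAC"  # hypothetical non-template sequence
    print(find_g_tracts(example))
    print(may_form_hq(example))
```

    A real PHQS search would also need to constrain the distance between tracts and their position relative to the transcription start site, which this sketch leaves out.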

  1. A tobacco cDNA reveals two different transcription patterns in vegetative and reproductive organs

    Directory of Open Access Journals (Sweden)

    I. da Silva

    2002-08-01

    Full Text Available In order to identify genes expressed in the pistil that may have a role in the reproduction process, we have established an expressed sequence tags project to randomly sequence clones from a Nicotiana tabacum stigma/style cDNA library. A cDNA clone (MTL-8) showing high sequence similarity to genes encoding glycine-rich RNA-binding proteins was chosen for further characterization. Based on the extensive identity of MTL-8 to the RGP-1a sequence of N. sylvestris, a primer was defined to extend the 5' sequence of MTL-8 by RT-PCR from stigma/style RNAs. The amplification product was sequenced and it was confirmed that MTL-8 corresponds to an mRNA encoding a glycine-rich RNA-binding protein. Two transcripts of different sizes and expression patterns were identified when the MTL-8 cDNA insert was used as a probe in RNA blots. The largest is 1,100 nucleotides (nt) long and markedly predominant in ovaries. The smaller transcript, with 600 nt, is ubiquitous to the vegetative and reproductive organs analyzed (roots, stems, leaves, sepals, petals, stamens, stigmas/styles and ovaries). Plants submitted to stress (wounding, virus infection and ethylene treatment) presented an increased level of the 600-nt transcript in leaves, especially after tobacco necrosis virus infection. In contrast, the level of the 1,100-nt transcript seems to be unaffected by the stress conditions tested. Results of Southern blot experiments have suggested that MTL-8 is present in one or two copies in the tobacco genome. Our results suggest that the shorter transcript is related to stress while the larger one is a flower predominant and nonstress-inducible messenger.

  2. NFATC3-PLA2G15 Fusion Transcript Identified by RNA Sequencing Promotes Tumor Invasion and Proliferation in Colorectal Cancer Cell Lines.

    Science.gov (United States)

    Jang, Jee-Eun; Kim, Hwang-Phill; Han, Sae-Won; Jang, Hoon; Lee, Si-Hyun; Song, Sang-Hyun; Bang, Duhee; Kim, Tae-You

    2018-06-14

    This study was designed to identify novel fusion transcripts (FTs) and their functional significance in colorectal cancer lines. We performed paired-end RNA sequencing of 28 colorectal cancer (CRC) cell lines. FT candidates were identified using TopHat-fusion, ChimeraScan, and FusionMap tools and further experimental validation was conducted through reverse transcription-polymerase chain reaction and Sanger sequencing. FT was depleted in human CRC line and the effects on cell proliferation, cell migration, and cell invasion were analyzed. 1,380 FT candidates were detected through bioinformatics filtering. We selected 6 candidate FTs, including 4 inter-chromosomal and 2 intra-chromosomal FTs and each FT was found in at least 1 of the 28 cell lines. Moreover, when we tested 19 pairs of CRC tumor and adjacent normal tissue samples, NFATC3-PLA2G15 FT was found in 2. Knockdown of NFATC3-PLA2G15 using siRNA reduced mRNA expression of epithelial-mesenchymal transition (EMT) markers such as vimentin, twist, and fibronectin and increased mesenchymal-epithelial transition markers of E-cadherin, claudin-1, and FOXC2 in colo-320 cell line harboring NFATC3-PLA2G15 FT. The NFATC3-PLA2G15 knockdown also inhibited invasion, colony formation capacity, and cell proliferation. These results suggest that NFATC3-PLA2G15 FTs may contribute to tumor progression by enhancing invasion by EMT and proliferation.

  3. Modelling reveals kinetic advantages of co-transcriptional splicing.

    Directory of Open Access Journals (Sweden)

    Stuart Aitken

    2011-10-01

    Full Text Available Messenger RNA splicing is an essential and complex process for the removal of intron sequences. Whereas the composition of the splicing machinery is mostly known, the kinetics of splicing, the catalytic activity of splicing factors and the interdependency of transcription, splicing and mRNA 3' end formation are less well understood. We propose a stochastic model of splicing kinetics that explains data obtained from high-resolution kinetic analyses of transcription, splicing and 3' end formation during induction of an intron-containing reporter gene in budding yeast. Modelling reveals co-transcriptional splicing to be the most probable and most efficient splicing pathway for the reporter transcripts, due in part to a positive feedback mechanism for co-transcriptional second step splicing. Model comparison is used to assess the alternative representations of reactions. Modelling also indicates the functional coupling of transcription and splicing, because both the rate of initiation of transcription and the probability that step one of splicing occurs co-transcriptionally are reduced, when the second step of splicing is abolished in a mutant reporter.

  4. Modelling reveals kinetic advantages of co-transcriptional splicing.

    Science.gov (United States)

    Aitken, Stuart; Alexander, Ross D; Beggs, Jean D

    2011-10-01

    Messenger RNA splicing is an essential and complex process for the removal of intron sequences. Whereas the composition of the splicing machinery is mostly known, the kinetics of splicing, the catalytic activity of splicing factors and the interdependency of transcription, splicing and mRNA 3' end formation are less well understood. We propose a stochastic model of splicing kinetics that explains data obtained from high-resolution kinetic analyses of transcription, splicing and 3' end formation during induction of an intron-containing reporter gene in budding yeast. Modelling reveals co-transcriptional splicing to be the most probable and most efficient splicing pathway for the reporter transcripts, due in part to a positive feedback mechanism for co-transcriptional second step splicing. Model comparison is used to assess the alternative representations of reactions. Modelling also indicates the functional coupling of transcription and splicing, because both the rate of initiation of transcription and the probability that step one of splicing occurs co-transcriptionally are reduced, when the second step of splicing is abolished in a mutant reporter.

  5. Effects of Replication and Transcription on DNA Structure-Related Genetic Instability.

    Science.gov (United States)

    Wang, Guliang; Vasquez, Karen M

    2017-01-05

    Many repetitive sequences in the human genome can adopt conformations that differ from the canonical B-DNA double helix (i.e., non-B DNA), and can impact important biological processes such as DNA replication, transcription, recombination, telomere maintenance, viral integration, transposome activation, DNA damage and repair. Thus, non-B DNA-forming sequences have been implicated in genetic instability and disease development. In this article, we discuss the interactions of non-B DNA with the replication and/or transcription machinery, particularly in disease states (e.g., tumors) that can lead to an abnormal cellular environment, and how such interactions may alter DNA replication and transcription, leading to potential conflicts at non-B DNA regions, and eventually result in genetic instability and human disease.

  6. A transcript finishing initiative for closing gaps in the human transcriptome

    DEFF Research Database (Denmark)

    Sogayar, Mari Cleide; Camargo, Anamaria A; Bettoni, Fabiana

    2004-01-01

    We report the results of a transcript finishing initiative, undertaken for the purpose of identifying and characterizing novel human transcripts, in which RT-PCR was used to bridge gaps between paired EST clusters, mapped against the genomic sequence. Each pair of EST clusters selected...

  7. Multiple RNAs from the mouse carboxypeptidase M locus: functional RNAs or transcription noise?

    Directory of Open Access Journals (Sweden)

    Castilho Beatriz A

    2009-02-01

    Full Text Available Abstract Background A major effort of the scientific community has been to obtain complete pictures of the genomes of many organisms. This has been accomplished mainly by annotation of structural and functional elements in the genome sequence, a process that has been centred on the gene concept and, as a consequence, biased toward protein coding sequences. Recently, the explosion of transcriptome data generated and the discovery of many functional non-protein coding RNAs have painted a more detailed and complex scenario for the genome. Here we analyzed the mouse carboxypeptidase M locus in this broader perspective in order to define the mouse CPM gene structure and evaluate the existence of other transcripts from the same genomic region. Results Bioinformatic analysis of nucleotide sequences that map to the mouse CPM locus suggests that, in addition to the mouse CPM mRNA, it expresses at least 33 different transcripts, many of which seem to be non-coding RNAs. We randomly chose to evaluate experimentally four of these extra transcripts. They are expressed in a tissue specific manner, indicating that they are not artefacts or transcriptional noise. Furthermore, one of these four extra transcripts shows expression patterns that differed considerably from the other ones and from the mouse CPM gene, suggesting that there may be more than one transcriptional unit in this locus. In addition, we have confirmed the mouse CPM gene RefSeq sequence by rapid amplification of cDNA ends (RACE) and directional cloning. Conclusion This study supports the recent view that the majority of the genome is transcribed and that many of the resulting transcripts seem to be non-coding RNAs from introns of genes or from independent transcriptional units. Although some of the information on the transcriptome of many organisms may actually be artefacts or transcriptional noise, we argue that it can be experimentally evaluated and used to find and define biological

  8. Characterization and Improvement of RNA-Seq Precision in Quantitative Transcript Expression Profiling

    Energy Technology Data Exchange (ETDEWEB)

    Labaj, Pawel P.; Leparc, German G.; Linggi, Bryan E.; Markillie, Lye Meng; Wiley, H. S.; Kreil, David P.

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large scale RNA-Seq data sets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target coverage and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive target coverage of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, less than 30% of all transcripts could be quantified reliably with a relative error < 20%. Based on established tools, we then introduce a new approach for mapping and analyzing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion, we outline possible experimental and computational strategies for further improvements in quantification precision.

  9. Identification of a novel herpes simplex virus type 1 transcript and protein (AL3) expressed during latency.

    Science.gov (United States)

    Jaber, Tareq; Henderson, Gail; Li, Sumin; Perng, Guey-Chuen; Carpenter, Dale; Wechsler, Steven L; Jones, Clinton

    2009-10-01

    The herpes simplex virus type 1 (HSV-1) latency-associated transcript (LAT) is abundantly expressed in latently infected sensory neurons. In small animal models of infection, expression of the first 1.5 kb of LAT coding sequences is necessary and sufficient for wild-type reactivation from latency. The ability of LAT to inhibit apoptosis is important for reactivation from latency. Within the first 1.5 kb of LAT coding sequences and LAT promoter sequences, additional transcripts have been identified. For example, the anti-sense to LAT transcript (AL) is expressed in the opposite direction to LAT from the 5' end of LAT and LAT promoter sequences. In addition, the upstream of LAT (UOL) transcript is expressed in the LAT direction from sequences in the LAT promoter. Further examination of the first 1.5 kb of LAT coding sequences revealed two small ORFs that are anti-sense with respect to LAT (AL2 and AL3). A transcript spanning AL3 was detected in productively infected cells, mouse neuroblastoma cells stably expressing LAT and trigeminal ganglia (TG) of latently infected mice. Peptide-specific IgG directed against AL3 specifically recognized a protein migrating near 15 kDa in cells stably transfected with LAT, mouse neuroblastoma cells transfected with a plasmid containing the AL3 ORF and TG of latently infected mice. The inability to detect the AL3 protein during productive infection may have been because the 5' terminus of the AL3 transcript was downstream of the first in-frame methionine of the AL3 ORF during productive infection.

  10. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full Text Available Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).
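
    The length figures in this abstract imply an approximate mean coding-region length, which the abstract does not state directly. The short calculation below works it out from the reported averages; it assumes that the 1.44 kb mean is exactly 1,440 bases (the abstract gives a rounded value) and that the transcript consists only of 5' UTR, coding sequence, and 3' UTR. The function names are illustrative only.

```python
def mean_cds_length(transcript_nt: int, utr5_nt: int, utr3_nt: int) -> int:
    """Mean coding-sequence length implied by mean transcript and UTR lengths."""
    return transcript_nt - utr5_nt - utr3_nt


def non_redundant_fraction(non_redundant: int, total: int) -> float:
    """Share of sequenced clones that remain after a strict assembly."""
    return non_redundant / total


if __name__ == "__main__":
    # Values taken from the abstract; 1440 assumes 1.44 kb is exact.
    cds_nt = mean_cds_length(1440, 218, 321)
    print(f"implied mean CDS length: {cds_nt} nt (~{cds_nt // 3} codons)")
    print(f"non-redundant fraction: {non_redundant_fraction(24467, 27455):.3f}")
```

    Because the inputs are rounded averages, the result is a rough estimate, not a figure reported by the authors.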

  11. Mitochondrial transcripts and associated heteroplasmies of Ancistrus spp. (Siluriformes: Loricariidae)

    Directory of Open Access Journals (Sweden)

    Daniel A. Moreira

    2015-12-01

    Full Text Available This data-set complements our paper entitled “The use of transcriptomic next-generation sequencing data to assemble mitochondrial genomes of Ancistrus spp. (Loricariidae)” [6]. Here, we present the nucleotide sequences of each transcript used for mitogenome assembly, as well as tables presenting the location of each transcript in the mitogenomes; the frequency, location and codon position of the detected heteroplasmic sites; and the start/stop codon usage, UTR, CDS and polyA-tail length for each protein coding gene. Readers are referred to the paper cited above for data interpretation and discussion.

  12. Direct Transcriptional Consequences of Somatic Mutation in Breast Cancer

    Directory of Open Access Journals (Sweden)

    Adam Shlien

    2016-08-01

    Full Text Available Disordered transcriptomes of cancer encompass direct effects of somatic mutation on transcription, coordinated secondary pathway alterations, and increased transcriptional noise. To catalog the rules governing how somatic mutation exerts direct transcriptional effects, we developed an exhaustive pipeline for analyzing RNA sequencing data, which we integrated with whole genomes from 23 breast cancers. Using X-inactivation analyses, we found that cancer cells are more transcriptionally active than intermixed stromal cells. This is especially true in estrogen receptor (ER)-negative tumors. Overall, 59% of substitutions were expressed. Nonsense mutations showed lower expression levels than expected, with patterns characteristic of nonsense-mediated decay. 14% of 4,234 rearrangements caused transcriptional abnormalities, including exon skips, exon reusage, fusions, and premature polyadenylation. We found productive, stable transcription from sense-to-antisense gene fusions and gene-to-intergenic rearrangements, suggesting that these mutation classes drive more transcriptional disruption than previously suspected. Systematic integration of transcriptome with genome data reveals the rules by which transcriptional machinery interprets somatic mutation.

  13. Hypoxia-induced oxidative base modifications in the VEGF hypoxia-response element are associated with transcriptionally active nucleosomes.

    Science.gov (United States)

    Ruchko, Mykhaylo V; Gorodnya, Olena M; Pastukh, Viktor M; Swiger, Brad M; Middleton, Natavia S; Wilson, Glenn L; Gillespie, Mark N

    2009-02-01

    Reactive oxygen species (ROS) generated in hypoxic pulmonary artery endothelial cells cause transient oxidative base modifications in the hypoxia-response element (HRE) of the VEGF gene that bear a conspicuous relationship to induction of VEGF mRNA expression (K.A. Ziel et al., FASEB J. 19, 387-394, 2005). If such base modifications are indeed linked to transcriptional regulation, then they should be detected in HRE sequences associated with transcriptionally active nucleosomes. Southern blot analysis of the VEGF HRE associated with nucleosome fractions prepared by micrococcal nuclease digestion indicated that hypoxia redistributed some HRE sequences from multinucleosomes to transcriptionally active mono- and dinucleosome fractions. A simple PCR method revealed that VEGF HRE sequences harboring oxidative base modifications were found exclusively in mononucleosomes. Inhibition of hypoxia-induced ROS generation with myxothiazol prevented formation of oxidative base modifications but not the redistribution of HRE sequences into mono- and dinucleosome fractions. The histone deacetylase inhibitor trichostatin A caused retention of HRE sequences in compacted nucleosome fractions and prevented formation of oxidative base modifications. These findings suggest that the hypoxia-induced oxidant stress directed at the VEGF HRE requires the sequence to be repositioned into mononucleosomes and support the prospect that oxidative modifications in this sequence are an important step in transcriptional activation.

  14. Transcriptional Activation Domains of the Candida albicans Gcn4p and Gal4p Homologs

    OpenAIRE

    Martchenko, Mikhail; Levitin, Anastasia; Whiteway, Malcolm

    2006-01-01

    Many putative transcription factors in the pathogenic fungus Candida albicans contain sequence similarity to well-defined transcriptional regulators in the budding yeast Saccharomyces cerevisiae, but this sequence similarity is often limited to the DNA binding domains of the molecules. The Gcn4p and Gal4p proteins of Saccharomyces cerevisiae are highly studied and well-understood eukaryotic transcription factors of the basic leucine zipper (Gcn4p) and C6 zinc cluster (Gal4p) families; C. albi...

  15. DETECTION OF BACTERIAL SMALL TRANSCRIPTS FROM RNA-SEQ DATA: A COMPARATIVE ASSESSMENT.

    Science.gov (United States)

    Peña-Castillo, Lourdes; Grüell, Marc; Mulligan, Martin E; Lang, Andrew S

    2016-01-01

    Small non-coding RNAs (sRNAs) are regulatory RNA molecules that have been identified in a multitude of bacterial species and shown to control numerous cellular processes through various regulatory mechanisms. In the last decade, next generation RNA sequencing (RNA-seq) has been used for the genome-wide detection of bacterial sRNAs. Here we describe sRNA-Detect, a novel approach to identify expressed small transcripts from prokaryotic RNA-seq data. Using RNA-seq data from three bacterial species and two sequencing platforms, we performed a comparative assessment of five computational approaches for the detection of small transcripts. We demonstrate that sRNA-Detect improves upon current standalone computational approaches for identifying novel small transcripts in bacteria.

  16. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Science.gov (United States)

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  17. A unified architecture of transcriptional regulatory elements

    DEFF Research Database (Denmark)

    Andersson, Robin; Sandelin, Albin Gustav; Danko, Charles G.

    2015-01-01

    Gene expression is precisely controlled in time and space through the integration of signals that act at gene promoters and gene-distal enhancers. Classically, promoters and enhancers are considered separate classes of regulatory elements, often distinguished by histone modifications. However...... and enhancers are considered a single class of functional element, with a unified architecture for transcription initiation. The context of interacting regulatory elements and the surrounding sequences determine local transcriptional output as well as the enhancer and promoter activities of individual elements....

  18. Directional RNA deep sequencing sheds new light on the transcriptional response of Anabaena sp. strain PCC 7120 to combined-nitrogen deprivation

    Directory of Open Access Journals (Sweden)

    Head Steven R

    2011-06-01

    Full Text Available Abstract Background Cyanobacteria are potential sources of renewable chemicals and biofuels and serve as model organisms for bacterial photosynthesis, nitrogen fixation, and responses to environmental changes. Anabaena (Nostoc) sp. strain PCC 7120 (hereafter Anabaena) is a multicellular filamentous cyanobacterium that can "fix" atmospheric nitrogen into ammonia when grown in the absence of a source of combined nitrogen. Because the nitrogenase enzyme is oxygen sensitive, Anabaena forms specialized cells called heterocysts that create a microoxic environment for nitrogen fixation. We have employed directional RNA-seq to map the Anabaena transcriptome during vegetative cell growth and in response to combined-nitrogen deprivation, which induces filaments to undergo heterocyst development. Our data provide an unprecedented view of transcriptional changes in Anabaena filaments during the induction of heterocyst development and transition to diazotrophic growth. Results Using the Illumina short read platform and a directional RNA-seq protocol, we obtained deep sequencing data for RNA extracted from filaments at 0, 6, 12, and 21 hours after the removal of combined nitrogen. The RNA-seq data provided information on transcript abundance and boundaries for the entire transcriptome. From these data, we detected novel antisense transcripts within the UTRs (untranslated regions) and coding regions of key genes involved in heterocyst development, suggesting that antisense RNAs may be important regulators of the nitrogen response. In addition, many 5' UTRs were longer than anticipated, sometimes extending into upstream open reading frames (ORFs), and operons often showed complex structure and regulation. Finally, many genes that had not been previously identified as being involved in heterocyst development showed regulation, providing new candidates for future studies in this model organism. Conclusions Directional RNA-seq data were obtained that provide

  19. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  20. Second generation sequencing of the mesothelioma tumor genome.

    Directory of Open Access Journals (Sweden)

    Raphael Bueno

    2010-05-01

    Full Text Available The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

  1. Disruption of a Transcriptional Repressor by an Insertion Sequence Element Integration Leads to Activation of a Novel Silent Cellobiose Transporter in Lactococcus lactis MG1363.

    Science.gov (United States)

    Solopova, Ana; Kok, Jan; Kuipers, Oscar P

    2017-12-01

    Lactococcus lactis subsp. cremoris strains typically carry many dairy niche-specific adaptations. During adaptation to the milk environment these former plant strains have acquired various pseudogenes and insertion sequence elements indicative of ongoing genome decay and frequent transposition events in their genomes. Here we describe the reactivation of a silenced plant sugar utilization cluster in an L. lactis MG1363 derivative lacking the two main cellobiose transporters, PtcBA-CelB and PtcBAC, upon applying selection pressure to utilize cellobiose. A disruption of the transcriptional repressor gene llmg_1239 by an insertion sequence (IS) element allows expression of the otherwise silent novel cellobiose transporter Llmg_1244 and leads to growth of mutant strains on cellobiose. Llmg_1239 was labeled CclR, for cellobiose cluster repressor. IMPORTANCE Insertion sequences (ISs) play an important role in the evolution of lactococci and other bacteria. They facilitate DNA rearrangements and are responsible for creation of new genetic variants with selective advantages under certain environmental conditions. L. lactis MG1363 possesses 71 copies in a total of 11 different types of IS elements. This study describes yet another example of an IS-mediated adaptive evolution. An integration of IS981 or IS905 into a gene coding for a transcriptional repressor led to activation of the repressed gene cluster coding for a plant sugar utilization pathway. The expression of the gene cluster allowed assembly of a novel cellobiose-specific transporter and led to cell growth on cellobiose. Copyright © 2017 American Society for Microbiology.

  2. Novel transcriptional networks regulated by CLOCK in human neurons.

    Science.gov (United States)

    Fontenot, Miles R; Berto, Stefano; Liu, Yuxiang; Werthmann, Gordon; Douglas, Connor; Usui, Noriyoshi; Gleason, Kelly; Tamminga, Carol A; Takahashi, Joseph S; Konopka, Genevieve

    2017-11-01

    The molecular mechanisms underlying human brain evolution are not fully understood; however, previous work suggested that expression of the transcription factor CLOCK in the human cortex might be relevant to human cognition and disease. In this study, we investigated this novel transcriptional role for CLOCK in human neurons by performing chromatin immunoprecipitation sequencing for endogenous CLOCK in adult neocortices and RNA sequencing following CLOCK knockdown in differentiated human neurons in vitro. These data suggested that CLOCK regulates the expression of genes involved in neuronal migration, and a functional assay showed that CLOCK knockdown increased neuronal migratory distance. Furthermore, dysregulation of CLOCK disrupts coexpressed networks of genes implicated in neuropsychiatric disorders, and the expression of these networks is driven by hub genes with human-specific patterns of expression. These data support a role for CLOCK-regulated transcriptional cascades involved in human brain evolution and function. © 2017 Fontenot et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Transcription of highly repetitive tandemly organized DNA in amphibians and birds: A historical overview and modern concepts.

    Science.gov (United States)

    Trofimova, Irina; Krasikova, Alla

    2016-12-01

    Tandemly organized highly repetitive DNA sequences are crucial structural and functional elements of eukaryotic genomes. Despite extensive evidence, satellite DNA remains an enigmatic part of the eukaryotic genome, with the biological role and significance of tandem repeat transcripts remaining rather obscure. Data on tandem repeat transcription in amphibian and avian model organisms are fragmentary despite their genomes being thoroughly characterized. This review systematically covers historical and modern data on transcription of amphibian and avian satellite DNA in somatic cells and during meiosis, when chromosomes acquire a special lampbrush form. We highlight how transcription of tandemly repetitive DNA sequences is organized in the interphase nucleus and on lampbrush chromosomes. We offer LTR-activation hypotheses of widespread satellite DNA transcription initiation during oogenesis. Recent explanations are provided for the significance of high-yield production of non-coding RNA derived from tandemly organized highly repetitive DNA. In many cases the data on the transcription of satellite DNA can be extrapolated from lampbrush chromosomes to interphase chromosomes. Lampbrush chromosomes, combined with novel technical approaches such as superresolution imaging, chromosome microdissection followed by high-throughput sequencing, and dynamic observation in life-like conditions, provide amazing opportunities for investigating the mechanisms of satellite DNA transcription.

  4. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation

    Science.gov (United States)

    Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael

    2012-01-01

    A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795

  5. Sensitive detection of viral transcripts in human tumor transcriptomes.

    Directory of Open Access Journals (Sweden)

    Sven-Eric Schelhorn

    Full Text Available In excess of 12% of human cancer incidents have a viral cofactor. Epidemiological studies of idiopathic human cancers indicate that additional tumor viruses remain to be discovered. Recent advances in sequencing technology have enabled systematic screenings of human tumor transcriptomes for viral transcripts. However, technical problems such as low abundances of viral transcripts in large volumes of sequencing data, viral sequence divergence, and homology between viral and human factors significantly confound identification of tumor viruses. We have developed a novel computational approach for detecting viral transcripts in human cancers that takes the aforementioned confounding factors into account and is applicable to a wide variety of viruses and tumors. We apply the approach to conducting the first systematic search for viruses in neuroblastoma, the most common cancer in infancy. The diverse clinical progression of this disease as well as related epidemiological and virological findings are highly suggestive of a pathogenic cofactor. However, a viral etiology of neuroblastoma is currently contested. We mapped 14 transcriptomes of neuroblastoma as well as positive and negative controls to the human and all known viral genomes in order to detect both known and unknown viruses. Analysis of controls, comparisons with related methods, and statistical estimates demonstrate the high sensitivity of our approach. Detailed investigation of putative viral transcripts within neuroblastoma samples did not provide evidence for the existence of any known human viruses. Likewise, de-novo assembly and analysis of chimeric transcripts did not result in expression signatures associated with novel human pathogens. While confounding factors such as sample dilution or viral clearance in progressed tumors may mask viral cofactors in the data, in principle, this is rendered less likely by the high sensitivity of our approach and the number of biological replicates

  6. Land use type significantly affects microbial gene transcription in soil.

    Science.gov (United States)

    Nacke, Heiko; Fischer, Christiane; Thürmer, Andrea; Meinicke, Peter; Daniel, Rolf

    2014-05-01

    Soil microorganisms play an essential role in sustaining biogeochemical processes and cycling of nutrients across different land use types. To gain insights into microbial gene transcription in forest and grassland soil, we isolated mRNA from 32 sampling sites. After sequencing of generated complementary DNA (cDNA), a total of 5,824,229 sequences could be further analyzed. We were able to assign nonribosomal cDNA sequences to all three domains of life. A dominance of bacterial sequences, which were affiliated to 25 different phyla, was found. Bacterial groups capable of aromatic compound degradation such as Phenylobacterium and Burkholderia were detected in significantly higher relative abundance in forest soil than in grassland soil. Accordingly, KEGG pathway categories related to degradation of aromatic ring-containing molecules (e.g., benzoate degradation) were identified in high abundance within forest soil-derived metatranscriptomic datasets. The impact of land use type forest on community composition and activity is evidently to a high degree caused by the presence of wood breakdown products. Correspondingly, bacterial groups known to be involved in lignin degradation and containing ligninolytic genes such as Burkholderia, Bradyrhizobium, and Azospirillum exhibited increased transcriptional activity in forest soil. Higher solar radiation in grassland presumably induced increased transcription of photosynthesis-related genes within this land use type. This is in accordance with high abundance of photosynthetic organisms and plant-infecting viruses in grassland.

  7. Single-Cell RNA Sequencing of Glioblastoma Cells.

    Science.gov (United States)

    Sen, Rajeev; Dolgalev, Igor; Bayin, N Sumru; Heguy, Adriana; Tsirigos, Aris; Placantonakis, Dimitris G

    2018-01-01

    Single-cell RNA sequencing (sc-RNASeq) is a recently developed technique used to evaluate the transcriptome of individual cells. As opposed to conventional RNASeq in which entire populations are sequenced in bulk, sc-RNASeq can be beneficial when trying to better understand gene expression patterns in markedly heterogeneous populations of cells or when trying to identify transcriptional signatures of rare cells that may be underrepresented when using conventional bulk RNASeq. In this method, we describe the generation and analysis of cDNA libraries from single patient-derived glioblastoma cells using the C1 Fluidigm system. The protocol details the use of the C1 integrated fluidics circuit (IFC) for capturing, imaging and lysing cells; performing reverse transcription; and generating cDNA libraries that are ready for sequencing and analysis.

  8. Mutations on the DNA binding surface of TBP discriminate between yeast TATA and TATA-less gene transcription.

    Science.gov (United States)

    Kamenova, Ivanka; Warfield, Linda; Hahn, Steven

    2014-08-01

    Most RNA polymerase (Pol) II promoters lack a TATA element, yet nearly all Pol II transcription requires TATA binding protein (TBP). While the TBP-TATA interaction is critical for transcription at TATA-containing promoters, it has been unclear whether TBP sequence-specific DNA contacts are required for transcription at TATA-less genes. Transcription factor IID (TFIID), the TBP-containing coactivator that functions at most TATA-less genes, recognizes short sequence-specific promoter elements in metazoans, but analogous promoter elements have not been identified in Saccharomyces cerevisiae. We generated a set of mutations in the yeast TBP DNA binding surface and found that most support growth of yeast. Both in vivo and in vitro, many of these mutations are specifically defective for transcription of two TATA-containing genes with only minor defects in transcription of two TATA-less, TFIID-dependent genes. TBP binds several TATA-less promoters with apparent high affinity, but our results suggest that this binding is not important for transcription activity. Our results are consistent with the model that sequence-specific TBP-DNA contacts are not important at yeast TATA-less genes and suggest that other general transcription factors or coactivator subunits are responsible for recognition of TATA-less promoters. Our results also explain why yeast TBP derivatives defective for TATA binding appear defective in activated transcription. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  9. Development of DNA affinity techniques for the functional characterization of purified RNA polymerase II transcription factors

    International Nuclear Information System (INIS)

    Garfinkel, S.; Thompson, J.A.; Cohen, R.B.; Brendler, T.; Safer, B.

    1987-01-01

    Affinity adsorption, precipitation, and partitioning techniques have been developed to purify and characterize RNA Pol II transcription components from whole cell extracts (WCE) (HeLa) and nuclear extracts (K562). The titration of these extracts with multicopy constructs of the Ad2 MLP, but not pUC8, inhibits transcriptional activity. DNA-binding factors precipitated by this technique are greatly enriched by centrifugation. Using this approach, factors binding to the upstream promoter sequence (UPS) of the Ad2 MLP have been rapidly isolated by Mono Q, Mono S, and DNA affinity chromatography. By U.V. crosslinking to nucleotides containing specific 32P-phosphodiester bonds within the recognition sequence, this factor is identified as a M_r = 45,000 polypeptide. To generate an assay system for the functional evaluation of single transcription components, a similar approach using synthetic oligonucleotide sequences spanning single promoter binding sites has been developed. The addition of a synthetic 63-mer containing the UPS element of the Ad2 MLP to HeLa WCE inhibited transcription by 60%. The addition of partially purified UPS binding protein, but not RNA Pol II, restored transcriptional activity. The addition of synthetic oligonucleotides containing other regulatory sequences not present in the Ad2 MLP was without effect.

  10. Isolation, sequence identification and tissue expression profile of a ...

    African Journals Online (AJOL)

    The complete coding sequence (CDS) of the Banna mini-pig inbred line (BMI) ribokinase gene (RBKS) was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) based on the conserved sequence information of the cattle or other mammals and known highly homologous swine ESTs.

  11. New insights into the promoterless transcription of DNA coligo templates by RNA polymerase III.

    Science.gov (United States)

    Lama, Lodoe; Seidl, Christine I; Ryan, Kevin

    2014-01-01

    Chemically synthesized DNA can carry small RNA sequence information but converting that information into small RNA is generally thought to require large double-stranded promoters in the context of plasmids, viruses and genes. We previously found evidence that circularized oligodeoxynucleotides (coligos) containing certain sequences and secondary structures can template the synthesis of small RNA by RNA polymerase III in vitro and in human cells. By using immunoprecipitated RNA polymerase III we now report corroborating evidence that this enzyme is the sole polymerase responsible for coligo transcription. The immobilized polymerase enabled experiments showing that coligo transcripts can be formed through transcription termination without subsequent 3' end trimming. To better define the determinants of productive transcription, a structure-activity relationship study was performed using over 20 new coligos. The results show that unpaired nucleotides in the coligo stem facilitate circumtranscription, but also that internal loops and bulges should be kept small to avoid secondary transcription initiation sites. A polymerase termination sequence embedded in the double-stranded region of a hairpin-encoding coligo stem can antagonize transcription. Using lessons learned from new and old coligos, we demonstrate how to convert poorly transcribed coligos into productive templates. Our findings support the possibility that coligos may prove useful as chemically synthesized vectors for the ectopic expression of small RNA in human cells.

  12. A single, specific thymine mutation in the ComK-Binding site severely decreases binding and transcription activation by the competence transcription factor ComK of Bacillus subtilis

    NARCIS (Netherlands)

    Susanna, Kim A.; Mironczuk, Aleksandra M.; Smits, Wiep Klaas; Hamoen, Leendert W.; Kuipers, Oscar P.

    The competence transcription factor ComK plays a central role in competence development in Bacillus subtilis by activating the transcription of the K regulon. ComK-activated genes are characterized by the presence of a specific sequence to which ComK binds, a K-box, in their upstream DNA region.

  13. Genome-Wide Spectra of Transcription Insertions and Deletions Reveal That Slippage Depends on RNA:DNA Hybrid Complementarity.

    Science.gov (United States)

    Traverse, Charles C; Ochman, Howard

    2017-08-29

    Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. IMPORTANCE The high level of mistakes generated during transcription can result in the accumulation of malfunctioning and misfolded proteins which can alter global gene regulation and in the expenditure of energy to degrade these nonfunctional proteins. The transcriptome-wide occurrence of base substitutions has been elucidated in bacteria, but information on transcription insertions and deletions (errors that potentially have more dire effects on protein function) is limited to reporter gene constructs. Here, we capture the transcriptome-wide spectrum of insertions and deletions in Escherichia coli and Buchnera aphidicola and show that they occur at rates approaching those of base substitutions

  14. Transcriptional sequencing and analysis of major genes involved in the adventitious root formation of mango cotyledon segments.

    Science.gov (United States)

    Li, Yun-He; Zhang, Hong-Na; Wu, Qing-Song; Muday, Gloria K

    2017-06-01

    A total of 74,745 unigenes were generated and 1975 DEGs were identified. Candidate genes that may be involved in the adventitious root formation of mango cotyledon segment were revealed. Adventitious root formation is a crucial step in plant vegetative propagation, but the molecular mechanism of adventitious root formation remains unclear. Adventitious roots formed only at the proximal cut surface (PCS) of mango cotyledon segments, whereas no roots were formed on the opposite, distal cut surface (DCS). To identify the transcript abundance changes linked to adventitious root development, RNA was isolated from PCS and DCS at 0, 4 and 7 days after culture, respectively. Illumina sequencing of libraries generated from these samples yielded 62.36 Gb high-quality reads that were assembled into 74,745 unigenes with an average sequence length of 807 base pairs, and 33,252 of the assembled unigenes had homologs in at least one of the public databases. Comparative analysis of these transcriptome databases revealed that between the different time points at PCS there were 1966 differentially expressed genes (DEGs), while there were only 51 DEGs for the PCS vs. DCS when time-matched samples were compared. Of these DEGs, 1636 were assigned to gene ontology (GO) classes, the majority of which were involved in cellular processes, metabolic processes and single-organism processes. Candidate genes that may be involved in the adventitious root formation of mango cotyledon segment are predicted to encode polar auxin transport carriers, auxin-regulated proteins, cell wall remodeling enzymes and ethylene-related proteins. In order to validate RNA-sequencing results, we further analyzed the expression profiles of 20 genes by quantitative real-time PCR. This study expands the transcriptome information for Mangifera indica and identifies candidate genes involved in adventitious root formation in cotyledon segments of mango.

  15. Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells

    Directory of Open Access Journals (Sweden)

    Bianchetti Laurent

    2012-11-01

    Full Text Available Abstract Background Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells. Methods We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality. Results In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression. Conclusion If the mechanisms generating the base substitutions could be known, increased SBS frequency in transcripts would be a new useful biomarker of cancer. With the reduction of sequencing cost, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in patient’s tumor and aid diagnostic.

  16. Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells

    International Nuclear Information System (INIS)

    Bianchetti, Laurent; Kieffer, David; Féderkeil, Rémi; Poch, Olivier

    2012-01-01

    Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells. We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality. In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression. If the mechanisms generating the base substitutions could be known, increased SBS frequency in transcripts would be a new useful biomarker of cancer. With the reduction of sequencing cost, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in patient’s tumor and aid diagnostic

  17. The upstream regulatory sequence of the light harvesting complex Lhcf2 gene of the marine diatom Phaeodactylum tricornutum enhances transcription in an orientation- and distance-independent fashion.

    Science.gov (United States)

    Russo, Monia Teresa; Annunziata, Rossella; Sanges, Remo; Ferrante, Maria Immacolata; Falciatore, Angela

    2015-12-01

    Diatoms are a key phytoplankton group in the contemporary ocean, showing extraordinary adaptation capacities to rapidly changing environments. The recent availability of whole genome sequences from representative species has revealed distinct features in their genomes, like novel combinations of genes encoding distinct metabolisms and a significant number of diatom-specific genes. However, the regulatory mechanisms driving diatom gene expression are still largely uncharacterized. Considering the wide variety of fields of study orbiting diatoms, ranging from ecology, evolutionary biology to biotechnology, it is thus essential to increase our understanding of fundamental gene regulatory processes such as transcriptional regulation. To this aim, we explored the functional properties of the 5'-flanking region of the Phaeodactylum tricornutum Lhcf2 gene, encoding a member of the Light Harvesting Complex superfamily and we showed that this region enhances transcription of a GUS reporter gene in an orientation- and distance-independent fashion. This represents the first example of a cis-regulatory sequence with enhancer-like features discovered in diatoms and it is instrumental for the generation of novel genetic tools and diatom exploitation in different areas of study. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. An ABRE promoter sequence is involved in osmotic stress-responsive expression of the DREB2A gene, which encodes a transcription factor regulating drought-inducible genes in Arabidopsis.

    Science.gov (United States)

    Kim, June-Sik; Mizoi, Junya; Yoshida, Takuya; Fujita, Yasunari; Nakajima, Jun; Ohori, Teppei; Todaka, Daisuke; Nakashima, Kazuo; Hirayama, Takashi; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko

    2011-12-01

    In plants, osmotic stress-responsive transcriptional regulation depends mainly on two major classes of cis-acting elements found in the promoter regions of stress-inducible genes: ABA-responsive elements (ABREs) and dehydration-responsive elements (DREs). ABRE has been shown to perceive ABA-mediated osmotic stress signals, whereas DRE is known to be involved in an ABA-independent pathway. Previously, we reported that the transcription factor DRE-BINDING PROTEIN 2A (DREB2A) regulates DRE-mediated transcription of target genes under osmotic stress conditions in Arabidopsis (Arabidopsis thaliana). However, the transcriptional regulation of DREB2A itself remains largely uncharacterized. To elucidate the transcriptional mechanism associated with the DREB2A gene under osmotic stress conditions, we generated a series of truncated and base-substituted variants of the DREB2A promoter and evaluated their transcriptional activities individually. We found that both ABRE and coupling element 3 (CE3)-like sequences located approximately -100 bp from the transcriptional initiation site are necessary for the dehydration-responsive expression of DREB2A. Coupling our transient expression analyses with yeast one-hybrid and chromatin immunoprecipitation (ChIP) assays indicated that the ABRE-BINDING PROTEIN 1 (AREB1), AREB2 and ABRE-BINDING FACTOR 3 (ABF3) bZIP transcription factors can bind to and activate the DREB2A promoter in an ABRE-dependent manner. Exogenous ABA application induced only a modest accumulation of the DREB2A transcript when compared with the osmotic stress treatment. However, the osmotic stress-induced DREB2A expression was found to be markedly impaired in several ABA-deficient and ABA-insensitive mutants. These results suggest that in addition to an ABA-independent pathway, the ABA-dependent pathway plays a positive role in the osmotic stress-responsive expression of DREB2A.

  19. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  20. CONREAL web server: identification and visualization of conserved transcription factor binding sites

    NARCIS (Netherlands)

    Berezikov, E.; Guryev, V.; Cuppen, E.

    2005-01-01

    The use of orthologous sequences and phylogenetic footprinting approaches have become popular for the recognition of conserved and potentially functional sequences. Several algorithms have been developed for the identification of conserved transcription factor binding sites (TFBSs), which are

  1. RNA-Seq for enrichment and analysis of IRF5 transcript expression in SLE.

    Directory of Open Access Journals (Sweden)

    Rivka C Stone

    Polymorphisms in the interferon regulatory factor 5 (IRF5) gene have been consistently replicated and shown to confer risk for or protection from the development of systemic lupus erythematosus (SLE). IRF5 expression is significantly upregulated in SLE patients and upregulation associates with IRF5-SLE risk haplotypes. IRF5 alternative splicing has also been shown to be elevated in SLE patients. Given that human IRF5 exists as multiple alternatively spliced transcripts with distinct function(s), it is important to determine whether the IRF5 transcript profile expressed in healthy donor immune cells is different from that expressed in SLE patients. Moreover, it is not currently known whether an IRF5-SLE risk haplotype defines the profile of IRF5 transcripts expressed. Using standard molecular cloning techniques, we identified and isolated 14 new differentially spliced IRF5 transcript variants from purified monocytes of healthy donors and SLE patients to generate an IRF5 variant transcriptome. Next-generation sequencing was then used to perform in-depth and quantitative analysis of full-length IRF5 transcript expression in primary immune cells of SLE patients and healthy donors. Evidence for additional alternatively spliced transcripts was obtained from de novo junction discovery. Data from these studies support the overall complexity of IRF5 alternative splicing in SLE. Results from next-generation sequencing correlated with cloning and gave similar abundance rankings in SLE patients, thus supporting the use of this new technology for in-depth single gene transcript profiling. Results from this study provide the first proof that (1) SLE patients express an IRF5 transcript signature that is distinct from healthy donors, (2) an IRF5-SLE risk haplotype defines the top four most abundant IRF5 transcripts expressed in SLE patients, and (3) an IRF5 transcript signature enables clustering of SLE patients with the H2 risk haplotype.

  2. Transcription of human 7S K DNA in vitro and in vivo is exclusively controlled by an upstream promoter

    Energy Technology Data Exchange (ETDEWEB)

    Kleinert, H.; Benecke, B.J.

    1988-02-25

    The authors have analyzed the transcription of a recently isolated human 7S K RNA gene in vitro and in vivo. In contrast to hitherto characterized class III genes (genes transcribed by RNA polymerase III), the coding sequence of this gene is not required for faithful and efficient transcription by RNA polymerase III. In fact, a procaryotic vector DNA sequence was efficiently transcribed by RNA polymerase III under the control of the 7S K RNA gene upstream sequence in vitro and in vivo. S1-nuclease protection analyses confirmed that the 7S K 5'-flanking sequence was sufficient for accurate transcription initiation. These data demonstrate that 7S K DNA represents a novel class III gene, the promoter elements of which are located outside the coding sequence.

  3. CRISPR families of the crenarchaeal genus Sulfolobus: bidirectional transcription and dynamic properties

    DEFF Research Database (Denmark)

    Lillestøl, Reidun K; Shah, Shiraz Ali; Brügger, Kim

    2009-01-01

    Summary CRISPRs of Sulfolobus fall into three main families based on their repeats, leader regions, associated cas genes, and putative recognition sequences on viruses and plasmids. Spacer sequence matches to different viruses and plasmids of the Sulfolobales revealed some bias particularly...... for family III CRISPRs. Transcription occurs on both strands of the five repeat-clusters of Sulfolobus acidocaldarius and a repeat-cluster of the conjugative plasmid pKEF9. Leader strand transcripts cover whole repeat-clusters and are processed mainly from the 3'-end, within repeats, yielding heterogeneous...

  4. Site-Specific Incorporation of Functional Components into RNA by an Unnatural Base Pair Transcription System

    Directory of Open Access Journals (Sweden)

    Rie Kawai

    2012-03-01

    Toward the expansion of the genetic alphabet, an unnatural base pair between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) functions as a third base pair in replication and transcription, and provides a useful tool for the site-specific, enzymatic incorporation of functional components into nucleic acids. We have synthesized several modified-Pa substrates, such as alkylamino-, biotin-, TAMRA-, FAM-, and digoxigenin-linked PaTPs, and examined their transcription by T7 RNA polymerase using Ds-containing DNA templates with various sequences. The Pa substrates modified with relatively small functional groups, such as alkylamino and biotin, were efficiently incorporated into RNA transcripts at the internal positions, except for those less than 10 bases from the 3′-terminus. We found that the efficient incorporation into a position close to the 3′-terminus of a transcript depended on the natural base contexts neighboring the unnatural base, and that pyrimidine-Ds-pyrimidine sequences in templates were generally favorable, relative to purine-Ds-purine sequences. The unnatural base pair transcription system provides a method for the site-specific functionalization of large RNA molecules.

  5. YY1 binding association with sex-biased transcription revealed through X-linked transcript levels and allelic binding analyses.

    Science.gov (United States)

    Chen, Chih-Yu; Shi, Wenqiang; Balaton, Bradley P; Matthews, Allison M; Li, Yifeng; Arenillas, David J; Mathelier, Anthony; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R R; Brown, Carolyn J; Wasserman, Wyeth W

    2016-11-18

    Sex differences in susceptibility and progression have been reported in numerous diseases. Female cells have two copies of the X chromosome with X-chromosome inactivation imparting mono-allelic gene silencing for dosage compensation. However, a subset of genes, named escapees, escape silencing and are transcribed bi-allelically resulting in sexual dimorphism. Here we conducted in silico analyses of the sexes using human datasets to gain perspectives into such regulation. We identified transcription start sites of escapees (escTSSs) based on higher transcription levels in female cells using FANTOM5 CAGE data. Significant over-representations of YY1 transcription factor binding motif and ChIP-seq peaks around escTSSs highlighted its positive association with escapees. Furthermore, YY1 occupancy is significantly biased towards the inactive X (Xi) at long non-coding RNA loci that are frequent contacts of Xi-specific superloops. Our study suggests a role for YY1 in transcriptional activity on Xi in general through sequence-specific binding, and its involvement at superloop anchors.

  6. Differentiation among isolates of prunus necrotic ringspot virus by transcript conformation polymorphism.

    Science.gov (United States)

    Rosner, A; Maslenin, L; Spiegel, S

    1998-09-01

    A method based on differences in electrophoretic mobility of RNA transcripts made from polymerase chain reaction (PCR) products was used for differentiation among virus isolates. A T7 RNA polymerase promoter was attached to amplified prunus necrotic ringspot virus (PNRSV) sequences by PCR. The PCR products then served as a template for transcription. Single-stranded transcripts originating from different PNRSV isolates varied in electrophoretic mobility in polyacrylamide gels, presumably because of transcript conformation polymorphism (TCP). This procedure was applied for the differentiation of PNRSV isolates.

  7. Transcriptional and post-transcriptional regulation of pst2 operon expression in Vibrio cholerae O1.

    Science.gov (United States)

    da C Leite, Daniel M; Barbosa, Livia C; Mantuano, Nathalia; Goulart, Carolina L; Veríssimo da Costa, Giovani C; Bisch, Paulo M; von Krüger, Wanda M A

    2017-07-01

    One of the most abundant proteins in V. cholerae O1 cells grown under inorganic phosphate (Pi) limitation is PstS, the periplasmic Pi-binding component of the high-affinity Pi transport system Pst2 (PstSCAB), encoded in the pst2 operon (pstS-pstC2-pstA2-pstB2). Besides its role in Pi uptake, Pst2 has also been associated with V. cholerae virulence. However, the mechanisms regulating pst2 expression and the non-stoichiometric production of the Pst2 components under Pi-limitation are unknown. A computational-experimental approach was used to elucidate the regulatory mechanisms behind pst2 expression in V. cholerae O1. Bioinformatics analysis of the pst2 operon nucleotide sequence revealed start codons for the pstS and pstC genes distinct from those originally annotated, a regulatory region upstream of pstS containing potential PhoB-binding sites and a pstS-pstC intergenic region longer than predicted. Analysis of the nucleotide sequence between pstS and pstC revealed inverted repeats able to form stem-loop structures followed by a potential RNase E cleavage site. Another putative RNase E recognition site was identified within the pstA-pstB intergenic sequence. In silico predictions of pst2 operon expression regulation were subsequently tested using cells grown under Pi limitation by promoter-lacZ fusion, gel electrophoresis mobility shift assay and quantitative RT-PCR. The experimental and in silico results matched very well and led us to propose a pst2 promoter sequence upstream of the pstS gene distinct from the one previously annotated. Furthermore, V. cholerae O1 pst2 operon transcription is PhoB-dependent and generates a polycistronic mRNA molecule that is rapidly processed into minor transcripts of distinct stabilities. The most stable was the pstS-encoding mRNA, which correlates with the higher levels of PstS relative to other Pst2 components in Pi-starved cells. The relatively higher stability of pstS and pstB transcripts seems to rely on the secondary structures at their 3' untranslated regions

  8. Transcription of repetitive DNA in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K; Chaudhuri, R K

    1975-01-01

    Repeated DNA sequences of Neurospora crassa were isolated and characterized. Approximately 10 to 12 percent of N. crassa DNA sequences were repeated, of which 7.3 percent were found to be transcribed in mid-log phase of mycelial growth as measured by DNA:RNA hybridization. It is suggested that part of the repetitive DNA transcripts in N. crassa were of mitochondrial and part of nuclear origin. Most of the nuclear repeated DNAs, however, code for rRNA and tRNA in N. crassa.

  9. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  10. Polyphonic Piano Transcription with a Note-Based Music Language Model

    Directory of Open Access Journals (Sweden)

    Qi Wang

    2018-03-01

    This paper proposes a note-based music language model (MLM) for improving note-level polyphonic piano transcription. The MLM is based on the recurrent structure, which could model the temporal correlations between notes in music sequences. To combine the outputs of the note-based MLM and acoustic model directly, an integrated architecture is adopted in this paper. We also propose an inference algorithm, in which the note-based MLM is used to predict notes at the blank onsets in the thresholding transcription results. The experimental results show that the proposed inference algorithm improves the performance of note-level transcription. We also observe that the combination of the restricted Boltzmann machine (RBM) and recurrent structure outperforms a single recurrent neural network (RNN) or long short-term memory network (LSTM) in modeling the high-dimensional note sequences. Among all the MLMs, LSTM-RBM helps the system yield the best results on all evaluation metrics regardless of the performance of acoustic models.

  11. Identification of Cis-Acting Promoter Elements in Cold- and Dehydration-Induced Transcriptional Pathways in Arabidopsis, Rice, and Soybean

    Science.gov (United States)

    Maruyama, Kyonoshin; Todaka, Daisuke; Mizoi, Junya; Yoshida, Takuya; Kidokoro, Satoshi; Matsukura, Satoko; Takasaki, Hironori; Sakurai, Tetsuya; Yamamoto, Yoshiharu Y.; Yoshiwara, Kyouko; Kojima, Mikiko; Sakakibara, Hitoshi; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko

    2012-01-01

    The genomes of three plants, Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and soybean (Glycine max), have been sequenced, and their many genes and promoters have been predicted. In Arabidopsis, cis-acting promoter elements involved in cold- and dehydration-responsive gene expression have been extensively analysed; however, the characteristics of such cis-acting promoter sequences in cold- and dehydration-inducible genes of rice and soybean remain to be clarified. In this study, we performed microarray analyses using the three species, and compared characteristics of identified cold- and dehydration-inducible genes. Transcription profiles of the cold- and dehydration-responsive genes were similar among these three species, showing representative upregulated (dehydrin/LEA) and downregulated (photosynthesis-related) genes. All (4^6 = 4096) hexamer sequences in the promoters of the three species were investigated, revealing the frequency of conserved sequences in cold- and dehydration-inducible promoters. A core sequence of the abscisic acid-responsive element (ABRE) was the most conserved in dehydration-inducible promoters of all three species, suggesting that transcriptional regulation for dehydration-inducible genes is similar among these three species, with the ABRE-dependent transcriptional pathway. In contrast, for cold-inducible promoters, the conserved hexamer sequences were diversified among these three species, suggesting the existence of diverse transcriptional regulatory pathways for cold-inducible genes among the species. PMID:22184637
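
    The hexamer survey described in this abstract can be illustrated with a minimal Python sketch. It enumerates all 4^6 = 4096 hexamers and counts overlapping occurrences in a set of promoter sequences. The function names and the toy promoters are hypothetical and are not taken from the study, and ACGTGT is used only as an example of an ACGT-containing hexamer; the authors' actual statistics are not reproduced here.

```python
from collections import Counter
from itertools import product


def all_hexamers():
    """Return every DNA hexamer: 4 bases at 6 positions gives 4**6 = 4096."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_counts(promoters):
    """Count overlapping hexamer occurrences across promoter sequences."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        for i in range(len(seq) - 5):
            kmer = seq[i:i + 6]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[kmer] += 1
    return counts


# Toy promoters (hypothetical sequences, not data from the study).
toy_promoters = ["TTACGTGTCGG", "GGACGTGTAAA", "CCCCACGTGTC"]
counts = hexamer_counts(toy_promoters)
print(len(all_hexamers()), counts["ACGTGT"])
```

    In a real analysis the counts for stress-inducible promoters would be compared against a background promoter set before calling any hexamer enriched; that comparison is omitted from this sketch.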

  12. Evolution of transcriptional enhancers and animal diversity

    Science.gov (United States)

    Rubinstein, Marcelo; de Souza, Flávio S. J.

    2013-01-01

    Deciphering the genetic bases that drive animal diversity is one of the major challenges of modern biology. Although four decades ago it was proposed that animal evolution was mainly driven by changes in cis-regulatory DNA elements controlling gene expression rather than in protein-coding sequences, only now are powerful bioinformatics and experimental approaches available to accelerate studies into how the evolution of transcriptional enhancers contributes to novel forms and functions. In the introduction to this Theme Issue, we start by defining the general properties of transcriptional enhancers, such as modularity and the coexistence of tight sequence conservation with transcription factor-binding site shuffling as different mechanisms that maintain the enhancer grammar over evolutionary time. We discuss past and current methods used to identify cell-type-specific enhancers and provide examples of how enhancers originate de novo, change and are lost in particular lineages. We then focus in the central part of this Theme Issue on analysing examples of how the molecular evolution of enhancers may change form and function. Throughout this introduction, we present the main findings of the articles, reviews and perspectives contributed to this Theme Issue that together illustrate some of the great advances and current frontiers in the field. PMID:24218630

  13. TAF(II)250: a transcription toolbox.

    Science.gov (United States)

    Wassarman, D A; Sauer, F

    2001-08-01

    Activation of RNA-polymerase-II-dependent transcription involves conversion of signals provided by gene-specific activator proteins into the synthesis of messenger RNA. This conversion requires dynamic structural changes in chromatin and assembly of general transcription factors (GTFs) and RNA polymerase II at core promoter sequence elements surrounding the transcription start site of genes. One hallmark of transcriptional activation is the interaction of DNA-bound activators with coactivators such as the TATA-box binding protein (TBP)-associated factors (TAF(II)s) within the GTF TFIID. TAF(II)250 possesses a variety of activities that are likely to contribute to the initial steps of RNA polymerase II transcription. TAF(II)250 is a scaffold for assembly of other TAF(II)s and TBP into TFIID, TAF(II)250 binds activators to recruit TFIID to particular promoters, TAF(II)250 regulates binding of TBP to DNA, TAF(II)250 binds core promoter initiator elements, TAF(II)250 binds acetylated lysine residues in core histones, and TAF(II)250 possesses protein kinase, ubiquitin-activating/conjugating and acetylase activities that modify histones and GTFs. We speculate that these activities achieve two goals--(1) they aid in positioning and stabilizing TFIID at particular promoters, and (2) they alter chromatin structure at the promoter to allow assembly of GTFs--and we propose a model for how TAF(II)250 converts activation signals into active transcription.

  14. The transcriptional regulatory network mediated by banana (Musa acuminata) dehydration-responsive element binding (MaDREB) transcription factors in fruit ripening.

    Science.gov (United States)

    Kuang, Jian-Fei; Chen, Jian-Ye; Liu, Xun-Cheng; Han, Yan-Chao; Xiao, Yun-Yi; Shan, Wei; Tang, Yang; Wu, Ke-Qiang; He, Jun-Xian; Lu, Wang-Jin

    2017-04-01

    Fruit ripening is a complex, genetically programmed process involving the action of critical transcription factors (TFs). Despite the established significance of dehydration-responsive element binding (DREB) TFs in plant abiotic stress responses, the involvement of DREBs in fruit ripening is yet to be determined. Here, we identified four genes encoding ripening-regulated DREB TFs in banana (Musa acuminata), MaDREB1, MaDREB2, MaDREB3, and MaDREB4, and demonstrated that they play regulatory roles in fruit ripening. We showed that MaDREB1-MaDREB4 are nucleus-localized, induced by ethylene and encompass transcriptional activation activities. We performed a genome-wide chromatin immunoprecipitation and high-throughput sequencing (ChIP-Seq) experiment for MaDREB2 and identified 697 genomic regions as potential targets of MaDREB2. MaDREB2 binds to hundreds of loci with diverse functions and its binding sites are distributed in the promoter regions proximal to the transcriptional start site (TSS). Most of the MaDREB2-binding targets contain the conserved (A/G)CC(G/C)AC motif and MaDREB2 appears to directly regulate the expression of a number of genes involved in fruit ripening. In combination with transcriptome profiling (RNA sequencing) data, our results indicate that MaDREB2 may serve as both transcriptional activator and repressor during banana fruit ripening. In conclusion, our study suggests a hierarchical regulatory model of fruit ripening in banana and that the MaDREB TFs may act as transcriptional regulators in the regulatory network. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
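
    As an illustration of how a degenerate motif such as (A/G)CC(G/C)AC can be located in a sequence, the following Python sketch scans both strands with a regular expression. The helper names and the toy sequence are assumptions made for this example; this is not the pipeline used in the study, which identified binding regions by ChIP-Seq.

```python
import re

# (A/G)CC(G/C)AC written as a character-class regex; the lookahead
# allows overlapping matches to be reported.
MOTIF = re.compile(r"(?=([AG]CC[GC]AC))")
MOTIF_LEN = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq):
    """Return sorted (start, strand, matched_site) tuples in forward coordinates."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in MOTIF.finditer(seq)]
    n = len(seq)
    for m in MOTIF.finditer(revcomp(seq)):
        # Map the reverse-strand start back to the leftmost forward position.
        hits.append((n - m.start() - MOTIF_LEN, "-", m.group(1)))
    return sorted(hits)


# Toy sequence (hypothetical), with one site on each strand.
toy = "TTGCCGACAAGTCGGTTT"
print(find_motif(toy))
```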

  15. Forced selection of a human immunodeficiency virus type 1 variant that uses a non-self tRNA primer for reverse transcription: Involvement of viral RNA sequences and the reverse transcriptase enzyme

    NARCIS (Netherlands)

    Abbink, Truus E. M.; Beerens, Nancy; Berkhout, Ben

    2004-01-01

    Human immunodeficiency virus type 1 uses the tRNA(3)(Lys) molecule as a selective primer for reverse transcription. This primer specificity is imposed by sequence complementarity between the tRNA primer and two motifs in the viral RNA genome: the primer-binding site (PBS) and the primer activation

  16. Targeted genome regulation via synthetic programmable transcriptional regulators

    KAUST Repository

    Piatek, Agnieszka Anna; Mahfouz, Magdy M.

    2016-01-01

    genes in linear and interacting pathways in a native context. Modular DNA-binding domains from zinc fingers (ZFs) and transcriptional activator-like proteins (TALE) are amenable to bioengineering to bind DNA target sequences of interest. As a result, ZF

  17. A deeper look into transcription regulatory code by preferred pair distance templates for transcription factor binding sites

    KAUST Repository

    Kulakovskiy, Ivan V.

    2011-08-18

    Motivation: Modern experimental methods provide substantial information on protein-DNA recognition. Studying arrangements of transcription factor binding sites (TFBSs) of interacting transcription factors (TFs) advances understanding of the transcription regulatory code. Results: We constructed binding motifs for TFs forming a complex with HIF-1α at the erythropoietin 3'-enhancer. Corresponding TFBSs were predicted in the segments around transcription start sites (TSSs) of all human genes. Using the genome-wide set of regulatory regions, we observed several strongly preferred distances between hypoxia-responsive element (HRE) and binding sites of a particular cofactor protein. The set of preferred distances was termed a preferred pair distance template (PPDT). The PPDT depended strongly on the TF and on the orientation of its binding sites relative to HRE. PPDT evaluated from the genome-wide set of regulatory sequences was used to detect significant PPDT-consistent binding site pairs in regulatory regions of hypoxia-responsive genes. We believe PPDT can help to reveal the layout of eukaryotic regulatory segments. © The Author 2011. Published by Oxford University Press. All rights reserved.
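
    The idea of a distance template can be sketched in Python as a histogram of signed distances between HRE sites and cofactor sites, from which frequently observed distances are kept. This is only a rough illustration: the function names, the toy positions and the simple count threshold are assumptions, and the threshold stands in for whatever statistical criterion the authors actually applied.

```python
from collections import Counter


def pair_distance_histogram(regions, max_dist=50):
    """Histogram of signed distances (cofactor minus HRE) within each region.

    regions: list of (hre_positions, cofactor_positions), one pair per segment.
    """
    hist = Counter()
    for hre_sites, cofactor_sites in regions:
        for h in hre_sites:
            for c in cofactor_sites:
                d = c - h
                if d != 0 and abs(d) <= max_dist:
                    hist[d] += 1
    return hist


def preferred_distances(hist, min_count):
    """Distances seen at least min_count times (a stand-in for a real test)."""
    return sorted(d for d, n in hist.items() if n >= min_count)


# Toy site positions (hypothetical, not data from the study).
toy_regions = [([100], [112, 140]), ([200], [212]), ([50], [62, 30])]
hist = pair_distance_histogram(toy_regions)
template = preferred_distances(hist, min_count=2)
print(template)
```

    Keeping the sign of the distance preserves which side of the HRE the cofactor site lies on, which matters because the abstract reports that the template depends on orientation.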

  18. Genomic dissection of conserved transcriptional regulation in intestinal epithelial cells.

    Directory of Open Access Journals (Sweden)

    Colin R Lickwar

    2017-08-01

    The intestinal epithelium serves critical physiologic functions that are shared among all vertebrates. However, it is unknown how the transcriptional regulatory mechanisms underlying these functions have changed over the course of vertebrate evolution. We generated genome-wide mRNA and accessible chromatin data from adult intestinal epithelial cells (IECs) in zebrafish, stickleback, mouse, and human species to determine if conserved IEC functions are achieved through common transcriptional regulation. We found evidence for substantial common regulation and conservation of gene expression regionally along the length of the intestine from fish to mammals and identified a core set of genes comprising a vertebrate IEC signature. We also identified transcriptional start sites and other putative regulatory regions that are differentially accessible in IECs in all 4 species. Although these sites rarely showed sequence conservation from fish to mammals, surprisingly, they drove highly conserved IEC expression in a zebrafish reporter assay. Common putative transcription factor binding sites (TFBS) found at these sites in multiple species indicate that sequence conservation alone is insufficient to identify much of the functionally conserved IEC regulatory information. Among the rare, highly sequence-conserved, IEC-specific regulatory regions, we discovered an ancient enhancer upstream from her6/HES1 that is active in a distinct population of Notch-positive cells in the intestinal epithelium. Together, these results show how combining accessible chromatin and mRNA datasets with TFBS prediction and in vivo reporter assays can reveal tissue-specific regulatory information conserved across 420 million years of vertebrate evolution. We define an IEC transcriptional regulatory network that is shared between fish and mammals and establish an experimental platform for studying how evolutionarily distilled regulatory information commonly controls IEC development

  19. Transcription and replication result in distinct epigenetic marks following repression of early gene expression

    OpenAIRE

    Kallestad, Les; Woods, Emily; Christensen, Kendra; Gefroh, Amanda; Balakrishnan, Lata; Milavetz, Barry

    2013-01-01

    Simian Virus 40 (SV40) early transcription is repressed when the product of early transcription, T-antigen, binds to its cognate regulatory sequence, Site I, in the promoter of the SV40 minichromosome. Because SV40 minichromosomes undergo replication and transcription, repression could potentially occur during active transcription or during DNA replication. Since repression is frequently epigenetically marked by the introduction of specific forms of methylated histone H3, we characterized th...

  20. Low nucleosome occupancy is encoded around functional human transcription factor binding sites

    Directory of Open Access Journals (Sweden)

    Daenen Floris

    2008-07-01

    Background: Transcriptional regulation of genes in eukaryotes is achieved by the interactions of multiple transcription factors with arrays of transcription factor binding sites (TFBSs) on DNA and with each other. Identification of these TFBSs is an essential step in our understanding of gene regulatory networks, but computational prediction of TFBSs with either consensus or commonly used stochastic models such as Position-Specific Scoring Matrices (PSSMs) results in an unacceptably high number of hits consisting of a few true functional binding sites and numerous false non-functional binding sites. This is due to the inability of the models to incorporate higher order properties of sequences including sequences surrounding TFBSs and influencing the positioning of nucleosomes and/or the interactions that might occur between transcription factors. Results: Significant improvement can be expected through the development of a new framework for the modeling and prediction of TFBSs that considers explicitly these higher order sequence properties. It would be particularly interesting to include in the new modeling framework the information present in the nucleosome positioning sequences (NPSs) surrounding TFBSs, as it can be hypothesized that genomes use this information to encode the formation of stable nucleosomes over non-functional sites, while functional sites have a more open chromatin configuration. In this report we evaluate the usefulness of the latter feature by comparing the nucleosome occupancy probabilities around experimentally verified human TFBSs with the nucleosome occupancy probabilities around false positive TFBSs and in random sequences. Conclusion: We present evidence that nucleosome occupancy is remarkably lower around true functional human TFBSs as compared to non-functional human TFBSs, which supports the use of this feature to improve current TFBS prediction approaches in higher eukaryotes.
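
    For readers unfamiliar with PSSM scanning, the kind of sequence-only scoring discussed in this abstract can be sketched in Python as follows. The count matrix, pseudocount, uniform background and function names are all hypothetical choices made for illustration; they are not taken from the study or from any motif database, and the sketch assumes the input contains only A, C, G and T.

```python
import math

# Hypothetical position frequency matrix for a 4-bp site (made-up counts).
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 8, 0, 1],
    "G": [0, 0, 8, 1],
    "T": [0, 0, 0, 5],
}


def build_pssm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    pssm = []
    for i in range(width):
        total = sum(pfm[b][i] + pseudocount for b in "ACGT")
        pssm.append({
            b: math.log2((pfm[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pssm


def scan(seq, pssm):
    """Score every window of seq; returns a list of (start, score) pairs."""
    seq = seq.upper()
    w = len(pssm)
    return [
        (i, sum(pssm[j][seq[i + j]] for j in range(w)))
        for i in range(len(seq) - w + 1)
    ]


pssm = build_pssm(PFM)
scores = scan("TTACGTTT", pssm)
best_start, best_score = max(scores, key=lambda hit: hit[1])
print(best_start, round(best_score, 3))
```

    Because each window is scored independently of its surroundings, any sequence that happens to resemble the matrix scores well, which is the source of the false positives the abstract describes.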

  1. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    High-throughput sequencing has the potential to answer many of the big questions in biology and medicine. It can be used to determine the ancestry of species, to chart complex ecosystems and to understand and diagnose disease. However, going from raw sequencing data to biological or medical insig....... By estimating the genotypes on a set of candidate variants obtained from both a standard mapping-based approach as well as de novo assemblies, we are able to find considerably more structural variation than previous studies...... for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants....... The method queries the reads using a graph representation of the variants and hereby mitigates the reference-bias that characterise standard genotyping methods. In the last chapter, we apply this method to call the genotypes of 50 deeply sequencing parent-offspring trios from the GenomeDenmark project...

  2. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu; Ma, Jianzhu; Wang, Sheng

    2017-01-01

    Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the read alignments. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging because the signal related to boundaries is noisy and weak.

  3. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu

    2017-04-20

    Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the read alignments. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging because the signal related to boundaries is noisy and weak.

  4. Mammalian transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes and are predicted to act as transcriptional activator hubs.

    Science.gov (United States)

    Joshi, Anagha

    2014-12-30

    Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in the same species remains to be explored and can bring new biological insights. We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type into three groups according to binding occupancy: singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid-occupancy irrespective of cell type. We then characterized hotspots for diverse structural and functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes, suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.

  5. WRKY transcription factor superfamily: Structure, origin and functions

    African Journals Online (AJOL)

    terminal ends contain the WRKYGQR amino acid sequence and a zinc-finger motif. WRKY transcription factors can regulate the expression of target genes that contain the W-box elements (C/T)TGAC(C/T) in the promoter regions by specifically ...
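
    The W-box pattern (C/T)TGAC(C/T) mentioned above maps directly onto a regular expression. The sketch below is a minimal illustration of searching a promoter string for that pattern; the function name and the example sequence are hypothetical, and only the given strand is searched (a real scan would normally also check the reverse complement).

```python
import re

# (C/T)TGAC(C/T) as a regular expression: [CT] matches either C or T.
# The lookahead lets overlapping sites (e.g. TTGACTTGACC) both be reported.
W_BOX = re.compile(r"(?=([CT]TGAC[CT]))")


def find_w_boxes(promoter):
    """Return (start, site) for each W-box on the given strand, 0-based."""
    return [(m.start(), m.group(1)) for m in W_BOX.finditer(promoter.upper())]


# Hypothetical promoter fragment, invented for illustration only.
sites = find_w_boxes("aaTTGACCggCTGACTaa")
```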

  6. Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria.

    Directory of Open Access Journals (Sweden)

    Oscar Harari

    2010-07-01

    Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. The detection of the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg(2+) homeostasis in several bacterial species. PhoP is particularly interesting; it is highly conserved in different gamma/enterobacteria, regulating not only ancestral genes but also governing the expression of dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (i.e., termed submotifs) using a machine learning method inspired by the "Divide & Conquer" strategy. By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as the regulatory protein that recognizes them, suggests that the major cause of divergence between related species is not due to the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target

  7. Insulin increases transcription of rat gene 33 through cis-acting elements in 5′-flanking DNA

    Energy Technology Data Exchange (ETDEWEB)

    Cadilla, C.; Isham, K.R.; Lee, K.L.; Ch'ang, L.Y.; Kenney, F.T. (Oak Ridge National Lab., TN (United States)); Johnson, A.C. (National Cancer Institute, Bethesda, MD (United States). Lab. of Molecular Biology)

    1992-01-01

    Gene 33 is a multihormonally-regulated rat gene whose transcription is rapidly and markedly enhanced by insulin in liver and cultured hepatoma cells. To examine the mechanism by which insulin regulates transcription, the authors have constructed chimeric plasmids in which expression of the bacterial cat gene, encoding chloramphenicol acetyltransferase (CAT), is governed by gene 33 promoter elements and contiguous sequence in DNA flanking the transcription start point (tsp). When transfected into H4IIE hepatoma cells, these constructs gave rise to stably transformed cell lines producing the bacterial CAT enzyme. This expression was increased by insulin treatment in a fashion resembling the effect of this hormone on transcription of the native gene. In vitro transcription assays in nuclear extracts also revealed increased transcription of the chimeric plasmids when the extracts were prepared from insulin-treated rat hepatoma cells. The results demonstrate that induction by insulin is mediated by cis-acting nucleotide sequences located between bp −480 and +27 relative to the tsp.

  8. Novel splice mutation in microthalmia-associated transcription factor in Waardenburg Syndrome.

    Science.gov (United States)

    Brenner, Laura; Burke, Kelly; Leduc, Charles A; Guha, Saurav; Guo, Jiancheng; Chung, Wendy K

    2011-01-01

    Waardenburg Syndrome (WS) is a syndromic form of hearing loss associated with mutations in six different genes. We identified a large family with WS that had previously undergone clinical testing, with no reported pathogenic mutation. Using linkage analysis, a region on 3p14.1 with an LOD score of 6.6 was identified. Microthalmia-Associated Transcription Factor, a gene known to cause WS, is located within this region of linkage. Sequencing of Microthalmia-Associated Transcription Factor demonstrated a c.1212 G>A synonymous variant that segregated with the WS in the family and was predicted to cause a novel splicing site that was confirmed with expression analysis of the mRNA. This case illustrates the need to computationally analyze novel synonymous sequence variants for possible effects on splicing to maximize the clinical sensitivity of sequence-based genetic testing.

  9. Mechanisms of transcriptional repression by histone lysine methylation

    DEFF Research Database (Denmark)

    Hublitz, Philip; Albert, Mareike; Peters, Antoine H F M

    2009-01-01

    . In this report, we review the recent literature to deduce mechanisms underlying Polycomb and H3K9 methylation mediated repression, and describe the functional interplay with activating H3K4 methylation. We summarize recent data that indicate a close relationship between GC density of promoter sequences......, transcription factor binding and the antagonizing activities of distinct epigenetic regulators such as histone methyltransferases (HMTs) and histone demethylases (HDMs). Subsequently, we compare chromatin signatures associated with different types of transcriptional outcomes from stable repression to highly...

  10. The evolution of WRKY transcription factors.

    Science.gov (United States)

    Rinerson, Charles I; Rabara, Roel C; Tripathi, Prateek; Shen, Qingxi J; Rushton, Paul J

    2015-02-27

    The availability of increasing numbers of sequenced genomes has necessitated a re-evaluation of the evolution of the WRKY transcription factor family. Modern day plants descended from a charophyte green alga that colonized the land between 430 and 470 million years ago. The first charophyte genome sequence from Klebsormidium flaccidum filled a gap in the available genome sequences in the plant kingdom between unicellular green algae that typically have 1-3 WRKY genes and mosses that contain 30-40. WRKY genes have been previously found in non-plant species but their occurrence has been difficult to explain. Only two WRKY genes are present in the Klebsormidium flaccidum genome and the presence of a Group IIb gene was unexpected because it had previously been thought that Group IIb WRKY genes first appeared in mosses. We found WRKY transcription factor genes outside of the plant lineage in some diplomonads, social amoebae, fungi incertae sedis, and amoebozoa. This patchy distribution suggests that lateral gene transfer is responsible. These lateral gene transfer events appear to pre-date the formation of the WRKY groups in flowering plants. Flowering plants contain proteins with domains typical for both resistance (R) proteins and WRKY transcription factors. R protein-WRKY genes have evolved numerous times in flowering plants, each type being restricted to specific flowering plant lineages. These chimeric proteins contain not only novel combinations of protein domains but also novel combinations and numbers of WRKY domains. Once formed, R protein WRKY genes may combine different components of signalling pathways that may either create new diversity in signalling or accelerate signalling by short circuiting signalling pathways. We propose that the evolution of WRKY transcription factors includes early lateral gene transfers to non-plant organisms and the occurrence of algal WRKY genes that have no counterparts in flowering plants. 
    We propose two alternative hypotheses

  11. Molecular cloning of transcripts induced by UV-radiation in rodent cells

    International Nuclear Information System (INIS)

    Fornace, A.J. Jr.; Mitchell, J.B.

    1987-01-01

    Several inducible DNA repair genes have been well characterized in bacteria. In eukaryotes including mammalian cells, there is increasing evidence that similar events may occur. Recently, the authors have shown that hybridization subtraction can be used to enrich for sequences induced only several fold by a particular cell treatment such as heat shock. Chinese hamster V79 cells were UV-irradiated with 17 J m⁻² and cDNA was synthesized from the polyadenylated (poly A) RNA. This "UV" cDNA was hybridized with a 3-fold excess of poly A RNA from unirradiated cells and the nonhybridizing cDNA was isolated. With this approach, UV-induced sequences were enriched over 20-fold. This enriched cDNA was cloned into a high copy number plasmid and a cDNA library was constructed. By RNA dot blot and northern analysis, 42 clones from this library were found to represent transcripts induced 3- to 25-fold by UV. The most common isolates were found to be metallothionein transcripts by DNA sequencing. The metallothionein transcripts were found to be induced 10- to 25-fold by UV with maximum induction at 4-8 h after 10 J m⁻². A similar approach was also used with a Chinese hamster ovary line which does not express metallothionein, and multiple clones were isolated which represented transcripts induced 3- to 15-fold by UV. Except for the metallothionein clones, the other Chinese hamster cDNA clones have not been identified, but it is probable that the protein products of at least some of these transcripts play a role in the cellular response to UV damage.

  12. Simplified Method for Predicting a Functional Class of Proteins in Transcription Factor Complexes

    KAUST Repository

    Piatek, Marek J.

    2013-07-12

    Background: Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key such regulatory proteins are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important since they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available, since not all proteins that act as TFs or TcoFs are yet annotated as such, due to generally partial functional annotation of proteins. In this study we have developed a method to predict, using only sequence composition of the interacting proteins, the functional class of human TF binding partners to be (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows for complementing the annotation of the currently known pool of nuclear proteins. Since only the knowledge of protein sequences is required in addition to protein interaction, the method should be easily applicable to many species. Results: Based on experimentally validated interactions between human TFs with different TFs, TcoFs and other nuclear proteins, our two classification systems (implemented as a web-based application) achieve high accuracies in distinguishing TFs and TcoFs from other nuclear proteins, and TFs from TcoFs respectively. Conclusion: As demonstrated, given the fact that two proteins are capable of forming direct physical interactions and using only information about their sequence composition, we have developed a completely new method for predicting a functional class of TF interacting protein partners

  13. DNA residence time is a regulatory factor of transcription repression

    Science.gov (United States)

    Clauß, Karen; Popp, Achim P.; Schulze, Lena; Hettich, Johannes; Reisser, Matthias; Escoter Torres, Laura; Uhlenhaut, N. Henriette

    2017-01-01

    Transcription comprises a highly regulated sequence of intrinsically stochastic processes, resulting in bursts of transcription intermitted by quiescence. In transcription activation or repression, a transcription factor binds dynamically to DNA, with a residence time unique to each factor. Whether the DNA residence time is important in the transcription process is unclear. Here, we designed a series of transcription repressors differing in their DNA residence time by utilizing the modular DNA binding domain of transcription activator-like effectors (TALEs) and varying the number of nucleotide-recognizing repeat domains. We characterized the DNA residence times of our repressors in living cells using single molecule tracking. The residence times depended non-linearly on the number of repeat domains and differed by more than a factor of six. The factors provoked a residence time-dependent decrease in transcript level of the glucocorticoid receptor-activated gene SGK1. Down regulation of transcription was due to a lower burst frequency in the presence of long binding repressors and is in accordance with a model of competitive inhibition of endogenous activator binding. Our single molecule experiments reveal transcription factor DNA residence time as a regulatory factor controlling transcription repression and establish TALE-DNA binding domains as tools for the temporal dissection of transcription regulation. PMID:28977492

  14. Molecular cloning, transcriptional profiling, and subcellular localization of signal transducer and activator of transcription 2 (STAT2) ortholog from rock bream, Oplegnathus fasciatus.

    Science.gov (United States)

    Bathige, S D N K; Umasuthan, Navaneethaiyer; Priyathilaka, Thanthrige Thiunuwan; Thulasitha, William Shanthakumar; Jayasinghe, J D H E; Wan, Qiang; Nam, Bo-Hye; Lee, Jehee

    2017-08-30

    Signal transducer and activator of transcription 2 (STAT2) is a key element that transduces signals from the cell membrane to the nucleus via the type I interferon-signaling pathway. Although the structural and functional aspects of STAT proteins are well studied in mammals, information on teleostean STATs is very limited. In this study, a STAT paralog, which is highly homologous to the STAT2 members, was identified from a commercially important fish species called rock bream and designated as RbSTAT2. The RbSTAT2 gene was characterized at complementary DNA (cDNA) and genomic sequence levels, and was found to possess structural features common with its mammalian counterparts. The complete cDNA sequence was distributed into 24 exons in the genomic sequence. The promoter proximal region was analyzed and found to contain potential transcription factor binding sites to regulate the transcription of RbSTAT2. Phylogenetic studies and comparative genomic structure organization revealed the distinguishable evolution for fish and other vertebrate STAT2 orthologs. Transcriptional quantification was performed by SYBR Green quantitative real-time PCR (qPCR) and the ubiquitous expression of RbSTAT2 transcripts was observed in all tissues analyzed from healthy fish, with a remarkably high expression in blood cells. Significantly (Prock bream irido virus; RBIV), bacterial (Edwardsiella tarda and Streptococcus iniae), and immune stimulants (poly I:C and LPS). Antiviral potential was further confirmed by WST-1 assay, by measuring the viability of rock bream heart cells treated with RBIV. In addition, results of an in vitro challenge experiment signified the influence of rock bream interleukin-10 (RbIL-10) on transcription of RbSTAT2. Subcellular localization studies by transfection of pEGFP-N1/RbSTAT2 into rock bream heart cells revealed that the RbSTAT2 was usually located in the cytoplasm and translocated near to the nucleus upon poly I:C administration. Altogether, these

  15. Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

    Energy Technology Data Exchange (ETDEWEB)

    Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard; Nix, David; Pollard, Daniel A.; Iyer, Venky N.; Hechmer, Aaron; Simirenko, Lisa; Stapleton, Mark; Luengo Hendriks, Cris L.; Chu, Hou Cheng; Ogawa, Nobuo; Inwood, William; Sementchenko, Victor; Beaton, Amy; Weiszmann, Richard; Celniker, Susan E.; Knowles, David W.; Gingeras, Tom; Speed, Terence P.; Eisen, Michael B.; Biggin, Mark D.

    2008-01-10

    Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early

  16. Sequencing and transcriptional analysis of the Streptococcus thermophilus histamine biosynthesis gene cluster: factors that affect differential hdcA expression

    DEFF Research Database (Denmark)

    Calles-Enríquez, Marina; Hjort, Benjamin Benn; Andersen, Pia Skov

    2010-01-01

    to produce histamine. The hdc clusters of S. thermophilus CHCC1524 and CHCC6483 were sequenced, and the factors that affect histamine biosynthesis and histidine-decarboxylating gene (hdcA) expression were studied. The hdc cluster began with the hdcA gene, was followed by a transporter (hdcP), and ended...... with the hdcB gene, which is of unknown function. The three genes were orientated in the same direction. The genetic organization of the hdc cluster showed a unique organization among the lactic acid bacterial group and resembled those of Staphylococcus and Clostridium species, thus indicating possible...... acquisition through a horizontal transfer mechanism. Transcriptional analysis of the hdc cluster revealed the existence of a polycistronic mRNA covering the three genes. The histidine-decarboxylating gene (hdcA) of S. thermophilus demonstrated maximum expression during the stationary growth phase, with high...

  17. Proteopedia: 3D Visualization and Annotation of Transcription Factor-DNA Readout Modes

    Science.gov (United States)

    Dantas Machado, Ana Carolina; Saleebyan, Skyler B.; Holmes, Bailey T.; Karelina, Maria; Tam, Julia; Kim, Sharon Y.; Kim, Keziah H.; Dror, Iris; Hodis, Eran; Martz, Eric; Compeau, Patricia A.; Rohs, Remo

    2012-01-01

    3D visualization assists in identifying diverse mechanisms of protein-DNA recognition that can be observed for transcription factors and other DNA binding proteins. We used Proteopedia to illustrate transcription factor-DNA readout modes with a focus on DNA shape, which can be a function of either nucleotide sequence (Hox proteins) or base pairing…

  18. In silico discovery of transcription regulatory elements in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Le Roch Karine G

    2008-02-01

    Background: With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of new anti-malarials. To date, relatively little is known regarding the specific mechanisms the parasite employs to regulate gene expression at the mRNA level, with studies of the P. falciparum genome sequence having revealed few cis-regulatory elements and associated transcription factors. Although it is possible the parasite may evoke mechanisms of transcriptional control drastically different from those used by other eukaryotic organisms, the extreme AT-rich nature of P. falciparum intergenic regions (~90% AT) presents significant challenges to in silico cis-regulatory element discovery. Results: We have developed an algorithm called Gene Enrichment Motif Searching (GEMS) that uses a hypergeometric-based scoring function and a position-weight matrix optimization routine to identify with high confidence regulatory elements in the nucleotide-biased and repeat sequence-rich P. falciparum genome. When applied to promoter regions of genes contained within 21 co-expression gene clusters generated from P. falciparum life cycle microarray data using the semi-supervised clustering algorithm Ontology-based Pattern Identification, GEMS identified 34 putative cis-regulatory elements associated with a variety of parasite processes including sexual development, cell invasion, antigenic variation and protein biosynthesis. Among these candidates were novel motifs, as well as many of the elements for which biological experimental evidence already exists in the Plasmodium literature. To provide evidence for the biological relevance of a cell invasion-related element predicted by GEMS, reporter gene and electrophoretic mobility shift assays
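
    The record above mentions a hypergeometric-based scoring function for motif enrichment in co-expression clusters. The abstract does not give the exact form used by GEMS, so the sketch below shows only the generic upper-tail hypergeometric probability commonly used for this kind of question: how likely it is to see at least the observed number of motif-containing promoters in a cluster by chance. The function name, parameter names and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, motif_total, cluster_size, motif_in_cluster):
    """Probability of at least motif_in_cluster motif-bearing genes in the cluster.

    population       -- number of genes considered overall
    motif_total      -- genes in the population whose promoter contains the motif
    cluster_size     -- genes in the co-expression cluster
    motif_in_cluster -- cluster genes whose promoter contains the motif
    """
    upper = min(motif_total, cluster_size)
    # Sum the hypergeometric probability mass from the observed count upward.
    tail = sum(
        comb(motif_total, k) * comb(population - motif_total, cluster_size - k)
        for k in range(motif_in_cluster, upper + 1)
    )
    return tail / comb(population, cluster_size)


# Toy numbers, invented for illustration only.
p_value = hypergeom_enrichment_p(
    population=10, motif_total=4, cluster_size=3, motif_in_cluster=3
)
```

    A small value suggests the motif is over-represented in the cluster relative to the whole gene set; in a genome as AT-rich as the one described, the choice of background population matters a great deal.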

  19. Specific transcripts are elevated in Saccharomyces cerevisiae in response to DNA damage

    International Nuclear Information System (INIS)

    McClanahan, T.; McEntee, K.

    1984-01-01

    Differential hybridization has been used to identify genes in Saccharomyces cerevisiae displaying increased transcript levels after treatment of cells with UV irradiation or with the mutagen/carcinogen 4-nitroquinoline-1-oxide (NQO). The authors describe the isolation and characterization of four DNA damage responsive genes obtained from screening ca. 9000 yeast genomic clones. Two of these clones, lambda 78A and pBR178C, contain repetitive elements in the yeast genome as shown by Southern hybridization analysis. Although the genomic hybridization pattern is distinct for each of these two clones, both of these sequences hybridize to large polyadenylated transcripts ca. 5 kilobases in length. Two other DNA damage responsive sequences, pBRA2 and pBR3016B, are single-copy genes and hybridize to 0.5- and 3.2-kilobase transcripts, respectively. Kinetic analysis of the 0.5-kilobase transcript homologous to pBRA2 indicates that the level of this RNA increases more than 15-fold within 20 min after exposure to 4-nitroquinoline-1-oxide. Moreover, the level of this transcript is significantly elevated in cells containing the rad52-1 mutation which are deficient in DNA strand break repair and gene conversion. These results provide some of the first evidence that DNA damage stimulates transcription of specific genes in eucaryotic cells

  20. Comparative analysis of function and interaction of transcription factors in nematodes: Extensive conservation of orthology coupled to rapid sequence evolution

    Directory of Open Access Journals (Sweden)

    Singh Rama S

    2008-08-01

    Background: Much of the morphological diversity in eukaryotes results from differential regulation of gene expression in which transcription factors (TFs) play a central role. The nematode Caenorhabditis elegans is an established model organism for the study of the roles of TFs in controlling the spatiotemporal pattern of gene expression. Using the fully sequenced genomes of three Caenorhabditid nematode species as well as genome information from additional more distantly related organisms (fruit fly, mouse, and human), we sought to identify orthologous TFs and characterized their patterns of evolution. Results: We identified 988 TF genes in C. elegans, and inferred corresponding sets in C. briggsae and C. remanei, containing 995 and 1093 TF genes, respectively. Analysis of the three gene sets revealed 652 3-way reciprocal 'best hit' orthologs (nematode TF set), approximately half of which are zinc finger (ZF-C2H2 and ZF-C4/NHR types) and HOX family members. Examination of the TF genes in C. elegans and C. briggsae identified the presence of significant tandem clustering on chromosome V, the majority of which belong to the ZF-C4/NHR family. We also found evidence for lineage-specific duplications and rapid evolution of many of the TF genes in the two species. A search of the TFs conserved among nematodes in Drosophila melanogaster, Mus musculus and Homo sapiens revealed 150 reciprocal orthologs, many of which are associated with important biological processes and human diseases. Finally, a comparison of the sequence, gene interactions and function indicates that nematode TFs conserved across phyla exhibit significantly more interactions and are enriched in genes with annotated mutant phenotypes compared to those that lack orthologs in other species. Conclusion: Our study represents the first comprehensive genome-wide analysis of TFs across three nematode species and other organisms. The findings indicate substantial conservation of transcription
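
    The "3-way reciprocal best hit" criterion in the record above can be sketched as follows. This is a generic illustration, not the authors' pipeline: it assumes the best hit of every gene in every other species has already been computed (for example from a similarity search) and stored in dictionaries. The species labels "A", "B", "C", the gene names and the function names are all hypothetical.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def three_way_orthologs(best):
    """Return triples that are reciprocal best hits in all three pairings.

    best[(x, y)] maps each gene of species x to its best hit in species y.
    """
    triples = set()
    for a, b in reciprocal_best_hits(best[("A", "B")], best[("B", "A")]):
        c = best[("A", "C")].get(a)
        if c is None:
            continue
        # Require reciprocity for A-C and B-C as well as A-B.
        if (best[("C", "A")].get(c) == a
                and best[("B", "C")].get(b) == c
                and best[("C", "B")].get(c) == b):
            triples.add((a, b, c))
    return triples


# Toy best-hit tables, invented for illustration only.
best = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c2"},
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c9"},  # b2 prefers c9, breaking the triple
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}
orthologs = three_way_orthologs(best)
```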

  1. Specific interactions between transcription factors and the promoter-regulatory region of the human cytomegalovirus major immediate-early gene

    International Nuclear Information System (INIS)

    Ghazal, P.; Lubon, H.; Hennighausen, L.

    1988-01-01

    Repeat sequence motifs as well as unique sequences between nucleotides -150 and -22 of the human cytomegalovirus immediate-early 1 gene interact in vitro with nuclear proteins. The authors show that a transcriptional element between nucleotides -91 and -65 stimulated promoter activity in vivo and in vitro by binding specific cellular transcription factors. Finally, a common sequence motif, (T)TGG/AC, present in 15 of the determined binding sites suggests a particular class of nuclear factors associated with the immediate-early 1 gene

  2. SEASTAR: systematic evaluation of alternative transcription start sites in RNA.

    Science.gov (United States)

    Qin, Zhiyi; Stoilov, Peter; Zhang, Xuegong; Xing, Yi

    2018-05-04

    Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.

  3. In silico and biological survey of transcription-associated proteins implicated in the transcriptional machinery during the erythrocytic development of Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Bischoff Emmanuel

    2010-01-01

    Full Text Available Abstract Background Malaria is the most important parasitic disease in the world with approximately two million people dying every year, mostly due to Plasmodium falciparum infection. During its complex life cycle in the Anopheles vector and human host, the parasite requires the coordinated and modulated expression of diverse sets of genes involved in epigenetic, transcriptional and post-transcriptional regulation. However, despite the availability of the complete sequence of the Plasmodium falciparum genome, we still know little about Plasmodium mechanisms of transcriptional gene regulation. This is due to the poor prediction of nuclear proteins, cognate DNA motifs and structures involved in transcription. Results A comprehensive directory of proteins reported to be potentially involved in Plasmodium transcriptional machinery was built from all in silico reports and databanks. The transcription-associated proteins were clustered in three main sets of factors: general transcription factors, chromatin-related proteins (structuring, remodelling and histone modifying enzymes), and specific transcription factors. Only a few of these factors have been molecularly analysed. Furthermore, from transcriptome and proteome data we modelled expression patterns of transcripts and corresponding proteins during the intra-erythrocytic cycle. Finally, an interactome of these proteins based either on in silico or on yeast two-hybrid experimental approaches is discussed. Conclusion This is the first attempt to build a comprehensive directory of potential transcription-associated proteins in Plasmodium. In addition, all complete transcriptome, proteome and interactome raw data were re-analysed, compared and discussed for a better comprehension of the complex biological processes of Plasmodium falciparum transcriptional regulation during the erythrocytic development.

  4. Archaeal RNA polymerase arrests transcription at DNA lesions.

    Science.gov (United States)

    Gehring, Alexandra M; Santangelo, Thomas J

    2017-01-01

    Transcription elongation is not uniform and transcription is often hindered by protein-bound factors or DNA lesions that limit translocation and impair catalysis. Despite the high degree of sequence and structural homology of the multi-subunit RNA polymerases (RNAP), substantial differences in response to DNA lesions have been reported. Archaea encode only a single RNAP with striking structural conservation with eukaryotic RNAP II (Pol II). Here, we demonstrate that the archaeal RNAP from Thermococcus kodakarensis is sensitive to a variety of DNA lesions that pause and arrest RNAP at or adjacent to the site of DNA damage. DNA damage only halts elongation when present in the template strand, and the damage often results in RNAP arresting such that the lesion would be encapsulated with the transcription elongation complex. The strand-specific halt to archaeal transcription elongation on modified templates is supportive of RNAP recognizing DNA damage and potentially initiating DNA repair through a process akin to the well-described transcription-coupled DNA repair (TCR) pathways in Bacteria and Eukarya.

  5. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    Science.gov (United States)

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity contributes substantially to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a major challenge because of the difficulties of short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated in the current reference genome. Furthermore, about 17% of transcripts are computationally predicted to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the rabbit genome.

  6. Uncovering layers of human RNA polymerase II transcription

    DEFF Research Database (Denmark)

    Jensen, Torben Heick

    In recent years DNA microarray and high-throughput sequencing technologies have challenged the “gene-centric” view that pre-mRNA is the only RNA species transcribed off protein-coding genes. Instead unorthodox transcription from within genic- and intergenic regions has been demonstrated to occur...

  7. SSH analysis of endosperm transcripts and characterization of heat stress regulated expressed sequence tags in bread wheat

    Directory of Open Access Journals (Sweden)

    Suneha Goswami

    2016-08-01

    Full Text Available Heat stress is one of the major problems in agriculturally important cereal crops, especially wheat. Here, we have constructed a subtracted cDNA library from the endosperm of HS-treated (42°C for 2 h) wheat cv. HD2985 by suppression subtractive hybridization (SSH). We identified ~550 recombinant clones ranging from 200 to 500 bp with an average size of 300 bp. Sanger sequencing was performed with 205 positive clones to generate the differentially expressed sequence tags (ESTs). Most of the ESTs were observed to be localized on the long arm of chromosome 2A and associated with heat stress tolerance and metabolic pathways. Identified ESTs were searched by BLAST against the Ensemble, TriFLD and TIGR databases, and the predicted CDS were translated and aligned with the protein sequences available in the pfam and InterProScan 5 databases to predict the differentially expressed proteins (DEPs). We observed eight different types of post-translational modifications (PTMs) in the DEPs corresponding to the cloned ESTs—147 sites with phosphorylation, 21 sites with sumoylation, 237 with palmitoylation, 96 sites with S-nitrosylation, 3066 calpain cleavage sites, and 103 tyrosine nitration sites, predicted to sense the heat stress and regulate the expression of stress genes. Twelve DEPs were observed to have transmembrane helices (TMH) in their structure, predicted to play the role of sensors of HS. Quantitative Real-Time PCR of randomly selected ESTs showed very high relative expression of HSP17 under HS; up-regulation was greater in wheat cv. HD2985 (thermotolerant) than in HD2329 (thermosusceptible) during grain-filling. The abundance of transcripts was further validated through northern blot analysis. The ESTs and their corresponding DEPs can be used as molecular markers for screening or targeted precision breeding programs. PTMs identified in the DEPs can be used to elucidate the thermotolerance mechanism of wheat – a novel step towards the development of

  8. Transcription factor binding sites prediction based on modified nucleosomes.

    Directory of Open Access Journals (Sweden)

    Mohammad Talebzadeh

    Full Text Available In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences at predicting actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, "modified nucleosomes neighboring" and "modified nucleosomes occupancy", to decrease FP in binding site discovery. Based on these features, we designed a logistic regression classifier which estimates the probability of a region being a TFBS. Our model learned each feature based on Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+ T cells. In this work, we investigated 21 histone modifications and found that only 8 out of 21 marks are strongly correlated with transcription factor binding regions. To show that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM, and created a new model to search for TFBSs on the genome. We tested the model using transcription factors MAZ, PU.1 and ELF1 and compared the results to those using only the PWM. The results show that our model can predict transcription factor binding regions more successfully. The relative simplicity of the model and capability of integrating other features make it a superior method
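
    The PWM scanning step that this abstract takes as its baseline can be illustrated with a minimal sketch. The count matrix, pseudocount, uniform background and threshold below are hypothetical toy values chosen for illustration; they are not taken from the study, and the nucleosome-based logistic regression features are not modelled here.

```python
import math

# Hypothetical toy count matrix for a 4-bp motif (one dict per position).
# These counts are illustrative only and do not come from the study.
COUNTS = [
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 7},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring at or above threshold.

    Assumes the sequence contains only the uppercase letters A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for start, window, score in scan("TTACGTAAACGA", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

    Lowering the threshold admits near-matches as well, which is the source of the false positives that the abstract's additional features are intended to filter out.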

  9. A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana

    DEFF Research Database (Denmark)

    Lindemose, Søren; Jensen, Michael Krogh; de Velde, Jan Van

    2014-01-01

    Target gene identification for transcription factors is a prerequisite for the systems wide understanding of organismal behaviour. NAM-ATAF1/2-CUC2 (NAC) transcription factors are amongst the largest transcription factor families in plants, yet limited data exist from unbiased approaches to resolve the DNA-binding preferences of individual members. Here, we present a TF-target gene identification workflow based on the integration of novel protein binding microarray data with gene expression and multi-species promoter sequence conservation to identify the DNA-binding specificities and the gene regulatory networks of 12 NAC transcription factors. Our data offer specific single-base resolution fingerprints for most TFs studied and indicate that NAC DNA-binding specificities might be predicted from their DNA-binding domain's sequence. The developed methodology, including the application...

  10. Interactions between the R2R3-MYB transcription factor, AtMYB61, and target DNA binding sites.

    Directory of Open Access Journals (Sweden)

    Michael B Prouse

    Full Text Available Despite the prominent roles played by R2R3-MYB transcription factors in the regulation of plant gene expression, little is known about the details of how these proteins interact with their DNA targets. For example, while Arabidopsis thaliana R2R3-MYB protein AtMYB61 is known to alter transcript abundance of a specific set of target genes, little is known about the specific DNA sequences to which AtMYB61 binds. To address this gap in knowledge, DNA sequences bound by AtMYB61 were identified using cyclic amplification and selection of targets (CASTing. The DNA targets identified using this approach corresponded to AC elements, sequences enriched in adenosine and cytosine nucleotides. The preferred target sequence that bound with the greatest affinity to AtMYB61 recombinant protein was ACCTAC, the AC-I element. Mutational analyses based on the AC-I element showed that ACC nucleotides in the AC-I element served as the core recognition motif, critical for AtMYB61 binding. Molecular modelling predicted interactions between AtMYB61 amino acid residues and corresponding nucleotides in the DNA targets. The affinity between AtMYB61 and specific target DNA sequences did not correlate with AtMYB61-driven transcriptional activation with each of the target sequences. CASTing-selected motifs were found in the regulatory regions of genes previously shown to be regulated by AtMYB61. Taken together, these findings are consistent with the hypothesis that AtMYB61 regulates transcription from specific cis-acting AC elements in vivo. The results shed light on the specifics of DNA binding by an important family of plant-specific transcriptional regulators.

  11. Coordinated Evolution of Transcriptional and Post-Transcriptional Regulation for Mitochondrial Functions in Yeast Strains.

    Directory of Open Access Journals (Sweden)

    Xuepeng Sun

    Full Text Available Evolution of gene regulation has been proposed to play an important role in environmental adaptation. Exploring mechanisms underlying coordinated evolutionary changes at various levels of gene regulation could shed new light on how organisms adapt in nature. In this study, we focused on regulatory differences between a laboratory Saccharomyces cerevisiae strain, BY4742, and a pathogenic S. cerevisiae strain, YJM789. The two strains diverge in many features, including growth rate, morphology, high temperature tolerance, and pathogenicity. Our RNA-Seq and ribosomal footprint profiling data showed that gene expression differences are pervasive, and genes functioning in mitochondria are mostly divergent between the two strains at both transcriptional and translational levels. Combining functional genomics data from other yeast strains, we further demonstrated that significant divergence of expression for genes functioning in the electron transport chain (ETC) was likely caused by differential expression of a transcription factor, HAP4, and that post-transcriptional regulation mediated by an RNA-binding protein, PUF3, likely led to expression divergence for genes involved in mitochondrial translation. We also explored mito-nuclear interactions via mitochondrial DNA replacement between strains. Although the two mitochondrial genomes harbor substantial sequence divergence, neither growth nor gene expression was affected by mitochondrial DNA replacement in either fermentative or respiratory growth media, indicating compatible mitochondrial and nuclear genomes between these two strains in the tested conditions. Collectively, we used mitochondrial functions as an example to demonstrate for the first time that evolution at both transcriptional and post-transcriptional levels could lead to coordinated regulatory changes underlying strain specific functional variations.

  12. Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm

    KAUST Repository

    Wong, Ka Chun

    2011-02-05

    Protein-DNA bindings are essential activities. Understanding them forms the basis for further deciphering of biological and genetic systems. In particular, the protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play a central role in gene transcription. Comprehensive TF-TFBS binding sequence pairs have been found in a recent study. However, they are in one-to-one mappings which cannot fully reflect the many-to-many mappings within the bindings. An evolutionary algorithm is proposed to learn generalized representations (many-to-many mappings) from the TF-TFBS binding sequence pairs (one-to-one mappings). The generalized pairs are shown to be more meaningful than the original TF-TFBS binding sequence pairs. Some representative examples have been analyzed in this study. In particular, it shows that the TF-TFBS binding sequence pairs are not presumably in one-to-one mappings. They can also exhibit many-to-many mappings. The proposed method can help us extract such many-to-many information from the one-to-one TF-TFBS binding sequence pairs found in the previous study, providing further knowledge in understanding the bindings between TFs and TFBSs. © 2011 Springer-Verlag.

  13. Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm

    KAUST Repository

    Wong, Ka Chun; Peng, Chengbin; Wong, Manhon; Leung, Kwongsak

    2011-01-01

    Protein-DNA bindings are essential activities. Understanding them forms the basis for further deciphering of biological and genetic systems. In particular, the protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play a central role in gene transcription. Comprehensive TF-TFBS binding sequence pairs have been found in a recent study. However, they are in one-to-one mappings which cannot fully reflect the many-to-many mappings within the bindings. An evolutionary algorithm is proposed to learn generalized representations (many-to-many mappings) from the TF-TFBS binding sequence pairs (one-to-one mappings). The generalized pairs are shown to be more meaningful than the original TF-TFBS binding sequence pairs. Some representative examples have been analyzed in this study. In particular, it shows that the TF-TFBS binding sequence pairs are not presumably in one-to-one mappings. They can also exhibit many-to-many mappings. The proposed method can help us extract such many-to-many information from the one-to-one TF-TFBS binding sequence pairs found in the previous study, providing further knowledge in understanding the bindings between TFs and TFBSs. © 2011 Springer-Verlag.

  14. Mutation in an alternative transcript of CDKL5 in a boy with early-onset seizures.

    Science.gov (United States)

    Bodian, Dale L; Schreiber, John M; Vilboux, Thierry; Khromykh, Alina; Hauser, Natalie S

    2018-06-01

    Infantile-onset epilepsies are a set of severe, heterogeneous disorders for which clinical genetic testing yields causative mutations in ∼20%-50% of affected individuals. We report the case of a boy presenting with intractable seizures at 2 wk of age, for whom gene panel testing was unrevealing. Research-based whole-genome sequencing of the proband and four unaffected family members identified a de novo mutation, NM_001323289.1:c.2828_2829delGA in CDKL5, a gene associated with X-linked early infantile epileptic encephalopathy 2. CDKL5 has multiple alternative transcripts, and the mutation lies in an exon in the brain-expressed forms. The mutation was undetected by gene panel sequencing because of its intronic location in the CDKL5 transcript typically used to define the exons of this gene for clinical exon-based tests (NM_003159). This is the first report of a patient with a mutation in an alternative transcript of CDKL5. This finding suggests that incorporating alternative transcripts into the design and variant interpretation of exon-based tests, including gene panel and exome sequencing, could improve the diagnostic yield. © 2018 Bodian et al.; Published by Cold Spring Harbor Laboratory Press.

  15. Dynamic usage of transcription start sites within core promoters

    DEFF Research Database (Denmark)

    Kawaji, Hideya; Frith, Martin C; Katayama, Shintaro

    2006-01-01

    BACKGROUND: Mammalian promoters do not initiate transcription at single, well defined base pairs, but rather at multiple, alternative start sites spread across a region. We previously characterized the static structures of transcription start site usage within promoters at the base pair level, based on large-scale sequencing of transcript 5' ends. RESULTS: In the present study we begin to explore the internal dynamics of mammalian promoters, and demonstrate that start site selection within many mouse core promoters varies among tissues. We also show that this dynamic usage of start sites is associated with CpG islands, broad and multimodal promoter structures, and imprinting. CONCLUSION: Our results reveal a new level of biologic complexity within promoters--fine-scale regulation of transcription starting events at the base pair level. These events are likely to be related to epigenetic...

  16. A glyphosate-based pesticide impinges on transcription

    International Nuclear Information System (INIS)

    Marc, Julie; Le Breton, Magali; Cormier, Patrick; Morales, Julia; Belle, Robert; Mulner-Lorillon, Odile

    2005-01-01

    Widely spread chemicals used for human benefit may exert adverse effects on health or the environment, the identification of which is a major challenge. The early development of the sea urchin constitutes an appropriate model for the identification of undesirable cellular and molecular targets of pollutants. The widespread glyphosate-based pesticide affected sea urchin development by impeding the hatching process at millimolar concentrations of glyphosate. Glyphosate, the active herbicide ingredient of Roundup, by itself delayed hatching as judged from the comparable effect of different commercial glyphosate-based pesticides and from the effect of pure glyphosate addition to a threshold concentration of Roundup. The surfactant polyoxyethylene amine (POEA), the major component of commercial Roundup, was found to be highly toxic to the embryos when tested alone and therefore could contribute to the inhibition of hatching. Hatching, a landmark of early development, is a transcription-dependent process. Correlatively, the herbicide inhibited the global transcription, which follows fertilization at the 16-cell stage. Transcription inhibition was dose-dependent in the millimolar range of glyphosate concentration. A 1257-bp fragment of the hatching enzyme transcript from Sphaerechinus granularis was cloned and sequenced; its transcription was delayed by 2 h in the pesticide-treated embryos. Because transcription is a fundamental biological process, the pesticide may be of health concern by inhalation near herbicide spraying at a concentration 25 times the adverse transcription concentration in the sprayed microdroplets.

  17. Pairwise comparisons of ten porcine tissues identify differential transcriptional regulation at the gene, isoform, promoter and transcription start site level

    International Nuclear Information System (INIS)

    Farajzadeh, Leila; Hornshøj, Henrik; Momeni, Jamal; Thomsen, Bo; Larsen, Knud; Hedegaard, Jakob; Bendixen, Christian; Madsen, Lone Bruhn

    2013-01-01

    Highlights: •Transcriptome sequencing yielded 223 million porcine RNA-seq reads, and 59,000 transcribed locations. •Establishment of unique transcription profiles for ten porcine tissues including four brain tissues. •Comparison of transcription profiles at gene, isoform, promoter and transcription start site level. •Highlights a high level of regulation of neuro-related genes at the gene, isoform, and TSS levels. •Our results emphasize the pig as a valuable animal model with respect to human biological issues. -- Abstract: The transcriptome is the complete set of transcripts in a tissue or cell at the time of sampling. In this study RNA-Seq is employed to enable the differential analysis of the transcriptome profile for ten porcine tissues in order to evaluate differences between the tissues at the gene and isoform expression level, together with an analysis of variation in transcription start sites, promoter usage, and splicing. In total, 223 million RNA fragments were sequenced, leading to the identification of 59,930 transcribed gene locations and 290,936 transcript variants using Cufflinks, with similarity to approximately 13,899 annotated human genes. Pairwise analysis of tissues for differential expression at the gene level showed that the smallest differences were between tissues originating from the porcine brain. Interestingly, the relative level of differential expression at the isoform level generally did not vary between tissue contrasts. Furthermore, analysis of differential promoter usage between tissues revealed a proportionally higher variation between cerebellum (CBE) versus frontal cortex and cerebellum versus hypothalamus (HYP) than in the remaining comparisons. In addition, the comparison of differential transcription start sites showed that the number of these sites is generally increased in comparisons including hypothalamus in contrast to other pairwise assessments. A comprehensive analysis of one of the tissue contrasts, i

  18. Salmon louse (Lepeophtheirus salmonis) transcriptomes during post molting maturation and egg production, revealed using EST-sequencing and microarray analysis

    Directory of Open Access Journals (Sweden)

    Jonassen Inge

    2008-03-01

    Full Text Available Abstract Background Lepeophtheirus salmonis is an ectoparasitic copepod feeding on skin, mucus and blood from salmonid hosts. Initial analysis of EST sequences from pre-adult and adult stages of L. salmonis revealed a large proportion of novel transcripts. In order to link unknown transcripts to biological functions we have combined EST sequencing and microarray analysis to characterize female salmon louse transcriptomes during post molting maturation and egg production. Results EST sequence analysis shows that 43% of the ESTs have no significant hits in GenBank. Sequenced ESTs assembled into 556 contigs and 1614 singletons, and whenever homologous genes were identified no clear correlation with homologous genes from any specific animal group was evident. Sequence comparison of 27 L. salmonis proteins with homologous proteins in humans, zebrafish, insects and crustaceans revealed an almost identical sequence identity with all species. Microarray analysis of maturing female adult salmon lice revealed two major transcription patterns: up-regulation during the final molting followed by down-regulation, and female-specific up-regulation during post molting growth and egg production. For a third minor group of ESTs transcription decreased during molting from pre-adult II to immature adults. Genes regulated during molting typically gave hits with cuticula proteins whilst transcripts up-regulated during post molting growth were female specific, including two vitellogenins. Conclusion The copepod L. salmonis contains a high level of novel genes. Among analyzed L. salmonis proteins, sequence identities with homologous proteins in crustaceans are no higher than with homologous proteins in humans. Three distinct processes, molting, post molting growth and egg production, correlate with transcriptional regulation of three groups of transcripts; two including genes related to growth, one including genes related to egg production. The function of the regulated

  19. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084
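
    The proportions quoted in this abstract can be re-derived from the counts it gives. The short sketch below is only an arithmetic check; the function and variable names are illustrative and not part of the original record. Note that 67 of 150 evaluates to 44.67%, which rounds to 44.7% rather than the 44.6% printed in the abstract (the printed figure appears to be truncated rather than rounded).

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the abstract.
contigs_total = 81_429
contigs_on_chr22 = 1_181
gene_classes = {
    "known genes": (162, 247),
    "related genes": (67, 150),
    "EST-predicted genes": (45, 148),
}
new_transcribed = 219          # previously unannotated transcribed sequences
also_in_genbank = 171          # of those, also defined by GenBank ESTs/cDNAs

if __name__ == "__main__":
    print(f"contigs matching chr22: {percent(contigs_on_chr22, contigs_total):.2f}%")
    for label, (hit, total) in gene_classes.items():
        print(f"{label}: {percent(hit, total):.2f}%")
    print("defined by ORESTES only:", new_transcribed - also_in_genbank)
```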

  20. Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis

    LENUS (Irish Health Repository)

    Guida, Alessandro

    2011-12-22

    Abstract Background Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored. Results We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed > 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptional active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that similar to C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia. Conclusion We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes. We found that the transcription factor

  1. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    Science.gov (United States)

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that the number of promoter-like sequences in the M. hyopneumoniae genome is more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569
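    The matrix-based scan described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the log-odds formulation, the pseudocount, the uniform background, the threshold, and the toy sites below are all assumptions chosen for illustration, and the function names `build_pssm` and `scan` are hypothetical.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=None):
    """Build a log-odds position-specific scoring matrix from equal-length sites."""
    if background is None:
        # Assumption: uniform background; an AT-rich genome would need real frequencies.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pssm

def scan(sequence, pssm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical -10-like sites, for illustration only (not data from the study).
toy_sites = ["TATAAT", "TAAAAT", "TATAAT", "TACAAT"]
pssm = build_pssm(toy_sites)
hits = scan("GGCGTATAATGCC", pssm, threshold=5.0)
print(hits)
```

    In practice the threshold would be calibrated against shuffled or random sequence, which is one way to judge whether hits are more frequent than expected by chance.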

  2. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae)

    Directory of Open Access Journals (Sweden)

    Samuel E. Fox

    2013-03-01

    Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome) accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis), 104,950 (Spain), and 177,682 (Greece) transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs) and >20,000 simple sequence repeat (SSR) DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of an invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.

  3. Laccase Gene Family in Cerrena sp. HYB07: Sequences, Heterologous Expression and Transcriptional Analysis

    Directory of Open Access Journals (Sweden)

    Jie Yang

    2016-08-01

    Full Text Available Laccases are a class of multi-copper oxidases with industrial potential. In this study, eight laccases (Lac1–8) from Cerrena sp. strain HYB07, a white-rot fungus with high laccase yields, were analyzed. The laccases showed moderate identities to each other as well as with other fungal laccases and were predicted to have high redox potentials except for Lac6. Selected laccase isozymes were heterologously expressed in the yeast Pichia pastoris, and different enzymatic properties were observed. Transcription of the eight laccase genes was differentially regulated during submerged and solid state fermentation, as shown by quantitative real-time polymerase chain reaction and validated reference genes. During 6-day submerged fermentation, Lac7 and Lac2 were successively the predominantly expressed laccase genes, accounting for over 95% of all laccase transcripts. Interestingly, accompanying Lac7 downregulation, Lac2 transcription was drastically upregulated on days 3 and 5 to 9958-fold of the level on day 1. Consistent with high mRNA abundance, Lac2 and Lac7, but not other laccases, were identified in the fermentation broth by LC-MS/MS. In solid state fermentation, less dramatic differences in transcript abundance were observed, and Lac3, Lac7 and Lac8 were more highly expressed than other laccase genes. Elucidating the properties and expression profiles of the laccase gene family will facilitate understanding, production and commercialization of the fungal strain and its laccases.

  4. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    Science.gov (United States)

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts, which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors, 229 protein kinases, and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress, and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  5. The cellular transcription factor CREB corresponds to activating transcription factor 47 (ATF-47) and forms complexes with a group of polypeptides related to ATF-43.

    OpenAIRE

    Hurst, H C; Masson, N; Jones, N C; Lee, K A

    1990-01-01

    Promoter elements containing the sequence motif CGTCA are important for a variety of inducible responses at the transcriptional level. Multiple cellular factors specifically bind to these elements and are encoded by a multigene family. Among these factors, polypeptides termed activating transcription factor 43 (ATF-43) and ATF-47 have been purified from HeLa cells and a factor referred to as cyclic AMP response element-binding protein (CREB) has been isolated from PC12 cells and rat brain. We...

  6. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Full Text Available Abstract Background This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  7. Novel transcripts discovered by mining genomic DNA from defined regions of bovine chromosome 6

    Directory of Open Access Journals (Sweden)

    Eberlein Annett

    2009-04-01

    Full Text Available Abstract Background Linkage analyses strongly suggest a number of QTL for production, health and conformation traits in the middle part of bovine chromosome 6 (BTA6). The identification of the molecular background underlying the genetic variation at the QTL and subsequent functional studies require a well-annotated gene sequence map of the critical QTL intervals. To complete the sequence map of the defined subchromosomal regions on BTA6 poorly covered with comparative gene information, we focused on targeted isolation of transcribed sequences from bovine bacterial artificial chromosome (BAC) clones mapped to the QTL intervals. Results Using the method of exon trapping, 92 unique exon trapping sequences (ETS) were discovered in a chromosomal region of poor gene coverage. Sequence identity to the current NCBI sequence assembly for BTA6 was detected for 91% of unique ETS. Comparative sequence similarity search revealed that 11% of the isolated ETS displayed high similarity to genomic sequences located on the syntenic chromosomes of the human and mouse reference genome assemblies. Nearly a third of the ETS identified similar equivalent sequences in genomic sequence scaffolds from the alternative Celera-based sequence assembly of the human genome. Screening gene, EST, and protein databases detected 17% of ETS with identity to known transcribed sequences. Expression analysis of a subset of the ETS showed that most ETS (84%) displayed a distinctive expression pattern in a multi-tissue panel of a lactating cow verifying their existence in the bovine transcriptome. Conclusion The results of our study demonstrate that the exon trapping method based on region-specific BAC clones is very useful for targeted screening for novel transcripts located within a defined chromosomal region being deficiently endowed with annotated gene information. The majority of identified ETS represents unknown noncoding sequences in intergenic regions on BTA6 displaying a

  8. The WRKY transcription factor family in Brachypodium distachyon.

    Science.gov (United States)

    Tripathi, Prateek; Rabara, Roel C; Langum, Tanner J; Boken, Ashley K; Rushton, Deena L; Boomsma, Darius D; Rinerson, Charles I; Rabara, Jennifer; Reese, R Neil; Chen, Xianfeng; Rohila, Jai S; Rushton, Paul J

    2012-06-22

    A complete assembled genome sequence of wheat is not yet available. Therefore, model plant systems for wheat are very valuable. Brachypodium distachyon (Brachypodium) is such a system. The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating important agronomic traits. Studies of WRKY transcription factors in Brachypodium and wheat therefore promise to lead to new strategies for wheat improvement. We have identified and manually curated the WRKY transcription factor family from Brachypodium using a pipeline designed to identify all potential WRKY genes. 86 WRKY transcription factors were found, a total higher than all other current databases. We therefore propose that our numbering system (BdWRKY1-BdWRKY86) becomes the standard nomenclature. In the JGI v1.0 assembly of Brachypodium with the MIPS/JGI v1.0 annotation, nine of the transcription factors have no gene model and eleven gene models are probably incorrectly predicted. In total, twenty WRKY transcription factors (23.3%) do not appear to have accurate gene models. To facilitate use of our data, we have produced The Database of Brachypodium distachyon WRKY Transcription Factors. Each WRKY transcription factor has a gene page that includes predicted protein domains from MEME analyses. These conserved protein domains reflect possible input and output domains in signaling. The database also contains a BLAST search function where a large dataset of WRKY transcription factors, published genes, and an extensive set of wheat ESTs can be searched. We also produced a phylogram containing the WRKY transcription factor families from Brachypodium, rice, Arabidopsis, soybean, and Physcomitrella patens, together with published WRKY transcription factors from wheat. This phylogenetic tree provides evidence for orthologues, co-orthologues, and paralogues of Brachypodium WRKY transcription factors. 
The description of the WRKY transcription factor

  9. Functional characterization of tobacco transcription factor TGA2.1

    DEFF Research Database (Denmark)

    Kegler, C.; Lenk, I.; Krawczyk, S.

    2004-01-01

    Activation sequence-1 (as-1)-like regulatory cis elements mediate transcriptional activation in response to increased levels of plant signalling molecules auxin and salicylic acid (SA). Our earlier work has shown that tobacco cellular as-1-binding complex SARP (salicylic acid responsive protein...

  10. Characterization of herpes simplex virus 2 primary microRNA Transcript regulation.

    Science.gov (United States)

    Tang, Shuang; Bosch-Marce, Marta; Patel, Amita; Margolis, Todd P; Krause, Philip R

    2015-05-01

    In order to understand factors that may influence latency-associated transcription and latency-associated transcript (LAT) phenotypes, we studied the expression of the herpes simplex virus 2 (HSV-2) LAT-associated microRNAs (miRNAs). We mapped the transcription initiation sites of all three primary miRNA transcripts and identified the ICP4-binding sequences at the transcription initiation sites of both HSV-2 LAT (pri-miRNA for miR-I and miR-II, which target ICP34.5, and miR-III, which targets ICP0) and L/ST (a pri-miRNA for miR-I and miR-II) but not at that of the primary miR-H6 (for which the target is unknown). We confirmed activity of the putative HSV-2 L/ST promoter and found that ICP4 trans-activates the L/ST promoter when the ICP4-binding site at its transcription initiation site is mutated, suggesting that ICP4 may play a dual role in regulating transcription of L/ST and, consequently, of miR-I and miR-II. LAT exon 1 (containing LAT enhancer sequences), together with the LAT promoter region, comprises a bidirectional promoter required for the expression of both LAT-encoded miRNAs and miR-H6 in latently infected mouse ganglia. The ability of ICP4 to suppress ICP34.5-targeting miRNAs and to activate lytic viral genes suggests that ICP4 could play a key role in the switch between latency and reactivation. The HSV-2 LAT and viral miRNAs expressed in the LAT region are the most abundant viral transcripts during HSV latency. The balance between the expression of LAT and LAT-associated miRNAs and the expression of lytic viral transcripts from the opposite strand appears to influence whether individual HSV-infected neurons will be latently or productively infected. The outcome of neuronal infection may thus depend on regulation of gene expression of the corresponding primary miRNAs. In the present study, we characterize promoter sequences responsible for miRNA expression, including identification of the primary miRNA 5' ends and evaluation of ICP4 response. These

  11. Multiple 5' ends of human cytomegalovirus UL57 transcripts identify a complex, cycloheximide-resistant promoter region that activates oriLyt

    International Nuclear Information System (INIS)

    Kiehl, Anita; Huang, Lili; Franchi, David; Anders, David G.

    2003-01-01

    The human cytomegalovirus (HCMV) UL57 gene lies adjacent to HCMV oriLyt, from which it is separated by an organizationally conserved, mostly noncoding region that is thought to both regulate UL57 expression and activate oriLyt function. However, the UL57 promoter has not been studied. We determined the 5' ends of UL57 transcripts toward an understanding of the potential relationship between UL57 expression and oriLyt activation. The results presented here identified three distinct 5' ends spread over 800 bp, at nt 90302, 90530, and 91138; use of these sites exhibited differential sensitivity to phosphonoformic acid treatment. Interestingly, a 10-kb UL57 transcript accumulated in cycloheximide-treated infected cells, even though other early transcripts were not detectable. However, the 10-kb transcript did not accumulate in cells treated with the more stringent translation inhibitor anisomycin. Consistent with the notion that the identified 5' ends arise from distinct transcription start sites, the sequences upstream of sites I and II functioned as promoters responsive to HCMV infection in transient assays. However, the origin-proximal promoter region III required downstream sequences for transcriptional activity. Mutation of candidate core promoter elements suggested that promoter III is regulated by an initiator region (Inr) and a downstream promoter element. Finally, a 42-bp sequence containing the candidate Inr activated a minimal oriLyt core construct in transient replication assays. Thus, these studies showed that a large, complex promoter region with novel features controls UL57 expression, and identified a sequence that regulates both UL57 transcription and oriLyt activation

  12. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Science.gov (United States)

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  13. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Directory of Open Access Journals (Sweden)

    Zing Tsung-Yeh Tsai

    2015-08-01

    Full Text Available Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  14. Transcription factor trapping by RNA in gene regulatory elements.

    Science.gov (United States)

    Sigova, Alla A; Abraham, Brian J; Ji, Xiong; Molinie, Benoit; Hannett, Nancy M; Guo, Yang Eric; Jangi, Mohini; Giallourakis, Cosmas C; Sharp, Phillip A; Young, Richard A

    2015-11-20

    Transcription factors (TFs) bind specific sequences in promoter-proximal and -distal DNA elements to regulate gene transcription. RNA is transcribed from both of these DNA elements, and some DNA binding TFs bind RNA. Hence, RNA transcribed from regulatory elements may contribute to stable TF occupancy at these sites. We show that the ubiquitously expressed TF Yin-Yang 1 (YY1) binds to both gene regulatory elements and their associated RNA species across the entire genome. Reduced transcription of regulatory elements diminishes YY1 occupancy, whereas artificial tethering of RNA enhances YY1 occupancy at these elements. We propose that RNA makes a modest but important contribution to the maintenance of certain TFs at gene regulatory elements and suggest that transcription of regulatory elements produces a positive-feedback loop that contributes to the stability of gene expression programs. Copyright © 2015, American Association for the Advancement of Science.

  15. A transcription factor active on the epidermal growth factor receptor gene

    International Nuclear Information System (INIS)

    Kageyama, R.; Merlino, G.T.; Pastan, I.

    1988-01-01

    The authors have developed an in vitro transcription system for the epidermal growth factor receptor (EGFR) oncogene by using nuclear extracts of A431 human epidermoid carcinoma cells, which overproduce EGFR. They found that a nuclear factor, termed EGFR-specific transcription factor (ETF), specifically stimulated EGFR transcription by 5- to 10-fold. In this report, ETF, purified by using sequence-specific oligonucleotide affinity chromatography, is shown by renaturing material eluted from a NaDodSO4/polyacrylamide gel to be a protein with a molecular mass of 120 kDa. ETF binds to the promoter region, as measured by DNase I footprinting and gel-mobility-shift assays, and specifically stimulates the transcription of the EGFR gene in a reconstituted in vitro transcription system. These results suggest that ETF could play a role in the overexpression of the cellular oncogene EGFR.

  16. Transcription of Gypsy Elements in a Y-Chromosome Male Fertility Gene of Drosophila Hydei

    Science.gov (United States)

    Hochstenbach, R.; Harhangi, H.; Schouren, K.; Bindels, P.; Suijkerbuijk, R.; Hennig, W.

    1996-01-01

    We have found that defective gypsy retrotransposons are a major constituent of the lampbrush loop pair Nooses in the short arm of the Y chromosome of Drosophila hydei. The loop pair is formed by male fertility gene Q during the primary spermatocyte stage of spermatogenesis, each loop being a single transcription unit with an estimated length of 260 kb. Using fluorescent in situ hybridization, we show that throughout the loop transcripts gypsy elements are interspersed with blocks of a tandemly repetitive Y-specific DNA sequence, ay1. Nooses transcripts containing both sequence types show a wide size range on Northern blots, do not migrate to the cytoplasm, and are degraded just before the first meiotic division. Only one strand of ay1 and only the coding strand of gypsy can be detected in the loop transcripts. However, as cloned genomic DNA fragments also display opposite orientations of ay1 and gypsy, such DNA sections cannot be part of the Nooses. Hence, they are most likely derived from the flanking heterochromatin. The direction of transcription of ay1 and gypsy thus appears to be of a functional significance. PMID:8852843

  17. Novel fusion genes and chimeric transcripts in ependymal tumors

    DEFF Research Database (Denmark)

    Olsen, Thale Kristin; Panagopoulos, Ioannis; Gorunova, Ludmila

    2016-01-01

    with subsequent Sanger sequencing was used to validate the potential fusions. Fluorescent in situ hybridization (FISH) using locus-specific probes was also performed. A total of 841 candidate chimeric transcripts were identified in the 12 tumors, with an average of 49 unique candidate fusions per tumor. After...... infratentorial anaplastic ependymoma. Our previously reported ALK rearrangements and the RELA and YAP1 fusions found in supratentorial ependymomas were until now the only known fusion genes present in ependymal tumors. The chimeric transcripts presented here are the first to be reported in infratentorial...

  18. Genomic and chromatin signals underlying transcription start-site selection

    DEFF Research Database (Denmark)

    Valen, Eivind; Sandelin, Albin Gustav

    2011-01-01

    A central question in cellular biology is how the cell regulates transcription and discerns when and where to initiate it. Locating transcription start sites (TSSs), the signals that specify them, and ultimately elucidating the mechanisms of regulated initiation has therefore been a recurrent theme....... In recent years substantial progress has been made towards this goal, spurred by the possibility of applying genome-wide, sequencing-based analysis. We now have a large collection of high-resolution datasets identifying locations of TSSs, protein-DNA interactions, and chromatin features over whole genomes...

  19. Enhancing yeast transcription analysis through integration of heterogeneous data

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Nielsen, Jens

    2004-01-01

    of Saccharomyces cerevisiae whole genome transcription data. A special focus is on the quantitative aspects of normalisation and mathematical modelling approaches, since they are expected to play an increasing role in future DNA microarray analysis studies. Data analysis is exemplified with cluster analysis......DNA microarray technology enables the simultaneous measurement of the transcript level of thousands of genes. Primary analysis can be done with basic statistical tools and cluster analysis, but effective and in depth analysis of the vast amount of transcription data requires integration with data...... from several heterogeneous data Sources, such as upstream promoter sequences, genome-scale metabolic models, annotation databases and other experimental data. In this review, we discuss how experimental design, normalisation, heterogeneous data and mathematical modelling can enhance analysis...

  20. Iterative reconstruction of transcriptional regulatory networks: an algorithmic approach.

    Directory of Open Access Journals (Sweden)

    Christian L Barrett

    2006-05-01

    Full Text Available The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to rapidly grow in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. In order to make use of this data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be utilized for their elucidation, the monetary costs involved, and the huge number of potential experiments (approximately 10^12) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to utilize probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge.

  1. Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Tu Kang

    2007-06-01

    Full Text Available Abstract Background The wide use of Affymetrix microarray in broadened fields of biological research has made the probeset annotation an important issue. Standard Affymetrix probeset annotation is at gene level, i.e. a probeset is precisely linked to a gene, and probeset intensity is interpreted as gene expression. The increased knowledge that one gene may have multiple transcript variants clearly brings up the necessity of updating this gene-level annotation to a refined transcript-level. Results Through performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, Mouse Genome 430A 2.0 Array and Human Genome U133A Array. Application of our new annotations in re-examining existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of their experimental data, which may lead to discovery of more profound regulatory mechanism.
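    The idea of moving from gene-level to transcript-level annotation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the alignment procedure, so exact substring matching stands in for it here, and the function name `annotate_probesets`, the `min_fraction` parameter, and the toy sequences are all hypothetical.

```python
def annotate_probesets(probesets, transcripts, min_fraction=1.0):
    """Map each probeset ID to the transcript IDs matched by at least
    min_fraction of its probes (exact substring match, an assumption)."""
    annotation = {}
    for probeset_id, probes in probesets.items():
        matched = []
        for transcript_id, seq in transcripts.items():
            n_hits = sum(1 for probe in probes if probe in seq)
            if probes and n_hits / len(probes) >= min_fraction:
                matched.append(transcript_id)
        annotation[probeset_id] = sorted(matched)
    return annotation

# Toy data: one gene with two transcript variants that share their first 14 bases.
transcripts = {
    "tx_A": "ATGGCCAAGTTCGGATCCTTAGGC",
    "tx_B": "ATGGCCAAGTTCGGTACCGATTAA",
}
probesets = {
    "ps_shared": ["ATGGCCAAG", "CCAAGTTCGG"],   # probes fall in the shared region
    "ps_A_only": ["GGATCCTTAG", "TCCTTAGGC"],   # probes fall in the tx_A-specific end
}
annotation = annotate_probesets(probesets, transcripts)
print(annotation)
```

    A gene-level table would link both probesets to the same gene; the transcript-level table separates the probeset that reports on both variants from the one that reports on a single variant.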

  2. Whi7 is an unstable cell-cycle repressor of the Start transcriptional program.

    Science.gov (United States)

    Gomar-Alba, Mercè; Méndez, Ester; Quilis, Inma; Bañó, M Carmen; Igual, J Carlos

    2017-08-24

    Start is the main decision point in the eukaryotic cell cycle, at which cells commit to a new round of cell division. It involves the irreversible activation of a transcriptional program by G1 CDK-cyclin complexes through the inactivation of Start transcriptional repressors, Whi5 in yeast or Rb in mammals. Here we provide new insights into how Whi7, a protein related at sequence level to Whi5, represses Start. Whi7 is an unstable protein, degraded by the SCF Grr1 ubiquitin-ligase, whose stability is cell cycle regulated by CDK1 phosphorylation. Importantly, Whi7 associates with G1/S gene promoters in late G1, acting as a repressor of SBF-dependent transcription. Our results demonstrate that Whi7 is a genuine paralog of Whi5. In fact, both proteins collaborate in Start repression, bringing to light that yeast cells, like mammalian cells, rely on the combined action of multiple transcriptional repressors to block the Start transition. The commitment of cells to a new cycle of division involves inactivation of the Start transcriptional repressor Whi5. Here the authors show that the sequence-related protein Whi7 associates with G1/S gene promoters in late G1 and collaborates with Whi5 in Start repression.

  3. The transcription fidelity factor GreA impedes DNA break repair.

    Science.gov (United States)

    Sivaramakrishnan, Priya; Sepúlveda, Leonardo A; Halliday, Jennifer A; Liu, Jingjing; Núñez, María Angélica Bravo; Golding, Ido; Rosenberg, Susan M; Herman, Christophe

    2017-10-12

    Homologous recombination repairs DNA double-strand breaks and must function even on actively transcribed DNA. Because break repair prevents chromosome loss, the completion of repair is expected to outweigh the transcription of broken templates. However, the interplay between DNA break repair and transcription processivity is unclear. Here we show that the transcription factor GreA inhibits break repair in Escherichia coli. GreA restarts backtracked RNA polymerase and hence promotes transcription fidelity. We report that removal of GreA results in markedly enhanced break repair via the classic RecBCD-RecA pathway. Using a deep-sequencing method to measure chromosomal exonucleolytic degradation, we demonstrate that the absence of GreA limits RecBCD-mediated resection. Our findings suggest that increased RNA polymerase backtracking promotes break repair by instigating RecA loading by RecBCD, without the influence of canonical Chi signals. The idea that backtracked RNA polymerase can stimulate recombination presents a DNA transaction conundrum: a transcription fidelity factor that compromises genomic integrity.

  4. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  5. Transcriptional networks in epithelial-mesenchymal transition.

    Directory of Open Access Journals (Sweden)

    Christo Venkov

    Full Text Available Epithelial-mesenchymal transition (EMT) changes polarized epithelial cells into migratory phenotypes associated with loss of cell-cell adhesion molecules and cytoskeletal rearrangements. This form of plasticity is seen in mesodermal development, fibroblast formation, and cancer metastasis. Here we identify prominent transcriptional networks active during three time points of this transitional process, as epithelial cells become fibroblasts. DNA microarrays in cultured epithelia undergoing EMT, validated in vivo, were used to detect various patterns of gene expression. In particular, the promoter sequences of differentially expressed genes and their transcription factors were analyzed to identify potential binding sites and partners. The four most frequent cis-regulatory elements (CREs) in up-regulated genes were SRY, FTS-1, Evi-1, and GC-Box, and RNA inhibition of the four transcription factors most frequently binding these CREs, Atf2, Klf10, Sox11, and SP1, established their importance in the initiation and propagation of EMT. Oligonucleotides that block the most frequent CREs restrain EMT at early and intermediate stages through apoptosis of the cells. Our results identify new transcriptional interactions with high frequency CREs that modulate the stability of cellular plasticity, and may serve as targets for modulating these transitional states in fibroblasts.

  6. Molecular analysis of alternative transcripts of equine AXL receptor tyrosine kinase gene.

    Science.gov (United States)

    Park, Jeong-Woong; Song, Ki-Duk; Kim, Nam Young; Choi, Jae-Young; Hong, Seul A; Oh, Jin Hyeog; Kim, Si Won; Lee, Jeong Hyo; Park, Tae Sub; Kim, Jin-Kyoo; Kim, Jong Geun; Cho, Byung-Wook

    2017-10-01

    Since athletic performance is one of the most important traits in horses, most research has focused on physiological and physical studies of horse athletic abilities. In contrast, molecular analyses and regulatory pathway studies remain insufficient for the evaluation and prediction of horse athletic abilities. In our previous study, we identified the AXL receptor tyrosine kinase (AXL) gene, which was expressed as alternatively spliced isoforms in skeletal muscle during exercise. In the present study, we validated two AXL alternative splicing transcripts (named AXLa for the long form and AXLb for the short form) in equine skeletal muscle to gain insight(s) into the role of each alternative transcript during exercise. We validated two isoforms of AXL transcripts in horse tissues by reverse transcriptase polymerase chain reaction (RT-PCR), and then cloned the transcripts to confirm the alternative locus and its sequences. Additionally, we examined the expression patterns of AXLa and AXLb transcripts in horse tissues by quantitative RT-PCR (qRT-PCR). Both AXLa and AXLb transcripts were expressed in horse skeletal muscle, and their expression levels were significantly increased after exercise. The sequencing analysis showed that there was an alternative splicing event at exon 11 between AXLa and AXLb transcripts. Three-dimensional (3D) prediction of the alternative protein structures revealed that the structural distance of the connective region between the fibronectin type 3 (FN3) and immunoglobulin (Ig) domains differed between the two alternative isoforms. It is assumed that the expression patterns of AXLa and AXLb transcripts would be involved in regulation of exercise-induced stress in horse muscle, possibly through an NF-κB signaling pathway. Further study is necessary to uncover the biological function(s) and significance of the alternative splicing isoforms in race horse skeletal muscle.

  7. Molecular analysis of alternative transcripts of equine AXL receptor tyrosine kinase gene

    Directory of Open Access Journals (Sweden)

    Jeong-Woong Park

    2017-10-01

    Full Text Available Objective Since athletic performance is one of the most important traits in horses, most research has focused on physiological and physical studies of horse athletic abilities. In contrast, molecular analyses and regulatory pathway studies remain insufficient for the evaluation and prediction of horse athletic abilities. In our previous study, we identified the AXL receptor tyrosine kinase (AXL) gene, which was expressed as alternatively spliced isoforms in skeletal muscle during exercise. In the present study, we validated two AXL alternative splicing transcripts (named AXLa for the long form and AXLb for the short form) in equine skeletal muscle to gain insight(s) into the role of each alternative transcript during exercise. Methods We validated two isoforms of AXL transcripts in horse tissues by reverse transcriptase polymerase chain reaction (RT-PCR), and then cloned the transcripts to confirm the alternative locus and its sequences. Additionally, we examined the expression patterns of AXLa and AXLb transcripts in horse tissues by quantitative RT-PCR (qRT-PCR). Results Both AXLa and AXLb transcripts were expressed in horse skeletal muscle, and their expression levels were significantly increased after exercise. The sequencing analysis showed that there was an alternative splicing event at exon 11 between AXLa and AXLb transcripts. Three-dimensional (3D) prediction of the alternative protein structures revealed that the structural distance of the connective region between the fibronectin type 3 (FN3) and immunoglobulin (Ig) domains differed between the two alternative isoforms. Conclusion It is assumed that the expression patterns of AXLa and AXLb transcripts would be involved in regulation of exercise-induced stress in horse muscle, possibly through an NF-κB signaling pathway. Further study is necessary to uncover the biological function(s) and significance of the alternative splicing isoforms in race horse skeletal muscle.

  8. Transcriptional regulator-mediated activation of adaptation genes triggers CRISPR de novo spacer acquisition

    DEFF Research Database (Denmark)

    Liu, Tao; Li, Yingjun; Wang, Xiaodi

    2015-01-01

    Acquisition of de novo spacer sequences confers CRISPR-Cas with a memory to defend against invading genetic elements. However, the mechanism of regulation of CRISPR spacer acquisition remains unknown. Here we examine the transcriptional regulation of the conserved spacer acquisition genes in Type I......, it was demonstrated that the transcription level of csa1, cas1, cas2 and cas4 was significantly enhanced in a csa3a-overexpression strain and, moreover, the Csa1 and Cas1 protein levels were increased in this strain. Furthermore, we demonstrated the hyperactive uptake of unique spacers within both CRISPR loci...... in the presence of the csa3a overexpression vector. The spacer acquisition process is dependent on the CCN PAM sequence and protospacer selection is random and non-directional. These results suggested a regulation mechanism of CRISPR spacer acquisition where a single transcriptional regulator senses the presence...

  9. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    Science.gov (United States)

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have substantially increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  10. Construction of a genomic library of the human cytomegalovirus genome and analysis of late transcription of its inverted internal repeat region

    International Nuclear Information System (INIS)

    Silva, K.F.S.T.

    1989-01-01

    The investigations described in this dissertation were designed to determine the transcriptionally active DNA sequences of the IIR region and to identify the viral mRNA transcribed from the transcriptionally most active DNA sequences of that region during the late phase of HCMV Towne infection. Preliminary transcriptional studies, which included the hybridization of a Southern blot of the XbaI-digested entire HCMV genome to 32P-labelled late phase infected cell A+ RNA, indicated that late viral transcripts homologous to XbaI Q fragment of IIR region were very highly abundant while XbaI Q fragment showed a very low transcriptional activity. To facilitate further analysis of late transcription of the IIR region, the entire DNA sequences of the IIR region were molecularly cloned as U, S, and H BamHI fragments in the pACYC-184 plasmid vector. In addition, to be used in future studies on other regions of the genome, except for y and c' smaller fragments the entire 240 kb HCMV genome was cloned as BamHI fragments in the same vector. Furthermore, the U, S, and H BamHI fragments were mapped with six other restriction enzymes in order to use that mapping data in subsequent transcriptional analysis of the IIR region. Further localization of transcriptionally active DNA sequences within the IIR region was achieved by hybridization of Southern blots of restricted U, S, and H BamHI fragments with 3' 32P-labelled infected cell late A+ RNA. The 1.5 kb EcoRI subfragments of the S BamHI fragment and the adjoining 0.72 kb XhoI subfragment of the H BamHI fragment revealed the highest level of transcription, although the remainder of the S fragment was also transcribed at a substantial level. The U fragment and the remainder of the H fragment were transcribed at a very low level.

  11. A test of the transcription model for biased inheritance of yeast mitochondrial DNA.

    Science.gov (United States)

    Lorimer, H E; Brewer, B J; Fangman, W L

    1995-09-01

    Two strand-specific origins of replication appear to be required for mammalian mitochondrial DNA (mtDNA) replication. Structural equivalents of these origins are found in the rep sequences of Saccharomyces cerevisiae mtDNA. These striking similarities have contributed to a universal model for the initiation of mtDNA replication in which a primer is created by cleavage of an origin region transcript. Consistent with this model are the properties of deletion mutants of yeast mtDNA ([rho-]) with a high density of reps (HS [rho-]). These mutant mtDNAs are preferentially inherited by the progeny resulting from the mating of HS [rho-] cells with cells containing wild-type mtDNA ([rho+]). This bias is presumed to result from a replication advantage conferred on HS [rho-] mtDNA by the high density of rep sequences acting as origins. To test whether transcription is indeed required for the preferential inheritance of HS [rho-] mtDNA, we deleted the nuclear gene (RPO41) for the mitochondrial RNA polymerase, reducing transcripts by at least 1000-fold. Since [rho-] genomes, but not [rho+] genomes, are stable when RPO41 is deleted, we examined matings between HS [rho-] and neutral [rho-] cells. Neutral [rho-] mtDNAs lack rep sequences and are not preferentially inherited in [rho-] x [rho+] crosses. In HS [rho-] x neutral [rho-] matings, the HS [rho-] mtDNA was preferentially inherited whether both parents were wild type or both were deleted for RPO41. Thus, transcription from the rep promoter does not appear to be necessary for biased inheritance. Our results, and analysis of the literature, suggest that priming by transcription is not a universal mechanism for mtDNA replication initiation.

  12. Regulation, initiation, and termination of the cenA and cex transcripts of Cellulomonas fimi

    International Nuclear Information System (INIS)

    Greenberg, N.M.; Warren, R.A.J.; Kilburn, D.G.; Miller, R.C. Jr.

    1987-01-01

    The authors characterized the in vivo transcripts of two Cellulomonas fimi genes: cenA, which encodes an extracellular endo-β-1,4-glucanase, and cex, which encodes an extracellular exo-β-1,4-glucanase. By Northern blot analysis, cenA mRNA was detected in C. fimi RNA preparations from glycerol- and carboxymethyl cellulose-grown cells but not from glucose-grown cells. In contrast, cex mRNA was detected only in the preparations from carboxymethyl cellulose-grown cells. Therefore, the transcription of these genes is subject to regulation by the carbon source provided to C. fimi. By nuclease S1 protection studies with unique 5'-labeled DNA probes and C. fimi RNA isolated in vivo, 5' termini were found 51 and 62 bases before the cenA translational initiation codon and 28 bases before the cex translational initiation codon. S1 mapping with unlabeled DNA probes and C. fimi RNA which had been isolated in vivo but which had been 5' labeled in vitro with guanylyltransferase and [α-32P]GTP confirmed that true transcription initiation sites for cenA and cex mRNA had been identified. Comparative analysis of the DNA sequences immediately upstream of the initiation sites of the cenA and cex mRNAs revealed a 30-base-pair region where these two sequences display at least 66% homology. S1 mapping was also used to locate the 3' termini of the cenA and cex transcripts. Three 3' termini were found for cenA messages, whereas only one 3' terminus was identified for cex mRNA. The transcripts of both genes terminate in regions where their corresponding DNA sequences contain inverted repeats.

  13. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

    Directory of Open Access Journals (Sweden)

    Mickael Orgeur

    2018-01-01

    Full Text Available The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.

  14. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity

    DEFF Research Database (Denmark)

    Williams, Kristine; Christensen, Jesper; Pedersen, Marianne Terndrup

    2011-01-01

    a role in transcriptional repression. TET1 binds a significant proportion of Polycomb group target genes. Furthermore, TET1 associates and colocalizes with the SIN3A co-repressor complex. We propose that TET1 fine-tunes transcription, opposes aberrant DNA methylation at CpG-rich sequences and thereby...... throughout the genome of embryonic stem cells, with the majority of binding sites located at transcription start sites (TSSs) of CpG-rich promoters and within genes. The hmC modification is found in gene bodies and in contrast to mC is also enriched at CpG-rich TSSs. We provide evidence further that TET1 has...... contributes to the regulation of DNA methylation fidelity....

  15. Transcriptional activation signals found in the Epstein-Barr virus (EBV) latency C promoter are conserved in the latency C promoter sequences from baboon and Rhesus monkey EBV-like lymphocryptoviruses (cercopithicine herpesviruses 12 and 15).

    Science.gov (United States)

    Fuentes-Pananá, E M; Swaminathan, S; Ling, P D

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (-1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates.
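
    The pairwise and three-way conservation figures quoted above are percent-identity measures over aligned promoter sequences. The following is a minimal sketch of how such figures can be computed, assuming ungapped, equal-length aligned sequences; the abstract does not state how gaps were treated, and the three short sequences and function names are made-up placeholders, not the actual Cp regions.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)


def percent_conserved_in_all(*seqs):
    """Percentage of aligned positions identical across every sequence."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    conserved = sum(1 for column in zip(*seqs) if len(set(column)) == 1)
    return 100 * conserved / len(seqs[0])


# Hypothetical 10-nt aligned fragments, for illustration only.
ebv = "TATAAAGGCC"
hvp = "TATAATGGCA"
rhebv = "TATATAGGCC"

print(percent_identity(ebv, hvp))                 # 80.0
print(percent_identity(ebv, rhebv))               # 90.0
print(percent_identity(hvp, rhebv))               # 70.0
print(percent_conserved_in_all(ebv, hvp, rhebv))  # 70.0
```

    As in the reported data, the share of positions conserved in all three sequences can be no higher than the lowest pairwise identity, because a three-way match requires every pair to agree at that position.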

  16. Sialotranscriptomics of Rhipicephalus zambeziensis reveals intricate expression profiles of secretory proteins and suggests tight temporal transcriptional regulation during blood-feeding.

    Science.gov (United States)

    de Castro, Minique Hilda; de Klerk, Daniel; Pienaar, Ronel; Rees, D Jasper G; Mans, Ben J

    2017-08-10

    Ticks secrete a diverse mixture of secretory proteins into the host to evade its immune response and facilitate blood-feeding, making secretory proteins attractive targets for the production of recombinant anti-tick vaccines. The largely neglected tick species, Rhipicephalus zambeziensis, is an efficient vector of Theileria parva in southern Africa but its available sequence information is limited. Next generation sequencing has advanced sequence availability for ticks in recent years and has assisted the characterisation of secretory proteins. This study focused on the de novo assembly and annotation of the salivary gland transcriptome of R. zambeziensis and the temporal expression of secretory protein transcripts in female and male ticks, before the onset of feeding and during early and late feeding. The sialotranscriptome of R. zambeziensis yielded 23,631 transcripts from which 13,584 non-redundant proteins were predicted. Eighty-six percent of these contained a predicted start and stop codon and were estimated to be putatively full-length proteins. A fifth (2569) of the predicted proteins were annotated as putative secretory proteins and explained 52% of the expression in the transcriptome. Expression analyses revealed that 2832 transcripts were differentially expressed among feeding time points and 1209 between the tick sexes. The expression analyses further indicated that 57% of the annotated secretory protein transcripts were differentially expressed. Dynamic expression profiles of secretory protein transcripts were observed during feeding of female ticks: a number of transcripts were upregulated during early feeding, presumably for feeding site establishment, and during late feeding 52% of these were downregulated, indicating that transcripts were required at specific feeding stages. This suggested that secretory proteins are under stringent transcriptional regulation that fine-tunes their expression in salivary glands during feeding.

  17. A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray

    DEFF Research Database (Denmark)

    Stolc, Viktor; Li, Lei; Wang, Xiangfeng

    2005-01-01

    As the international efforts to sequence the rice genome are completed, an immediate challenge and opportunity is to comprehensively and accurately define all transcription units in the rice genome. Here we describe a strategy of using high-density oligonucleotide tiling-path microarrays to map...... transcription of the japonica rice genome. In a pilot experiment to test this approach, one array representing the reverse strand of the last 11.2 Mb sequence of chromosome 10 was analyzed in detail based on a mathematical model developed in this study. Analysis of the array data detected 77% of the reference...... gene models in a mixture of four RNA populations. Moreover, significant transcriptional activities were found in many of the previously annotated intergenic regions. These preliminary results demonstrate the utility of genome tiling microarrays in evaluating annotated rice gene models...

  18. In vitro transcription in the presence of DNA oligonucleotides can generate strong anomalous initiation sites.

    Science.gov (United States)

    Chow, C W; Clark, M P; Rinaldo, J E; Chalkley, R

    1996-03-01

    In the present study, we have explored an unexpected observation in transcription initiation that is mediated by single-stranded oligonucleotides. Initially, our goal was to understand the function of different upstream regulatory elements/initiation sites in the rat xanthine dehydrogenase/oxidase (XDH/XO) promoter. We performed in vitro transcription with HeLa nuclear extracts in the presence of different double-stranded oligonucleotides against upstream elements as competitors. A new and unusual transcription initiation site was detected by primer extension. This new initiation site maps to the downstream region of the corresponding competitor. Subsequent analyses have indicated that the induction of the new transcription initiation site is an anomaly caused by the presence of a small amount of single-stranded oligonucleotide in the competitor. We found that this anomalous initiation site is insensitive to the orientation of the promoter and requires only a small amount of single-stranded oligonucleotide (< 2-fold molar excess relative to template). We surmise that a complementary interaction between the single-stranded oligonucleotide and transiently denatured promoter template may be responsible for this sequence-specific transcription initiation artifact. To study the regulation of transcription initiation by in vitro transcription approaches, we propose that one should probe the effect of removing transacting factors by adding an excess of a cognate oligonucleotide which does not bear exact sequence identity to the template.

  19. Analysis of convergent gene transcripts in the obligate intracellular bacterium Rickettsia prowazekii.

    Directory of Open Access Journals (Sweden)

    Andrew Woodard

    2011-01-01

    Full Text Available Termination of transcription is an important component of bacterial gene expression. However, little is known concerning this process in the obligate intracellular pathogen and model for reductive evolution, Rickettsia prowazekii. To assess transcriptional termination in this bacterium, transcripts of convergent gene pairs, some containing predicted intrinsic terminators, were analyzed. These analyses revealed that, rather than terminating at a specific site within the intervening region between the convergent genes, most of the transcripts demonstrated either a lack of termination within this region, which generated antisense RNA, or a putative non-site-specific termination that occurred throughout the intervening sequence. Transcripts terminating at predicted intrinsic terminators, as well as at a putative Rho-dependent terminator, were also examined and found to vary based on the rickettsial host environment. These results suggest that transcriptional termination, or lack thereof, plays a role in rickettsial gene regulation.

  20. High SINE RNA Expression Correlates with Post-Transcriptional Downregulation of BRCA1

    Directory of Open Access Journals (Sweden)

    Giovanni Bosco

    2013-04-01

    Full Text Available Short Interspersed Nuclear Elements (SINEs) are non-autonomous retrotransposons that comprise a large fraction of the human genome. SINEs are demethylated in human disease, but whether SINEs become transcriptionally induced and how the resulting transcripts may affect the expression of protein coding genes is unknown. Here, we show that downregulation of the mRNA of the tumor suppressor gene BRCA1 is associated with increased transcription of SINEs and production of sense and antisense SINE small RNAs. We find that BRCA1 mRNA is post-transcriptionally down-regulated in a Dicer and Drosha dependent manner and that expression of a SINE inverted repeat with sequence identity to a BRCA1 intron is sufficient for downregulation of BRCA1 mRNA. These observations suggest that transcriptional activation of SINEs could contribute to a novel mechanism of RNA mediated post-transcriptional silencing of human genes.

  1. Outline of a genome navigation system based on the properties of GA-sequences and their flanks.

    Directory of Open Access Journals (Sweden)

    Guenter Albrecht-Buehler

    Full Text Available Introducing a new method to visualize large stretches of genomic DNA (see Appendix S1), the article reports that most GA-sequences [1] shared chains of tetra-GA-motifs and contained upstream poly(A)-segments. Although not integral parts of them, Alu-elements were found immediately upstream of all human and chimpanzee GA-sequences with an upstream poly(A)-segment. The article hypothesizes that genome navigation uses these properties of GA-sequences in the following way. (1) Poly(A)-binding proteins interact with the upstream poly(A)-segments and arrange adjacent GA-sequences side-by-side ('GA-ribbon'), while folding the intervening DNA sequences between them into loops ('associated DNA-loops'). (2) Genome navigation uses the GA-ribbon as a search path for specific target genes that is up to 730-fold shorter than the full-length chromosome. (3) As to the specificity of the search, each molecule of a target protein is assumed to catalyze the formation of specific oligomers from a set of transcription factors that recognize tetra-GA-motifs. Their specific combinations of tetra-GA-motifs are assumed to be present in the particular GA-sequence whose associated loop contains the gene for the target protein. As long as the target protein is abundant in the cell, it produces sufficient numbers of such oligomers, which bind to their specific GA-sequences and thereby locally inhibit the transcription of the target protein in the associated loop. However, if the amount of target protein drops below a certain threshold, the resultant reduction of specific oligomers leaves the corresponding GA-sequence 'denuded'. In response, the associated DNA-loop releases its nucleosomes and allows transcription of the target protein to proceed. (4) The Alu-transcripts may help control the general background of protein synthesis proportional to the number of transcriptionally active associated loops, especially in stressed cells. (5) The model offers a new mechanism of co-regulation of

  2. Identification and complete sequencing of novel human transcripts through the use of mouse orthologs and testis cDNA sequences

    DEFF Research Database (Denmark)

    Ferreira, Elisa N; Pires, Lilian C; Parmigiani, Raphael B

    2004-01-01

    The correct identification of all human genes, and their derived transcripts, has not yet been achieved, and it remains one of the major aims of the worldwide genomics community. Computational programs suggest the existence of 30,000 to 40,000 human genes. However, definitive gene identification ...

  3. Prediction of nucleosome positioning based on transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Xianfu Yi

    Full Text Available BACKGROUND: The DNA of all eukaryotic organisms is packaged into nucleosomes, the basic repeating units of chromatin. The nucleosome consists of a histone octamer around which a DNA core is wrapped and the linker histone H1, which is associated with linker DNA. By altering the accessibility of DNA sequences, the nucleosome has profound effects on all DNA-dependent processes. Understanding the factors that influence nucleosome positioning is of great importance for the study of genomic control mechanisms. Transcription factors (TFs) have been suggested to play a role in nucleosome positioning in vivo. PRINCIPAL FINDINGS: Here, the minimum redundancy maximum relevance (mRMR) feature selection algorithm, the nearest neighbor algorithm (NNA), and the incremental feature selection (IFS) method were used to identify the most important TFs that either favor or inhibit nucleosome positioning by analyzing the numbers of transcription factor binding sites (TFBSs) in 53,021 nucleosomal DNA sequences and 50,299 linker DNA sequences. A total of nine important families of TFs were extracted from 35 families, and the overall prediction accuracy was 87.4% as evaluated by the jackknife cross-validation test. CONCLUSIONS: Our results are consistent with the notion that TFs are more likely to bind linker DNA sequences than the sequences in the nucleosomes. In addition, our results imply that there may be some TFs that are important for nucleosome positioning but that play an insignificant role in discriminating nucleosome-forming DNA sequences from nucleosome-inhibiting DNA sequences. The hypothesis that TFs play a role in nucleosome positioning is, thus, confirmed by the results of this study.
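
    The jackknife (leave-one-out) evaluation of a nearest neighbor classifier mentioned in this record can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy TFBS-count vectors, the labels, and the use of squared Euclidean distance are hypothetical and are not taken from the study, which does not state its distance measure in the abstract.

```python
def nearest_neighbor_label(query, samples):
    """Return the label of the sample closest to `query`.

    Distance is squared Euclidean, which is an assumption of this sketch.
    """
    closest = min(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(query, s[0])),
    )
    return closest[1]


def jackknife_accuracy(samples):
    """Leave each sample out once, predict it from the rest, and score."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical feature vectors: TFBS counts for two made-up TF families.
data = [
    ((5, 0), "linker"),
    ((4, 1), "linker"),
    ((6, 1), "linker"),
    ((0, 4), "nucleosome"),
    ((1, 5), "nucleosome"),
    ((3, 3), "nucleosome"),  # deliberately ambiguous sample
]

accuracy = jackknife_accuracy(data)
```

    On this toy data only the ambiguous sample is misclassified. A real analysis would first rank features (for example with mRMR) and repeat the evaluation as features are added incrementally.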

  4. Unique CCT repeats mediate transcription of the TWIST1 gene in mesenchymal cell lines

    International Nuclear Information System (INIS)

    Ohkuma, Mizue; Funato, Noriko; Higashihori, Norihisa; Murakami, Masanori; Ohyama, Kimie; Nakamura, Masataka

    2007-01-01

    TWIST1, a basic helix-loop-helix transcription factor, plays critical roles in embryo development, cancer metastasis and mesenchymal progenitor differentiation. Little is known about transcriptional regulation of TWIST1 expression. Here we identified DNA sequences responsible for TWIST1 expression in mesenchymal lineage cell lines. Reporter assays with TWIST1 promoter mutants defined the -102 to -74 sequences that are essential for TWIST1 expression in human and mouse mesenchymal cell lines. Tandem repeats of CCT, but not the putative CREB and NF-κB sites in the sequences, substantially supported activity of the TWIST1 promoter. Electrophoretic mobility shift assays demonstrated that the DNA sequences with the CCT repeats formed complexes with nuclear factors containing at least Sp1 and Sp3. These results suggest a critical role for the CCT repeats, in association with Sp1 and Sp3 factors, in sustaining expression of the TWIST1 gene in mesenchymal cells.

  5. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

    Science.gov (United States)

    Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

    2018-05-01

    Single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to comprehensively analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs). PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.

  6. Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research.

    Science.gov (United States)

    Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang

    2013-09-01

    Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single-end 100 bp reads. Reference-based assembly using the Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering the S. viridis genome were identified when using the reference-based assembled transcripts and the de novo assembled transcripts, respectively. The transcripts and SSRs identified in this study can be used for both reverse and forward genetic studies based on S. viridis.

  7. Deleterious ABCA7 mutations and transcript rescue mechanisms in early onset Alzheimer's disease.

    Science.gov (United States)

    De Roeck, Arne; Van den Bossche, Tobi; van der Zee, Julie; Verheijen, Jan; De Coster, Wouter; Van Dongen, Jasper; Dillen, Lubina; Baradaran-Heravi, Yalda; Heeman, Bavo; Sanchez-Valle, Raquel; Lladó, Albert; Nacmias, Benedetta; Sorbi, Sandro; Gelpi, Ellen; Grau-Rivera, Oriol; Gómez-Tortosa, Estrella; Pastor, Pau; Ortega-Cubero, Sara; Pastor, Maria A; Graff, Caroline; Thonberg, Håkan; Benussi, Luisa; Ghidoni, Roberta; Binetti, Giuliano; de Mendonça, Alexandre; Martins, Madalena; Borroni, Barbara; Padovani, Alessandro; Almeida, Maria Rosário; Santana, Isabel; Diehl-Schmid, Janine; Alexopoulos, Panagiotis; Clarimon, Jordi; Lleó, Alberto; Fortea, Juan; Tsolaki, Magda; Koutroumani, Maria; Matěj, Radoslav; Rohan, Zdenek; De Deyn, Peter; Engelborghs, Sebastiaan; Cras, Patrick; Van Broeckhoven, Christine; Sleegers, Kristel

    2017-09-01

    Premature termination codon (PTC) mutations in the ATP-Binding Cassette, Sub-Family A, Member 7 gene (ABCA7) have recently been identified as an intermediate-to-high penetrance risk factor for late-onset Alzheimer's disease (LOAD). High variability, however, is observed in downstream ABCA7 mRNA and protein expression, disease penetrance, and onset age, indicative of unknown modifying factors. Here, we investigated the prevalence and disease penetrance of ABCA7 PTC mutations in a large early onset AD (EOAD)-control cohort, and examined the effect on transcript level with comprehensive third-generation long-read sequencing. We characterized the ABCA7 coding sequence with next-generation sequencing in 928 EOAD patients and 980 matched control individuals. With MetaSKAT rare variant association analysis, we observed a fivefold enrichment (p = 0.0004) of PTC mutations in EOAD patients (3%) versus controls (0.6%). Ten novel PTC mutations were only observed in patients, and PTC mutation carriers in general had an increased familial AD load. In addition, we observed nominal risk-reducing trends for three common coding variants. Seven PTC mutations were further analyzed using targeted long-read cDNA sequencing on an Oxford Nanopore MinION platform. PTC-containing transcripts for each investigated PTC mutation were observed at varying proportions (5-41% of the total read count), implying incomplete nonsense-mediated mRNA decay (NMD). Furthermore, we distinguished and phased several previously unknown alternative splicing events (up to 30% of transcripts). In conjunction with PTC mutations, several of these novel ABCA7 isoforms have the potential to rescue deleterious PTC effects. In conclusion, ABCA7 PTC mutations play a substantial role in EOAD, warranting genetic screening of ABCA7 in genetically unexplained patients. Long-read cDNA sequencing revealed both varying degrees of NMD and transcript-modifying events, which may influence ABCA7 dosage, disease severity, and may

  8. Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes.

    Science.gov (United States)

    Osato, Naoki

    2018-01-19

    Transcriptional target genes show functional enrichment of genes. However, how many and how significantly transcriptional target genes include functional enrichments are still unclear. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined functional enrichment and gene expression level of putative transcriptional target genes. Gene Ontology annotations showed four times larger numbers of functional enrichments in putative transcriptional target genes than gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichment by calculating its ratios in the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5-60% of randomly selected genes. The normalized number of functional enrichments was changed according to the criteria of enhancer-promoter interactions such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed significantly higher normalized number of functional enrichments than the other orientations. Journal papers showed that the top five frequent functional enrichments were related to the cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes of the normalized number of functional enrichments of transcriptional target genes. Human putative transcriptional target genes showed significant functional enrichments. Functional

  9. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed the elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  10. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    Science.gov (United States)

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
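
    The first step described in this record, grouping reads according to the set of transcripts they map to, can be sketched as follows. This is only an illustration under stated assumptions: the read and transcript identifiers are hypothetical, the function name is invented for this sketch, and it does not reproduce EMSAR's segmentation, reclustering, or joint Poisson estimation.

```python
from collections import Counter


def group_reads_by_transcript_set(read_to_transcripts):
    """Count reads for each distinct set of compatible transcripts.

    `read_to_transcripts` maps a read id to the transcripts it aligns to.
    Unmapped reads (empty lists) are skipped.
    """
    counts = Counter()
    for transcripts in read_to_transcripts.values():
        if transcripts:
            counts[frozenset(transcripts)] += 1
    return counts


# Hypothetical alignments: r2 and r3 are ambiguous between tA and tB.
alignments = {
    "r1": ["tA"],
    "r2": ["tA", "tB"],
    "r3": ["tB", "tA"],
    "r4": ["tB"],
    "r5": [],
}

segment_counts = group_reads_by_transcript_set(alignments)
```

    Using `frozenset` keys makes the grouping independent of the order in which an aligner reports transcripts, so multi-mapped reads are retained as their own group rather than discarded.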

  11. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    Science.gov (United States)

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  12. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    Science.gov (United States)

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.

  13. An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

    Science.gov (United States)

    Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D

    2017-01-01

    We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.

  14. Next-generation sequencing library preparation method for identification of RNA viruses on the Ion Torrent Sequencing Platform.

    Science.gov (United States)

    Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng

    2018-05-09

    Next-generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. Multiple NGS library preparation methods have been published for strand-specific RNA-seq, but some methods are not suitable for identifying and characterizing RNA viruses. In this study, we report an NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform and able to identify multiple species of RNA viruses in clinical samples.

  15. Targeted HIV-1 Latency Reversal Using CRISPR/Cas9-Derived Transcriptional Activator Systems.

    Directory of Open Access Journals (Sweden)

    Julia K Bialek

    Full Text Available CRISPR/Cas9 technology is currently considered the most advanced tool for targeted genome engineering. Its sequence-dependent specificity has been explored for locus-directed transcriptional modulation. Such modulation, in particular transcriptional activation, has been proposed as a key approach to overcome silencing of dormant HIV provirus in latently infected cellular reservoirs. Currently available agents for provirus activation, so-called latency reversing agents (LRAs), act indirectly through cellular pathways to induce viral transcription. However, their clinical performance remains suboptimal, possibly because reservoirs have diverse cellular identities and/or proviral DNA is intractable to the induced pathways. We have explored two CRISPR/Cas9-derived activator systems as targeted approaches to induce dormant HIV-1 proviral DNA. These systems recruit multiple transcriptional activation domains to the HIV 5' long terminal repeat (LTR), for which we have identified an optimal target region within the LTR U3 sequence. Using this target region, we demonstrate transcriptional activation of proviral genomes via the synergistic activation mediator complex in various in-culture model systems for HIV latency. Observed levels of induction are comparable to or indeed higher than treatment with established LRAs. Importantly, activation is complete, leading to production of infective viral particles. Our data demonstrate that CRISPR/Cas9-derived technologies can be applied to counteract HIV latency and may therefore represent promising novel approaches in the quest for HIV elimination.

  16. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Full Text Available Abstract Background Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
Conclusion We have developed a novel and useful approach for

  17. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models

    KAUST Repository

    Kulakovskiy, Ivan V.

    2015-11-19

    Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
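
    How a position weight matrix and a score threshold are used for motif finding can be shown with a short sketch. The count matrix, pseudocount, uniform background, and threshold below are all hypothetical values chosen for illustration; they are not a HOCOMOCO model or one of its precomputed thresholds, and the function names are not those of the HOCOMOCO command-line tools.

```python
import math

# Hypothetical 4-position count matrix favouring the site "TGAC".
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds for a site of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))


def find_hits(pwm, sequence, threshold):
    """Return (start, site) for every window scoring at or above the threshold.

    The sequence is assumed to contain only uppercase A, C, G and T.
    """
    width = len(pwm)
    return [
        (i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
        if score(pwm, sequence[i:i + width]) >= threshold
    ]


PWM = to_log_odds(COUNTS)
hits = find_hits(PWM, "AATGACGGTGACT", threshold=5.0)
```

    With these toy values a perfect match scores about 6.6 and a single mismatch about 3.1, so a threshold of 5.0 keeps only exact occurrences. In practice the threshold would come from a calibrated score distribution rather than being picked by hand.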

  18. Transcriptional regulation of the tyrosine hydroxylase gene by glucocorticoid and cyclic AMP

    International Nuclear Information System (INIS)

    Lewis, E.J.; Harrington, C.A.; Chikaraishi, D.M.

    1987-01-01

    Glucocorticoid and cyclic AMP increase tyrosine hydroxylase (TH) activity and mRNA levels in pheochromocytoma cultures. The transcriptional activity of the TH gene, as measured by nuclear run-on assay, is also increased when cultures are treated with the synthetic glucocorticoid dexamethasone or agents that increase intracellular cyclic AMP, such as forskolin and 8-BrcAMP. Both inducers effect transcriptional changes within 10 min after treatment and are maximal after 30 min for forskolin and after 60 min for dexamethasone. The 5' flanking sequences of the TH gene were fused to the bacterial gene chloramphenicol acetyltransferase (CAT), and the hybrid gene was transfected into pheochromocytoma cultures and GH4 pituitary cells. In both cell lines, a region of the TH gene containing bases -272 to +27 conferred induction of CAT by cyclic AMP, but not by glucocorticoid. The same results were found when a region of the TH gene containing -773 to +27 was used. Thus, the sequences required for induction of TH by cyclic AMP are contained within 272 bases of 5' flanking sequence, but sequences sufficient for glucocorticoid regulation are not contained within 773 bases.

  19. Transcription Factor Functional Protein-Protein Interactions in Plant Defense Responses

    Directory of Open Access Journals (Sweden)

    Murilo S. Alves

    2014-03-01

    Full Text Available Responses to biotic stress in plants lead to dramatic reprogramming of gene expression, favoring stress responses at the expense of normal cellular functions. Transcription factors are master regulators of gene expression at the transcriptional level, and controlling the activity of these factors alters the transcriptome of the plant, leading to metabolic and phenotypic changes in response to stress. The functional analysis of interactions between transcription factors and other proteins is very important for elucidating the role of these transcriptional regulators in different signaling cascades. In this review, we present an overview of protein-protein interactions for the six major families of transcription factors involved in plant defense: basic leucine zipper containing domain proteins (bZIP), amino-acid sequence WRKYGQK (WRKY), myelocytomatosis-related proteins (MYC), myeloblastosis-related proteins (MYB), APETALA2/ETHYLENE-RESPONSIVE ELEMENT BINDING FACTORS (AP2/EREBP), and no apical meristem (NAM), Arabidopsis transcription activation factor (ATAF), and cup-shaped cotyledon (CUC) (NAC). We describe the interaction partners of these transcription factors as molecular responses during pathogen attack and the key components of signal transduction pathways that take place during plant defense responses. These interactions determine the activation or repression of response pathways and are crucial to understanding the regulatory networks that modulate plant defense responses.

  20. Transcriptional profiling in human HaCaT keratinocytes in response to kaempferol and identification of potential transcription factors for regulating differential gene expression

    Science.gov (United States)

    Kang, Byung Young; Lee, Ki-Hwan; Lee, Yong Sung; Hong, Il; Lee, Mi-Ock; Min, Daejin; Chang, Ihseop; Hwang, Jae Sung; Park, Jun Seong; Kim, Duck Hee

    2008-01-01

    Kaempferol is the major flavonol in green tea and exhibits many biomedically useful properties such as antioxidative, cytoprotective and anti-apoptotic activities. To elucidate its effects on the skin, we investigated the transcriptional profiles of kaempferol-treated HaCaT cells using cDNA microarray analysis and identified 147 transcripts that exhibited significant changes in expression. Of these, 18 were up-regulated and 129 were down-regulated. These transcripts were then classified into 12 categories according to their functional roles: cell adhesion/cytoskeleton, cell cycle, redox homeostasis, immune/defense responses, metabolism, protein biosynthesis/modification, intracellular transport, RNA processing, DNA modification/replication, regulation of transcription, signal transduction and transport. We then analyzed the promoter sequences of differentially regulated genes and identified over-represented regulatory sites and candidate transcription factors (TFs) for gene regulation by kaempferol. These included c-REL, SAP-1, Ahr-ARNT, Nrf-2, Elk-1, SPI-B, NF-κB and p65. In addition, we validated the microarray results and promoter analyses using conventional methods such as real-time PCR and ELISA-based transcription factor assay. Our microarray analysis has provided useful information for determining the genetic regulatory network affected by kaempferol, and this approach will be useful for elucidating gene-phytochemical interactions. PMID:18446059

  1. Efficiency of Transcription from Promoter Sequence Variants in Lactobacillus Is Both Strain and Context Dependent

    OpenAIRE

    McCracken, Andrea; Timms, Peter

    1999-01-01

    The introduction of consensus −35 (TTGACA) and −10 (TATAAT) hexamers and a TG motif into the Lactobacillus acidophilus ATCC 4356 wild-type slpA promoter resulted in significant improvements (4.3-, 4.1-, and 10.7-fold, respectively) in transcriptional activity in Lactobacillus fermentum BR11. In contrast, the same changes resulted in decreased transcription in Lactobacillus rhamnosus GG. The TG motif was shown to be important in the context of weak −35 and −10 hexamers (L. fermentum BR11) or a...

  2. Protein-protein interactions in the regulation of WRKY transcription factors.

    Science.gov (United States)

    Chi, Yingjun; Yang, Yan; Zhou, Yuan; Zhou, Jie; Fan, Baofang; Yu, Jing-Quan; Chen, Zhixiang

    2013-03-01

    It has been almost 20 years since the first report of a WRKY transcription factor, SPF1, from sweet potato. Great progress has been made since then in establishing the diverse biological roles of WRKY transcription factors in plant growth, development, and responses to biotic and abiotic stress. Despite the functional diversity, almost all analyzed WRKY proteins recognize the TTGACC/T W-box sequences and, therefore, mechanisms other than mere recognition of the core W-box promoter elements are necessary to achieve the regulatory specificity of WRKY transcription factors. Research over the past several years has revealed that WRKY transcription factors physically interact with a wide range of proteins with roles in signaling, transcription, and chromatin remodeling. Studies of WRKY-interacting proteins have provided important insights into the regulation and mode of action of members of the important family of transcription factors. It has also emerged that the slightly varied WRKY domains and other protein motifs conserved within each of the seven WRKY subfamilies participate in protein-protein interactions and mediate complex functional interactions between WRKY proteins and between WRKY and other regulatory proteins in the modulation of important biological processes. In this review, we summarize studies of protein-protein interactions for WRKY transcription factors and discuss how the interacting partners contribute, at different levels, to the establishment of the complex regulatory and functional network of WRKY transcription factors.
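    The W-box core mentioned in this record (TTGACC/T) can be located in a promoter sequence with a short scan of both strands. The following is a minimal sketch, not a method taken from the review; the function names and the coordinate convention (0-based start on the forward strand) are assumptions made for illustration.

```python
import re

# W-box core from the abstract: TTGACC or TTGACT. The lookahead lets
# overlapping occurrences (e.g. TTGACTTGACC) be reported as well.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
W_BOX_LEN = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_w_boxes(seq: str) -> list:
    """Return sorted (start, strand) pairs for every W-box core in seq.

    start is the 0-based leftmost position of the hit on the forward
    strand, whichever strand carries the motif.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in W_BOX.finditer(seq)]
    # A hit at position p on the reverse complement covers forward
    # positions n - p - 6 .. n - p - 1.
    hits += [
        (n - m.start() - W_BOX_LEN, "-")
        for m in W_BOX.finditer(reverse_complement(seq))
    ]
    return sorted(hits)
```

    Because nearly all WRKY proteins recognise the same core, a scan like this identifies candidate sites only; as the abstract stresses, specificity depends on interacting partners rather than on the motif itself.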

  3. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    Directory of Open Access Journals (Sweden)

    Koushirou Suga

    Full Text Available BACKGROUND: Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. METHODOLOGY/PRINCIPAL FINDINGS: We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. CONCLUSIONS/SIGNIFICANCE: Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript

  4. Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis.

    Science.gov (United States)

    Suga, Koushirou; Welch, David Mark; Tanaka, Yukari; Sakakura, Yoshitaka; Hagiwara, Atsushi

    2007-08-01

    Rotifers are among the most common non-arthropod animals and are the most experimentally tractable members of the basal assemblage of metazoan phyla known as Gnathifera. The monogonont rotifer Brachionus plicatilis is a developing model system for ecotoxicology, aquatic ecology, cryptic speciation, and the evolution of sex, and is an important food source for finfish aquaculture. However, basic knowledge of the genome and transcriptome of any rotifer species has been lacking. We generated and partially sequenced a cDNA library from B. plicatilis and constructed a database of over 2300 expressed sequence tags corresponding to more than 450 transcripts. About 20% of the transcripts had no significant similarity to database sequences by BLAST; most of these contained open reading frames of significant length but few had recognized Pfam motifs. Sixteen transcripts accounted for 25% of the ESTs; four of these had no significant similarity to BLAST or Pfam databases. Putative up- and downstream untranslated regions are relatively short and AT rich. In contrast to bdelloid rotifers, there was no evidence of a conserved trans-spliced leader sequence among the transcripts and most genes were single-copy. Despite the small size of this EST project it revealed several important features of the rotifer transcriptome and of individual monogonont genes. Because there is little genomic data for Gnathifera, the transcripts we found with no known function may represent genes that are species-, class-, phylum- or even superphylum-specific; the fact that some are among the most highly expressed indicates their importance. The absence of trans-spliced leader exons in this monogonont species contrasts with their abundance in bdelloid rotifers and indicates that the presence of this phenomenon can vary at the subphylum level. Our EST database provides a relatively large quantity of transcript-level data for B. plicatilis, and more generally of rotifers and other gnathiferan phyla, and

  5. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
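    To make "wavelet analysis of genomic signals" concrete, the sketch below converts a DNA string into a numeric signal and applies one level of the Haar wavelet transform. Both choices are assumptions for illustration: the abstract does not say which numeric mapping or which wavelet family the authors used, and the function names are hypothetical.

```python
import math


def gc_signal(seq: str) -> list:
    """Map a DNA string to a numeric signal: 1.0 for G or C, 0.0 otherwise.

    This GC-indicator mapping is an assumed encoding, one of many possible.
    """
    return [1.0 if base in "GC" else 0.0 for base in seq.upper()]


def haar_step(signal: list) -> tuple:
    """Apply one level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients, each half the input
    length. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical four-base example.
approx, detail = haar_step(gc_signal("ACGT"))
```

    The approximation coefficients track local GC content at a coarser scale, while the detail coefficients capture base-to-base changes; repeating the step on the approximation gives a multi-scale view of the sequence.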

  6. Genomic context drives transcription of insertion sequences in the bacterial endosymbiont Wolbachia wVulC.

    Science.gov (United States)

    Cerveau, Nicolas; Gilbert, Clément; Liu, Chao; Garrett, Roger A; Grève, Pierre; Bouchon, Didier; Cordaux, Richard

    2015-06-10

    Transposable elements (TEs) are DNA pieces that are present in almost all the living world at variable genomic density. Due to their mobility and density, TEs are involved in a large array of genomic modifications. In eukaryotes, TE expression has been studied in detail in several species. In prokaryotes, studies of IS expression are generally linked to particular copies that induce a modification of neighboring gene expression. Here we investigated global patterns of IS transcription in the Alphaproteobacterial endosymbiont Wolbachia wVulC, using both RT-PCR and bioinformatic analyses. We detected several transcriptional promoters in all IS groups. Nevertheless, only one of the potentially functional IS groups possesses a promoter located upstream of the transposase gene, that could lead up to the production of a functional protein. We found that the majority of IS groups are expressed whatever their functional status. RT-PCR analyses indicate that the transcription of two IS groups lacking internal promoters upstream of the transposase start codon may be driven by the genomic environment. We confirmed this observation with the transcription analysis of individual copies of one IS group. These results suggest that the genomic environment is important for IS expression and it could explain, at least partly, copy number variability of the various IS groups present in the wVulC genome and, more generally, in bacterial genomes. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. A damage-responsive DNA binding protein regulates transcription of the yeast DNA repair gene PHR1

    International Nuclear Information System (INIS)

    Sebastian, J.; Sancar, G.B.

    1991-01-01

    The PHR1 gene of Saccharomyces cerevisiae encodes the DNA repair enzyme photolyase. Transcription of PHR1 increases in response to treatment of cells with 254-nm radiation and chemical agents that damage DNA. The authors report here the identification of a damage-responsive DNA binding protein, termed photolyase regulatory protein (PRP), and its cognate binding site, which together regulate PHR1 transcription after DNA damage. PRP activity, monitored by electrophoretic-mobility-shift assay, was detected in cells during normal growth but disappeared within 30 min after irradiation. Copper-phenanthroline footprinting of PRP-DNA complexes revealed that PRP protects a 39-base-pair region of PHR1 5' flanking sequence beginning 40 base pairs upstream from the coding sequence. Thus these observations establish that PRP is a damage-responsive repressor of PHR1 transcription.

  8. Genome-wide profiling of H3K56 acetylation and transcription factor binding sites in human adipocytes.

    Directory of Open Access Journals (Sweden)

    Kinyui Alice Lo

    Full Text Available The growing epidemic of obesity and metabolic diseases calls for a better understanding of adipocyte biology. The regulation of transcription in adipocytes is particularly important, as it is a target for several therapeutic approaches. Transcriptional outcomes are influenced by both histone modifications and transcription factor binding. Although the epigenetic states and binding sites of several important transcription factors have been profiled in the mouse 3T3-L1 cell line, such data are lacking in human adipocytes. In this study, we identified H3K56 acetylation sites in human adipocytes derived from mesenchymal stem cells. H3K56 is acetylated by CBP and p300, and deacetylated by SIRT1, all of which are proteins with important roles in diabetes and insulin signaling. We found that while almost half of the genome shows signs of H3K56 acetylation, the highest level of H3K56 acetylation is associated with transcription factors and proteins in the adipokine signaling and Type II Diabetes pathways. In order to discover the transcription factors that recruit acetyltransferases and deacetylases to sites of H3K56 acetylation, we analyzed DNA sequences near H3K56 acetylated regions and found that the E2F recognition sequence was enriched. Using chromatin immunoprecipitation followed by high-throughput sequencing, we confirmed that genes bound by E2F4, as well as those by HSF-1 and C/EBPα, have higher than expected levels of H3K56 acetylation, and that the transcription factor binding sites and acetylation sites are often adjacent but rarely overlap. We also discovered a significant difference between bound targets of C/EBPα in 3T3-L1 and human adipocytes, highlighting the need to construct species-specific epigenetic and transcription factor binding site maps. This is the first genome-wide profile of H3K56 acetylation, E2F4, C/EBPα and HSF-1 binding in human adipocytes, and will serve as an important resource for better understanding adipocyte

  9. Transcriptional Responses in root and leaf of Prunus persica Under Drought Stress Using RNA Sequencing

    Directory of Open Access Journals (Sweden)

    Najla Ksouri

    2016-11-01

    Full Text Available Prunus persica (L.) Batsch, or peach, is one of the most important crops and it is widely established in irrigated arid and semi-arid regions. However, due to variations in the climate and the increased aridity, drought has become a major constraint, causing crop losses worldwide. The use of drought-tolerant rootstocks in modern fruit production appears to be a useful method of alleviating water deficit problems. However, the transcriptomic variation and the major molecular mechanisms that underlie the adaptation of drought-tolerant rootstocks to water shortage remain unclear. Hence, in this study, high-throughput sequencing (RNA-seq) was performed to assess the transcriptomic changes and the key genes involved in the response to drought in root tissues (GF677 rootstock) and leaf tissues (graft, var. Catherina) subjected to 16 days of drought stress. In total, 12 RNA libraries were constructed and sequenced. This generated a total of 315M raw reads from both tissues, which allowed the assembly of 22,079 and 17,854 genes associated with the root and leaf tissues, respectively. Subsets of 500 differentially expressed genes (DEGs) in roots and 236 in leaves were identified and functionally annotated with 56 gene ontology (GO) terms and 99 metabolic pathways, which were mostly associated with aminobenzoate degradation and phenylpropanoid biosynthesis. The GO analysis highlighted the biological functions that were exclusive to the root tissue, such as locomotion, hormone metabolic process, and detection of stimulus, indicating the stress-buffering role of the GF677 rootstock. Furthermore, the complex regulatory network involved in the drought response was revealed, involving proteins that are associated with signaling transduction, transcription and hormone regulation, redox homeostasis, and frontline barriers. We identified two poorly characterized genes in P. persica: growth-regulating factor 5 (GRF5), which may be involved in cellular expansion, and AtHB12

  10. Identification, variation and transcription of pneumococcal repeat sequences

    Science.gov (United States)

    2011-01-01

    Background Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. Results Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. Conclusions BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/. PMID:21333003

  11. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  12. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site

    International Nuclear Information System (INIS)

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-01-01

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al.). In the present study, the pH-induced formation of i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K_in, and the cooperativity parameters, ω_c and ω_a, were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  13. High-throughput verification of transcriptional starting sites by Deep-RACE

    DEFF Research Database (Denmark)

    Olivarius, Signe; Plessy, Charles; Carninci, Piero

    2009-01-01

    We present a high-throughput method for investigating the transcriptional starting sites of genes of interest, which we named Deep-RACE (Deep–rapid amplification of cDNA ends). Taking advantage of the latest sequencing technology, it allows the parallel analysis of multiple genes and is free...

  14. Transcriptionally Active Heterochromatin in Rye B Chromosomes

    Science.gov (United States)

    Carchilan, Mariana; Delgado, Margarida; Ribeiro, Teresa; Costa-Nunes, Pedro; Caperta, Ana; Morais-Cecílio, Leonor; Jones, R. Neil; Viegas, Wanda; Houben, Andreas

    2007-01-01

    B chromosomes (Bs) are dispensable components of the genomes of numerous species. Thus far, there is a lack of evidence for any transcripts of Bs in plants, with the exception of some rDNA sequences. Here, we show that the Giemsa banding-positive heterochromatic subterminal domain of rye (Secale cereale) Bs undergoes decondensation during interphase. Contrary to the heterochromatic regions of A chromosomes, this domain is simultaneously marked by trimethylated H3K4 and by trimethylated H3K27, an unusual combination of apparently conflicting histone modifications. Notably, both types of B-specific high copy repeat families (E3900 and D1100) of the subterminal domain are transcriptionally active, although with different tissue type–dependent activity. No small RNAs were detected specifically for the presence of Bs. The lack of any significant open reading frame and the highly heterogeneous size of mainly polyadenylated transcripts indicate that the noncoding RNA may function as structural or catalytic RNA. PMID:17586652

  15. Transcription of tandemly repetitive DNA: functional roles.

    Science.gov (United States)

    Biscotti, Maria Assunta; Canapa, Adriana; Forconi, Mariko; Olmo, Ettore; Barucca, Marco

    2015-09-01

    A considerable fraction of the eukaryotic genome is made up of satellite DNA constituted of tandemly repeated sequences. These elements are mainly located at centromeres, pericentromeres, and telomeres and are major components of constitutive heterochromatin. Although originally satellite DNA was thought silent and inert, an increasing number of studies are providing evidence on its transcriptional activity supporting, on the contrary, an unexpected dynamicity. This review summarizes the multiple structural roles of satellite noncoding RNAs at chromosome level. Indeed, satellite noncoding RNAs play a role in the establishment of a heterochromatic state at centromere and telomere. These highly condensed structures are indispensable to preserve chromosome integrity and genome stability, preventing recombination events, and ensuring the correct chromosome pairing and segregation. Moreover, these RNA molecules seem to be involved also in maintaining centromere identity and in elongation, capping, and replication of telomere. Finally, the abnormal variation of centromeric and pericentromeric DNA transcription across major eukaryotic lineages in stress condition and disease has evidenced the critical role that these transcripts may play and the potentially dire consequences for the organism.

  16. Sequence2Vec: A novel embedding approach for modeling transcription factor binding affinity landscape

    KAUST Repository

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-01-01

    Motivation: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have

  17. Strand-specific RNA-seq reveals widespread occurrence of novel cis-natural antisense transcripts in rice

    Directory of Open Access Journals (Sweden)

    Lu Tingting

    2012-12-01

    Full Text Available Abstract Background Cis-natural antisense transcripts (cis-NATs) are RNAs transcribed from the antisense strand of a gene locus, and are complementary to the RNA transcribed from the sense strand. Common techniques including microarray approach and analysis of transcriptome databases are the major ways to globally identify cis-NATs in various eukaryotic organisms. Genome-wide in silico analysis has identified a large number of cis-NATs that may generate endogenous short interfering RNAs (nat-siRNAs), which participate in important biogenesis mechanisms for transcriptional and post-transcriptional regulation in rice. However, the transcriptomes are yet to be deeply sequenced to comprehensively investigate cis-NATs. Results We applied high-throughput strand-specific complementary DNA sequencing technology (ssRNA-seq) to deeply sequence mRNA for assessing sense and antisense transcripts that were derived under salt, drought and cold stresses, and normal conditions, in the model plant rice (Oryza sativa). Combined with RAP-DB genome annotation (the Rice Annotation Project Database build-5 data set), 76,013 transcripts corresponding to 45,844 unique gene loci were assembled, in which 4873 gene loci were newly identified. Of 3819 putative rice cis-NATs, 2292 were detected as expressed and giving rise to small RNAs from their overlapping regions through integrated analysis of ssRNA-seq data and small RNA data. Among them, 503 cis-NATs seemed to be associated with specific conditions. The deep sequence data from isolated epidermal cells of rice seedlings further showed that 54.0% of cis-NATs were expressed simultaneously in a population of homogenous cells. Nearly 9.7% of rice transcripts were involved in one-to-one or many-to-many cis-NATs formation. Furthermore, only 17.4-34.7% of 223 many-to-many cis-NAT groups were all expressed and generated nat-siRNAs, indicating that only some cis-NAT groups may be involved in complex regulatory networks. Conclusions

  18. Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms.

    Science.gov (United States)

    Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli

    2012-01-01

    RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts.

  19. Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms.

    Directory of Open Access Journals (Sweden)

    Sara Kangaspeska

    Full Text Available RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts.

  20. Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing.

    Science.gov (United States)

    Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L

    2014-07-08

    We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are

  1. An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile.

    Science.gov (United States)

    Prakash, Celine; Haeseler, Arndt Von

    2017-03-01

    RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
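    The nonuniform coverage described in this record can be seen even in a much simpler setting than the authors' model. The sketch below is a deliberately reduced illustration, not their enumeration of maximal fragment placements: it assumes a single fragment of length F dropped at each of the T − F + 1 possible start positions with equal probability, and computes the chance that each transcript position is covered. The function name is hypothetical.

```python
def expected_coverage(T: int, F: int) -> list:
    """Per-position coverage probability for one uniformly placed fragment.

    T is the transcript length and F the fragment length (1 <= F <= T).
    Position i (0-based) is covered by a fragment starting at s when
    s <= i <= s + F - 1, with s restricted to 0 .. T - F.
    """
    if not 1 <= F <= T:
        raise ValueError("require 1 <= F <= T")
    n_starts = T - F + 1
    profile = []
    for i in range(T):
        first_start = max(0, i - F + 1)   # earliest start that still covers i
        last_start = min(i, T - F)        # latest start that still covers i
        profile.append((last_start - first_start + 1) / n_starts)
    return profile


# Toy example: transcript of length 5, fragment of length 2.
profile = expected_coverage(5, 2)
```

    For T = 5 and F = 2 the profile is [0.25, 0.5, 0.5, 0.5, 0.25]: positions near the transcript ends are covered less often than interior positions, and the effect grows as F approaches T, which is consistent with the abstract's point that the fragment-to-transcript length ratio shapes the coverage profile.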

  2. Endoplasmic reticulum stress-responsive transcription factor ATF6α directs recruitment of the Mediator of RNA polymerase II transcription and multiple histone acetyltransferase complexes.

    Science.gov (United States)

    Sela, Dotan; Chen, Lu; Martin-Brown, Skylar; Washburn, Michael P; Florens, Laurence; Conaway, Joan Weliky; Conaway, Ronald C

    2012-06-29

    The basic leucine zipper transcription factor ATF6α functions as a master regulator of endoplasmic reticulum (ER) stress response genes. Previous studies have established that, in response to ER stress, ATF6α translocates to the nucleus and activates transcription of ER stress response genes upon binding sequence specifically to ER stress response enhancer elements in their promoters. In this study, we investigate the biochemical mechanism by which ATF6α activates transcription. By exploiting a combination of biochemical and multidimensional protein identification technology-based mass spectrometry approaches, we have obtained evidence that ATF6α functions at least in part by recruiting to the ER stress response enhancer elements of ER stress response genes a collection of RNA polymerase II coregulatory complexes, including the Mediator and multiple histone acetyltransferase complexes, among which are the Spt-Ada-Gcn5 acetyltransferase (SAGA) and Ada-Two-A-containing (ATAC) complexes. Our findings shed new light on the mechanism of action of ATF6α, and they outline a straightforward strategy for applying multidimensional protein identification technology mass spectrometry to determine which RNA polymerase II transcription factors and coregulators are recruited to promoters and other regulatory elements to control transcription.

  3. Influenza Virus Mounts a Two-Pronged Attack on Host RNA Polymerase II Transcription.

    Science.gov (United States)

    Bauer, David L V; Tellier, Michael; Martínez-Alonso, Mónica; Nojima, Takayuki; Proudfoot, Nick J; Murphy, Shona; Fodor, Ervin

    2018-05-15

    Influenza virus intimately associates with host RNA polymerase II (Pol II) and mRNA processing machinery. Here, we use mammalian native elongating transcript sequencing (mNET-seq) to examine Pol II behavior during viral infection. We show that influenza virus executes a two-pronged attack on host transcription. First, viral infection causes decreased Pol II gene occupancy downstream of transcription start sites. Second, virus-induced cellular stress leads to a catastrophic failure of Pol II termination at poly(A) sites, with transcription often continuing for tens of kilobases. Defective Pol II termination occurs independently of the ability of the viral NS1 protein to interfere with host mRNA processing. Instead, this termination defect is a common effect of diverse cellular stresses and underlies the production of previously reported downstream-of-gene transcripts (DoGs). Our work has implications for understanding not only host-virus interactions but also fundamental aspects of mammalian transcription. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  4. Designed Transcriptional Regulation in Mammalian Cells Based on TALE- and CRISPR/dCas9.

    Science.gov (United States)

    Lebar, Tina; Jerala, Roman

    2018-01-01

    Transcriptional regulation lies at the center of many cellular processes and is the result of cellular response to different external and internal signals. Control of transcription of selected genes gives unprecedented access to shaping the cellular response. While orthogonal transcription factors from bacteria, yeast, plants, or other cells have been used to introduce new cellular logic into mammalian cells, the discovery of designable modular DNA binding domains, such as Transcription Activator-Like Effectors (TALEs) and the CRISPR system, enables targeting of almost any selected DNA sequence. Fusion or conditional association of a DNA-targeting domain with transcriptional effector domains enables controlled regulation of almost any endogenous or ectopic gene. Moreover, the designed regulators can be linked into genetic circuits to implement complex responses, such as different types of Boolean functions and switches. In this chapter, we describe the protocols for achieving efficient transcriptional regulation with TALE- and CRISPR-based designed transcription factors in mammalian cells.

  5. Molecular Evolution of the non-coding Eosinophil Granule Ontogeny Transcript EGOT

    Directory of Open Access Journals (Sweden)

    Dominic Rose

    2011-10-01

    Full Text Available Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny and analyse patterns of sequence conservation and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionarily conserved and thermodynamically stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element.

  6. Transcription elongation. Heterogeneous tracking of RNA polymerase and its biological implications.

    Science.gov (United States)

    Imashimizu, Masahiko; Shimamoto, Nobuo; Oshima, Taku; Kashlev, Mikhail

    2014-01-01

    Regulation of transcription elongation via pausing of RNA polymerase has multiple physiological roles. The pausing mechanism depends on the sequence heterogeneity of the DNA being transcribed, as well as on certain interactions of polymerase with specific DNA sequences. In order to describe the mechanism of regulation, we introduce the concept of heterogeneity into the previously proposed alternative models of elongation, power stroke and Brownian ratchet. We also discuss the molecular origins and physiological significance of the heterogeneity.

  7. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

    NARCIS (Netherlands)

    Danisman, S.; Dijk, van A.D.J.; Bimbo, A.; Wal, van der F.; Hennig, L.; Folter, de S.; Angenent, G.C.; Immink, R.G.H.

    2013-01-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by

  8. Transcriptome Profiling Using Single-Molecule Direct RNA Sequencing Approach for In-depth Understanding of Genes in Secondary Metabolism Pathways of Camellia sinensis

    Directory of Open Access Journals (Sweden)

    Qingshan Xu

    2017-07-01

    Full Text Available Characteristic secondary metabolites, including flavonoids, theanine and caffeine, are important components of Camellia sinensis, and their biosynthesis has attracted widespread interest. Previous studies on the biosynthesis of these major secondary metabolites using next-generation sequencing technologies were limited in the accurate prediction of full-length (FL) splice isoforms. Herein, we applied single-molecule sequencing to pooled tea plant tissues, to provide a more complete transcriptome of C. sinensis. Moreover, we identified 94 FL transcripts and four alternative splicing events for enzyme-coding genes involved in the biosynthesis of flavonoids, theanine and caffeine. According to the comparison between long-read isoforms and assembled transcripts, we improved the quality and accuracy of genes sequenced by short-read next-generation sequencing technology. The resulting FL transcripts, together with the improved assembled transcripts and identified alternative splicing events, enhance our understanding of genes involved in the biosynthesis of characteristic secondary metabolites in C. sinensis.

  9. Transcriptional regulation of human RANK ligand gene expression by E2F1

    International Nuclear Information System (INIS)

    Hu Yan; Sun Meng; Nadiminty, Nagalakshmi; Lou Wei; Pinder, Elaine; Gao, Allen C.

    2008-01-01

    Receptor activator of nuclear factor kappa B ligand (RANKL) is a critical osteoclastogenic factor involved in the regulation of bone resorption, immune function, the development of mammary gland and cardiovascular system. To understand the transcriptional regulation of RANKL, we amplified and characterized a 1890 bp 5'-flanking sequence of human RANKL gene (-1782 bp to +108 bp relative to the transcription start site). Using a series of deletion mutations of the 1890 bp RANKL promoter, we identified a 72 bp region (-172 to -100 bp) mediating RANKL basal transcriptional activity. Sequence analysis revealed a putative E2F binding site within this 72 bp region in the human RANKL promoter. Overexpression of E2F1 increased RANKL promoter activity, while down-regulation of E2F1 expression by small interfering RNA decreased RANKL promoter activity. RT-PCR and enzyme linked immunosorbent assays (ELISA) further demonstrated that E2F1 induced the expression of RANKL. Electrophoretic gel mobility shift assays (EMSA) and antibody competition assays confirmed that E2F1 proteins bind to the consensus E2F binding site in the RANKL promoter. Mutation of the E2F consensus binding site in the RANKL promoter profoundly reduced the basal promoter activity and abolished the transcriptional modulation of RANKL by E2F1. These results suggest that E2F1 plays an important role in regulating RANKL transcription through binding to the E2F consensus binding site

  10. Transcription Profiling of Bacillus subtilis Cells Infected with AR9, a Giant Phage Encoding Two Multisubunit RNA Polymerases.

    Science.gov (United States)

    Lavysh, Daria; Sokolova, Maria; Slashcheva, Marina; Förstner, Konrad U; Severinov, Konstantin

    2017-02-14

    Bacteriophage AR9 is a recently sequenced jumbo phage that encodes two multisubunit RNA polymerases. Here we investigated the AR9 transcription strategy and the effect of AR9 infection on the transcription of its host, Bacillus subtilis. Analysis of whole-genome transcription revealed early, late, and continuously expressed AR9 genes. Alignment of sequences upstream of the 5' ends of AR9 transcripts revealed consensus sequences that define early and late phage promoters. Continuously expressed AR9 genes have both early and late promoters in front of them. Early AR9 transcription is independent of protein synthesis and must be determined by virion RNA polymerase injected together with viral DNA. During infection, the overall amount of host mRNAs is significantly decreased. Analysis of relative amounts of host transcripts revealed notable differences in the levels of some mRNAs. The physiological significance of up- or downregulation of host genes for AR9 phage infection remains to be established. AR9 infection is significantly affected by rifampin, an inhibitor of host RNA polymerase transcription. The effect is likely caused by the antibiotic-induced killing of host cells, while phage genome transcription is solely performed by viral RNA polymerases. IMPORTANCE Phages regulate the timing of the expression of their own genes to coordinate processes in the infected cell and maximize the release of viral progeny. Phages also alter the levels of host transcripts. Here we present the results of a temporal analysis of the host and viral transcriptomes of Bacillus subtilis infected with a giant phage, AR9. We identify viral promoters recognized by two virus-encoded RNA polymerases that are a unique feature of the phiKZ-related group of phages to which AR9 belongs. Our results set the stage for future analyses of highly unusual RNA polymerases encoded by AR9 and other phiKZ-related phages. Copyright © 2017 Lavysh et al.

  11. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    Science.gov (United States)

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  12. Transcriptome discovery in non-model wild fish species for the development of quantitative transcript abundance assays

    Science.gov (United States)

    Hahn, Cassidy M.; Iwanowicz, Luke R.; Cornman, Robert S.; Mazik, Patricia M.; Blazer, Vicki S.

    2016-01-01

    Environmental studies increasingly identify the presence of both contaminants of emerging concern (CECs) and legacy contaminants in aquatic environments; however, the biological effects of these compounds on resident fishes remain largely unknown. High throughput methodologies were employed to establish partial transcriptomes for three wild-caught, non-model fish species; smallmouth bass (Micropterus dolomieu), white sucker (Catostomus commersonii) and brown bullhead (Ameiurus nebulosus). Sequences from these transcriptome databases were utilized in the development of a custom nCounter CodeSet that allowed for direct multiplexed measurement of 50 transcript abundance endpoints in liver tissue. Sequence information was also utilized in the development of quantitative real-time PCR (qPCR) primers. Cross-species hybridization allowed the smallmouth bass nCounter CodeSet to be used for quantitative transcript abundance analysis of an additional non-model species, largemouth bass (Micropterus salmoides). We validated the nCounter analysis data system with qPCR for a subset of genes and confirmed concordant results. Changes in transcript abundance biomarkers between sexes and seasons were evaluated to provide baseline data on transcript modulation for each species of interest.

  13. LexA Binds to Transcription Regulatory Site of Cell Division Gene ftsZ in Toxic Cyanobacterium Microcystis aeruginosa.

    Science.gov (United States)

    Honda, Takashi; Morimoto, Daichi; Sako, Yoshihiko; Yoshida, Takashi

    2018-05-17

    Previously, we showed that DNA replication and cell division in toxic cyanobacterium Microcystis aeruginosa are coordinated by transcriptional regulation of cell division gene ftsZ and that an unknown protein specifically bound upstream of ftsZ (BpFz; DNA-binding protein to an upstream site of ftsZ) during successful DNA replication and cell division. Here, we purified BpFz from M. aeruginosa strain NIES-298 using DNA-affinity chromatography and gel-slicing combined with gel electrophoresis mobility shift assay (EMSA). The N-terminal amino acid sequence of BpFz was identified as TNLESLTQ, which was identical to that of transcription repressor LexA from NIES-843. EMSA analysis using mutant probes showed that the sequence GTACTAN3GTGTTC was important in LexA binding. Comparison of the upstream regions of lexA in the genomes of closely related cyanobacteria suggested that the sequence TASTRNNNNTGTWC could be a putative LexA recognition sequence (LexA box). Searches for TASTRNNNNTGTWC as a transcriptional regulatory site (TRS) in the genome of M. aeruginosa NIES-843 showed that it was present in genes involved in cell division, photosynthesis, and extracellular polysaccharide biosynthesis. Considering that BpFz binds to the TRS of ftsZ during normal cell division, LexA may function as a transcriptional activator of genes related to cell reproduction in M. aeruginosa, including ftsZ. This may be an example of informality in the control of bacterial cell division.
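
    A search for a degenerate motif such as the putative LexA box can be sketched by translating IUPAC nucleotide codes into a regular expression. The example below is only an illustration and not the procedure used in the study. The helper names iupac_to_regex and find_motif and the short test sequence are hypothetical, and only the forward strand is scanned.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate a degenerate IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches.

    Only the given strand is scanned; a genome-wide search would also
    need to scan the reverse complement.
    """
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence built from GTACTA + AAA + GTGTTC.
    print(find_motif("GTACTAAAAGTGTTC", "TASTRNNNNTGTWC"))
```

    In this toy sequence the 14-nucleotide motif matches starting at the second base, which is consistent with the probe sequence reported in the abstract containing the proposed LexA box.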

  14. Analysis artefacts of the INS-IGF2 fusion transcript

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Frogne, Thomas; Rescan, Claude

    2015-01-01

    Background: In gene expression analysis, overlapping genes, splice variants, and fusion transcripts are potential sources of data analysis artefacts, depending on how the observed intensity is assigned to one, or more genes. We here exemplify this by an in-depth analysis of the INS-IGF2 fusion...... transcript, which has recently been reported to be among the highest expressed transcripts in human pancreatic beta cells and its protein indicated as a novel autoantigen in Type 1 Diabetes. Results: Through RNA sequencing and variant specific qPCR analyses we demonstrate that the true abundance of INS-IGF2...... is >20,000 fold lower than INS in human beta cells, and we suggest an explanation to the nature of the artefacts which have previously led to overestimation of the gene expression level in selected studies. We reinvestigated the previous reported findings of detection of INS-IGF2 using antibodies both...

  15. Herpes simplex virus latency-associated transcript sequence downstream of the promoter influences type-specific reactivation and viral neurotropism.

    Science.gov (United States)

    Bertke, Andrea S; Patel, Amita; Krause, Philip R

    2007-06-01

    Herpes simplex virus (HSV) establishes latency in sensory nerve ganglia during acute infection and may later periodically reactivate to cause recurrent disease. HSV type 1 (HSV-1) reactivates more efficiently than HSV-2 from trigeminal ganglia while HSV-2 reactivates more efficiently than HSV-1 from lumbosacral dorsal root ganglia (DRG) to cause recurrent orofacial and genital herpes, respectively. In a previous study, a chimeric HSV-2 that expressed the latency-associated transcript (LAT) from HSV-1 reactivated similarly to wild-type HSV-1, suggesting that the LAT influences the type-specific reactivation phenotype of HSV-2. To further define the LAT region essential for type-specific reactivation, we constructed additional chimeric HSV-2 viruses by replacing the HSV-2 LAT promoter (HSV2-LAT-P1) or 2.5 kb of the HSV-2 LAT sequence (HSV2-LAT-S1) with the corresponding regions from HSV-1. HSV2-LAT-S1 was impaired for reactivation in the guinea pig genital model, while its rescuant and HSV2-LAT-P1 reactivated with a wild-type HSV-2 phenotype. Moreover, recurrences of HSV2-LAT-S1 were frequently fatal, in contrast to the relatively mild recurrences of the other viruses. During recurrences, HSV2-LAT-S1 DNA increased more in the sacral cord compared to its rescuant or HSV-2. Thus, the LAT sequence region, not the LAT promoter region, provides essential elements for type-specific reactivation of HSV-2 and also plays a role in viral neurotropism. HSV-1 DNA, as quantified by real-time PCR, was more abundant in the lumbar spinal cord, while HSV-2 DNA was more abundant in the sacral spinal cord, which may provide insights into the mechanism for type-specific reactivation and different patterns of central nervous system infection of HSV-1 and HSV-2.

  16. Characterizing leader sequences of CRISPR loci

    DEFF Research Database (Denmark)

    Alkhnbashi, Omer; Shah, Shiraz Ali; Garrett, Roger Antony

    2016-01-01

    The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR...... array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs...... sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from...

  17. Stimulation of albumin gene transcription by insulin in primary cultures of rat hepatocytes

    International Nuclear Information System (INIS)

    Lloyd, C.E.; Kalinyak, J.E.; Hutson, S.M.; Jefferson, L.S.

    1987-01-01

    The first goal of the work reported here was to prepare single-stranded DNA sequences for use in studies on the regulation of albumin gene expression. A double-stranded rat albumin cDNA clone was subcloned into the bacteriophage vector M13mp7. Single-stranded recombinant clones were screened for albumin sequences containing either the mRNA strand or the complementary strand. Two clones were selected that contained the 1200 nucleotide long 3' end of the albumin sequence. DNA from the clone containing the mRNA strand was used as a template for DNA polymerase I to prepare a radiolabeled, single-stranded cDNA to albumin mRNA. This radiolabeled cDNA probe was used to quantitate the relative abundance of albumin mRNA in samples of total cellular RNA. DNA from the clone containing the complementary strand was used to measure relative rates of albumin gene transcription in isolated nuclei. The second goal was to use the single-stranded DNA probes to investigate the mechanism of the insulin-mediated stimulation of albumin synthesis in primary cultures of rat hepatocytes. Addition of insulin to hepatocytes maintained in a chemically defined, serum-free medium for 40 h in the absence of any hormones resulted in a specific 1.5- to 2.5-fold stimulation of albumin gene transcription that was maximal at 3 h and was maintained above control values for at least 24 h. The rate of albumin gene transcription in nuclei isolated from livers of diabetic rats was reduced to 50% of the value recorded in control nuclei. Taken together, these findings demonstrate that insulin regulates synthesis of albumin at the level of gene transcription

  18. Temporal transcription of the lactococcal temperate phage TP901-1 and DNA sequence of the early promoter region

    DEFF Research Database (Denmark)

    Madsen, Hans Peter Lynge; Hammer, Karin

    1998-01-01

    to a phage repressor, a single-stranded DNA-binding protein, a topoisomerase, a Cro-like protein and two other phage proteins of unknown function were detected. The gene arrangement in the early transcribed region of TP901-1 thus consists of two transcriptional units: one from PR containing four genes......, of which at least two (the integrase gene and putative repressor) are needed for lysogeny, and the divergent and longer transcriptional unit from PL, presumably encoding functions required for the lytic life cycle. ORFs with homology to proteins involved in DNA replication were identified on the latter......Transcriptional analysis by Northern blotting identified clusters of early, middle and late transcribed regions of the temperate lactococcal bacteriophage TP901-1 during one-step growth experiments. The latent period was found to be 65 min and the burst size 40 +/- 10. The eight early transcripts...

  19. Eukaryotic transcription factors

    DEFF Research Database (Denmark)

    Staby, Lasse; O'Shea, Charlotte; Willemoës, Martin

    2017-01-01

    Gene-specific transcription factors (TFs) are key regulatory components of signaling pathways, controlling, for example, cell growth, development, and stress responses. Their biological functions are determined by their molecular structures, as exemplified by their structured DNA-binding domains...... regions with function-related, short sequence motifs and molecular recognition features with structural propensities. This review focuses on molecular aspects of TFs, which represent paradigms of ID-related features. Through specific examples, we review how the ID-associated flexibility of TFs enables....... It is furthermore emphasized how classic biochemical concepts like allostery, conformational selection, induced fit, and feedback regulation are undergoing a revival with the appreciation of ID. The review also describes the most recent advances based on computational simulations of ID-based interaction mechanisms...

  20. Elements in the transcriptional regulatory region flanking herpes simplex virus type 1 oriS stimulate origin function.

    Science.gov (United States)

    Wong, S W; Schaffer, P A

    1991-05-01

    Like other DNA-containing viruses, the three origins of herpes simplex virus type 1 (HSV-1) DNA replication are flanked by sequences containing transcriptional regulatory elements. In a transient plasmid replication assay, deletion of sequences comprising the transcriptional regulatory elements of ICP4 and ICP22/47, which flank oriS, resulted in a greater than 80-fold decrease in origin function compared with a plasmid, pOS-822, which retains these sequences. In an effort to identify specific cis-acting elements responsible for this effect, we conducted systematic deletion analysis of the flanking region with plasmid pOS-822 and tested the resulting mutant plasmids for origin function. Stimulation by cis-acting elements was shown to be both distance and orientation dependent, as changes in either parameter resulted in a decrease in oriS function. Additional evidence for the stimulatory effect of flanking sequences on origin function was demonstrated by replacement of these sequences with the cytomegalovirus immediate-early promoter, resulting in nearly wild-type levels of oriS function. In competition experiments, cotransfection of cells with the test plasmid, pOS-822, and increasing molar concentrations of a competitor plasmid which contained the ICP4 and ICP22/47 transcriptional regulatory regions but lacked core origin sequences resulted in a significant reduction in the replication efficiency of pOS-822, demonstrating that factors which bind specifically to the oriS-flanking sequences are likely involved as auxiliary proteins in oriS function. Together, these studies demonstrate that trans-acting factors and the sites to which they bind play a critical role in the efficiency of HSV-1 DNA replication from oriS in transient-replication assays.

  1. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line

    DEFF Research Database (Denmark)

    Suzuki, Harukazu; Forrest, Alistair R R; van Nimwegen, Erik

    2009-01-01

    , we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks......Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites...... involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process....

  2. Direct non transcriptional role of NF-Y in DNA replication.

    Science.gov (United States)

    Benatti, Paolo; Belluti, Silvia; Miotto, Benoit; Neusiedler, Julia; Dolfini, Diletta; Drac, Marjorie; Basile, Valentina; Schwob, Etienne; Mantovani, Roberto; Blow, J Julian; Imbriano, Carol

    2016-04-01

    NF-Y is a heterotrimeric transcription factor, which plays a pioneer role in the transcriptional control of promoters containing the CCAAT-box, among which genes involved in cell cycle regulation, apoptosis and DNA damage response. The knock-down of the sequence-specific subunit NF-YA triggers defects in S-phase progression, which lead to apoptotic cell death. Here, we report that NF-Y has a critical function in DNA replication progression, independent from its transcriptional activity. NF-YA colocalizes with early DNA replication factories, its depletion affects the loading of replisome proteins to DNA, among which Cdc45, and delays the passage from early to middle-late S phase. Molecular combing experiments are consistent with a role for NF-Y in the control of fork progression. Finally, we unambiguously demonstrate a direct non-transcriptional role of NF-Y in the overall efficiency of DNA replication, specifically in the DNA elongation process, using a Xenopus cell-free system. Our findings broaden the activity of NF-Y on a DNA metabolism other than transcription, supporting the existence of specific TFs required for proper and efficient DNA replication. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  3. Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

    Science.gov (United States)

    Wei, Yingying; Wu, George; Ji, Hongkai

    2013-05-01

    Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.

  4. A novel mode for transcription inhibition mediated by PNA-induced R-loops with a model in vitro system.

    Science.gov (United States)

    D'Souza, Alicia D; Belotserkovskii, Boris P; Hanawalt, Philip C

    2018-02-01

    The selective inhibition of transcription of a chosen gene by an artificial agent has numerous applications. Usually, these agents are designed to bind a specific nucleotide sequence in the promoter or within the transcribed region of the chosen gene. However, since optimal binding sites might not exist within the gene, it is of interest to explore the possibility of transcription inhibition when the agent is designed to bind at other locations. One of these possibilities arises when an additional transcription initiation site (e.g. secondary promoter) is present upstream from the primary promoter of the target gene. In this case, transcription inhibition might be achieved by inducing the formation of an RNA-DNA hybrid (R-loop) upon transcription from the secondary promoter. The R-loop could extend into the region of the primary promoter, to interfere with promoter recognition by RNA polymerase and thereby inhibit transcription. As a sequence-specific R-loop-inducing agent, a peptide nucleic acid (PNA) could be designed to facilitate R-loop formation by sequestering the non-template DNA strand. To investigate this mode for transcription inhibition, we have employed a model system in which a PNA binding site is localized between the T3 and T7 phage RNA polymerase promoters, which respectively assume the roles of primary and secondary promoters. In accord with our model, we have demonstrated that with PNA-bound DNA substrates, transcription from the T7 promoter reduces transcription from the T3 promoter by 30-fold, while in the absence of PNA binding there is no significant effect of T7 transcription upon T3 transcription. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Differences in transcription between free-living and CO2-activated third-stage larvae of Haemonchus contortus

    Directory of Open Access Journals (Sweden)

    Zhong Weiwei

    2010-04-01

    Full Text Available Abstract Background The disease caused by Haemonchus contortus, a blood-feeding nematode of small ruminants, is of major economic importance worldwide. The infective third-stage larva (L3) of this gastric nematode is enclosed in a cuticle (sheath) and, once ingested with herbage by the host, undergoes an exsheathment process that marks the transition from the free-living (L3) to the parasitic (xL3) stage. This study explored changes in gene transcription associated with this transition and predicted, based on comparative analysis, functional roles for key transcripts in the metabolic pathways linked to larval development. Results Totals of 101,305 (L3) and 105,553 (xL3) expressed sequence tags (ESTs) were determined using 454 sequencing technology, and then assembled and annotated; the most abundant transcripts encoded transthyretin-like, calcium-binding EF-hand, NAD(P)-binding and nucleotide-binding proteins as well as homologues of Ancylostoma-secreted proteins (ASPs). Using an in silico-subtractive analysis, 560 and 685 sequences were shown to be uniquely represented in the L3 and xL3 stages, respectively; the transcripts encoded ribosomal proteins, collagens and elongation factors (in L3), and mainly peptidases and other enzymes of amino acid catabolism (in xL3). Caenorhabditis elegans orthologues of transcripts that were uniquely transcribed in each L3 and xL3 were predicted to interact with a total of 535 other genes, all of which were involved in embryonic development. Conclusion The present study indicated that some key transcriptional alterations taking place during the transition from the L3 to the xL3 stage of H. contortus involve genes predicted to be linked to the development of neuronal tissue (L3 and xL3), formation of the cuticle (L3) and digestion of host haemoglobin (xL3). Future efforts using next-generation sequencing and bioinformatic technologies should provide the efficiency and depth of coverage required for the determination of the

  6. Expression sequence tag library derived from peripheral blood mononuclear cells of the Chlorocebus sabaeus

    Directory of Open Access Journals (Sweden)

    Tchitchek Nicolas

    2012-06-01

    Full Text Available Abstract Background African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research; nevertheless, only a few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution. Results We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were either left unstimulated or stimulated in vitro with Concanavalin A, with lipopolysaccharides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify ortholog transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to C. sabaeus or correspond to not yet described primate genes. Conclusions The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which particularly well represents immunological and hematological gene expression, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research.

  7. Partial characterization of three β-defensin gene transcripts in river ...

    African Journals Online (AJOL)

    In this study, the tracheal tissues from Egyptian river buffalo and cattle were screened for the presence of three bovine β-defensin gene transcripts. Three primer pairs were designed on the basis of published Bos taurus sequences for partial amplification of β-defensin 4, β-defensin 10 and β-defensin 11 complementary DNA ...

  8. The Enzyme-Like Domain of Arabidopsis Nuclear β-Amylases Is Critical for DNA Sequence Recognition and Transcriptional Activation.

    Science.gov (United States)

    Soyk, Sebastian; Simková, Klára; Zürcher, Evelyne; Luginbühl, Leonie; Brand, Luise H; Vaughan, Cara K; Wanke, Dierk; Zeeman, Samuel C

    2014-04-01

    Plant BZR1-BAM transcription factors contain a β-amylase (BAM)-like domain, characteristic of proteins involved in starch breakdown. The enzyme-derived domains appear to be noncatalytic, but they determine the function of the two Arabidopsis thaliana BZR1-BAM isoforms (BAM7 and BAM8) during transcriptional initiation. Removal or swapping of the BAM domains demonstrates that the BAM7 BAM domain restricts DNA binding and transcriptional activation, while the BAM8 BAM domain allows both activities. Furthermore, we demonstrate that BAM7 and BAM8 interact on the protein level and cooperate during transcriptional regulation. Site-directed mutagenesis of residues in the BAM domain of BAM8 shows that its function as a transcriptional activator is independent of catalysis but requires an intact substrate binding site, suggesting it may bind a ligand. Microarray experiments with plants overexpressing truncated versions lacking the BAM domain indicate that the pseudo-enzymatic domain increases selectivity for the preferred cis-regulatory element BBRE (BZR1-BAM Responsive Element). Side specificity toward the G-box may allow crosstalk to other signaling networks. This work highlights the importance of the enzyme-derived domain of BZR1-BAMs, supporting their potential role as metabolic sensors. © 2014 American Society of Plant Biologists. All rights reserved.

  9. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    Energy Technology Data Exchange (ETDEWEB)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.; Salzberg, Steven L.; Rubin, Gerald M.; Eisen, Michael B.; Celniker, Susan E.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
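    The cluster search described in this abstract can be illustrated with a minimal sketch. It assumes the predicted binding-site positions are already available as integer coordinates; the function name, the window length and the site threshold are hypothetical choices for illustration and are not taken from the record, and the conservation step is not modelled.

```python
from typing import List, Tuple


def find_site_clusters(
    site_positions: List[int], window: int, min_sites: int
) -> List[Tuple[int, int, int]]:
    """Find regions where at least `min_sites` predicted sites fall in `window` bp.

    Returns (start, end, n_sites) tuples; overlapping dense windows are merged.
    The parameters are illustrative assumptions, not values from the study.
    """
    positions = sorted(site_positions)
    dense_windows = []
    right = 0
    for left in range(len(positions)):
        # Advance `right` past every site lying within `window` bp of positions[left].
        while right < len(positions) and positions[right] - positions[left] < window:
            right += 1
        if right - left >= min_sites:
            dense_windows.append((positions[left], positions[right - 1]))

    # Merge dense windows that overlap into single clusters.
    merged: List[Tuple[int, int]] = []
    for start, end in dense_windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Report each cluster with the number of sites it contains.
    return [(s, e, sum(s <= p <= e for p in positions)) for s, e in merged]


if __name__ == "__main__":
    sites = [100, 150, 220, 260, 5000, 9000, 9100]
    print(find_site_clusters(sites, window=700, min_sites=4))
```

    In this toy input only the four sites between positions 100 and 260 are dense enough to be reported; the isolated sites further along are ignored.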

  10. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  11. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  12. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan

    2017-04-05

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34,212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  13. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan; Calixto, Cristiane P. G.; Marquez, Yamile; Venhuizen, Peter; Tzioutziou, Nikoleta A.; Guo, Wenbin; Spensley, Mark; Entizne, Juan Carlos; Lewandowska, Dominika; ten Have, Sara; Frei dit Frey, Nicolas; Hirt, Heribert; James, Allan B.; Nimmo, Hugh G.; Barta, Andrea; Kalyna, Maria; Brown, John W. S.

    2017-01-01

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34,212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  14. Broadly reactive pan-paramyxovirus reverse transcription polymerase chain reaction and sequence analysis for the detection of Canine distemper virus in a case of canine meningoencephalitis of unknown etiology

    Science.gov (United States)

    Schatzberg, Scott J.; Li, Qiang; Porter, Brian F.; Barber, Renee M.; Claiborne, Mary Kate; Levine, Jonathan M.; Levine, Gwendolyn J.; Israel, Sarah K.; Young, Benjamin D.; Kiupel, Matti; Greene, Craig; Ruone, Susan; Anderson, Larry; Tong, Suxiang

    2016-01-01

    Despite the immunologic protection associated with routine vaccination protocols, Canine distemper virus (CDV) remains an important pathogen of dogs. Antemortem diagnosis of systemic CDV infection may be made by reverse transcription polymerase chain reaction (RT-PCR) and/or immunohistochemical testing for CDV antigen; central nervous system infection often requires postmortem confirmation via histopathology and immunohistochemistry. An 8-month-old intact male French Bulldog previously vaccinated for CDV presented with multifocal neurologic signs. Based on clinical and postmortem findings, the dog’s disease was categorized as a meningoencephalitis of unknown etiology. Broadly reactive, pan-paramyxovirus RT-PCR using consensus-degenerate hybrid oligonucleotide primers, combined with sequence analysis, identified CDV amplicons in the dog’s brain. Immunohistochemistry confirmed the presence of CDV antigens, and a specific CDV RT-PCR based on the phosphoprotein gene identified a wild-type versus vaccinal virus strain. This case illustrates the utility of broadly reactive PCR and sequence analysis for the identification of pathogens in diseases with unknown etiology. PMID:19901287

  15. Versatility of cooperative transcriptional activation: a thermodynamical modeling analysis for greater-than-additive and less-than-additive effects.

    Directory of Open Access Journals (Sweden)

    Till D Frank

    Full Text Available We derive a statistical model of transcriptional activation using equilibrium thermodynamics of chemical reactions. We examine to what extent this statistical model predicts synergy effects of cooperative activation of gene expression. We determine parameter domains in which greater-than-additive and less-than-additive effects are predicted for cooperative regulation by two activators. We show that the statistical approach can be used to identify different causes of synergistic greater-than-additive effects: nonlinearities of the thermostatistical transcriptional machinery and three-body interactions between RNA polymerase and two activators. In particular, our model-based analysis suggests that at low transcription factor concentrations cooperative activation cannot yield synergistic greater-than-additive effects, i.e., DNA transcription can only exhibit less-than-additive effects. Accordingly, transcriptional activity turns from synergistic greater-than-additive responses at relatively high transcription factor concentrations into less-than-additive responses at relatively low concentrations. In addition, two types of re-entrant phenomena are predicted. First, our analysis predicts that under particular circumstances transcriptional activity will feature a sequence of less-than-additive, greater-than-additive, and eventually less-than-additive effects when for fixed activator concentrations the regulatory impact of activators on the binding of RNA polymerase to the promoter increases from weak, to moderate, to strong. Second, for appropriate promoter conditions when activator concentrations are increased then the aforementioned re-entrant sequence of less-than-additive, greater-than-additive, and less-than-additive effects is predicted as well. Finally, our model-based analysis suggests that even for weak activators that individually induce only negligible increases in promoter activity, promoter activity can exhibit greater

  16. Regulation of endogenous human gene expression by ligand-inducible TALE transcription factors.

    Science.gov (United States)

    Mercer, Andrew C; Gaj, Thomas; Sirk, Shannon J; Lamb, Brian M; Barbas, Carlos F

    2014-10-17

    The construction of increasingly sophisticated synthetic biological circuits is dependent on the development of extensible tools capable of providing specific control of gene expression in eukaryotic cells. Here, we describe a new class of synthetic transcription factors that activate gene expression in response to extracellular chemical stimuli. These inducible activators consist of customizable transcription activator-like effector (TALE) proteins combined with steroid hormone receptor ligand-binding domains. We demonstrate that these ligand-responsive TALE transcription factors allow for tunable and conditional control of gene activation and can be used to regulate the expression of endogenous genes in human cells. Since TALEs can be designed to recognize any contiguous DNA sequence, the conditional gene regulatory system described herein will enable the design of advanced synthetic gene networks.

  17. Transcriptional Regulation During Zygotic Genome Activation in Zebrafish and Other Anamniote Embryos.

    Science.gov (United States)

    Wragg, J; Müller, F

    2016-01-01

    Embryo development commences with the fusion of two terminally differentiated haploid gametes into the totipotent fertilized egg, which through a series of major cellular and molecular transitions generate a pluripotent cell mass. The activation of the zygotic genome occurs during the so-called maternal to zygotic transition and prepares the embryo for zygotic takeover from maternal factors, in the control of the development of cellular lineages during differentiation. Recent advances in next generation sequencing technologies have allowed the dissection of the genomic and epigenomic processes mediating this transition. These processes include reorganization of the chromatin structure to a transcriptionally permissive state, changes in composition and function of structural and regulatory DNA-binding proteins, and changeover of the transcriptome as it is overhauled from that deposited by the mother in the oocyte to a zygotically transcribed complement. Zygotic genome activation in zebrafish occurs 10 cell cycles after fertilization and provides an ideal experimental platform for elucidating the temporal sequence and dynamics of establishment of a transcriptionally active chromatin state and helps in identifying the determinants of transcription activation at polymerase II transcribed gene promoters. The relatively large number of pluripotent cells generated by the fast cell divisions before zygotic transcription provides sufficient biomass for next generation sequencing technology approaches to establish the temporal dynamics of events and suggest causative relationship between them. However, genomic and genetic technologies need to be improved further to capture the earliest events in development, where cell number is a limiting factor. These technologies need to be complemented with precise, inducible genetic interference studies using the latest genome editing tools to reveal the function of candidate determinants and to confirm the predictions made by classic

  18. What makes ribosome-mediated transcriptional attenuation sensitive to amino acid limitation?

    Directory of Open Access Journals (Sweden)

    Johan Elf

    2005-06-01

    Full Text Available Ribosome-mediated transcriptional attenuation mechanisms are commonly used to control amino acid biosynthetic operons in bacteria. The mRNA leader of such an operon contains an open reading frame with "regulatory" codons, cognate to the amino acid that is synthesized by the enzymes encoded by the operon. When the amino acid is in short supply, translation of the regulatory codons is slow, which allows transcription to continue into the structural genes of the operon. When amino acid supply is in excess, translation of regulatory codons is rapid, which leads to termination of transcription. We use a discrete master equation approach to formulate a probabilistic model for the positioning of the RNA polymerase and the ribosome in the attenuator leader sequence. The model describes how the current rate of amino acid supply compared to the demand in protein synthesis (signal) determines the expression of the amino acid biosynthetic operon (response). The focus of our analysis is on the sensitivity of operon expression to a change in the amino acid supply. We show that attenuation of transcription can be hyper-sensitive for two main reasons. The first is that its response depends on the outcome of a race between two multi-step mechanisms with synchronized starts: transcription of the leader of the operon, and translation of its regulatory codons. The relative change in the probability that transcription is aborted (attenuated) can therefore be much larger than the relative change in the time it takes for the ribosome to read a regulatory codon. The second is that the general usage frequencies of codons of the type used in attenuation control are small. A small percentage decrease in the rate of supply of the controlled amino acid can therefore lead to a much larger percentage decrease in the rate of reading a regulatory codon. We show that high sensitivity further requires a particular choice of regulatory codon among several synonymous codons for the

  19. Transcriptome discovery in non-model wild fish species for the development of quantitative transcript abundance assays.

    Science.gov (United States)

    Hahn, Cassidy M; Iwanowicz, Luke R; Cornman, Robert S; Mazik, Patricia M; Blazer, Vicki S

    2016-12-01

    Environmental studies increasingly identify the presence of both contaminants of emerging concern (CECs) and legacy contaminants in aquatic environments; however, the biological effects of these compounds on resident fishes remain largely unknown. High throughput methodologies were employed to establish partial transcriptomes for three wild-caught, non-model fish species: smallmouth bass (Micropterus dolomieu), white sucker (Catostomus commersonii) and brown bullhead (Ameiurus nebulosus). Sequences from these transcriptome databases were utilized in the development of a custom nCounter CodeSet that allowed for direct multiplexed measurement of 50 transcript abundance endpoints in liver tissue. Sequence information was also utilized in the development of quantitative real-time PCR (qPCR) primers. Cross-species hybridization allowed the smallmouth bass nCounter CodeSet to be used for quantitative transcript abundance analysis of an additional non-model species, largemouth bass (Micropterus salmoides). We validated the nCounter analysis data system with qPCR for a subset of genes and confirmed concordant results. Changes in transcript abundance biomarkers between sexes and seasons were evaluated to provide baseline data on transcript modulation for each species of interest. Published by Elsevier Inc.

  20. Retrotransposon-centered analysis of piRNA targeting shows a shift from active to passive retrotransposon transcription in developing mouse testes

    Directory of Open Access Journals (Sweden)

    Mourier Tobias

    2011-09-01

    Full Text Available Abstract Background Piwi-associated RNAs (piRNAs) bind transcripts from retrotransposable elements (RTE) in mouse germline cells and seemingly act as guides for genomic methylation, thereby repressing the activity of RTEs. It is currently unknown if and how Piwi proteins distinguish RTE transcripts from other cellular RNAs. During germline development, the main target of piRNAs switches between different types of RTEs. Using the piRNA targeting of RTEs as an indicator of RTE activity, and considering the entire population of genomic RTE loci along with their age and location, this study aims at further elucidating the dynamics of RTE activity during mouse germline development. Results Due to the inherent sequence redundancy between RTE loci, assigning piRNA targeting to specific loci is problematic. This limits the analysis, although certain features of piRNA targeting of RTE loci are apparent. As expected, young RTEs display a much higher level of piRNA targeting than old RTEs. Further, irrespective of age, RTE loci near protein-coding genes are targeted to a greater extent than RTE loci far from genes. During development, a shift in piRNA targeting is observed, with a clear increase in the relative piRNA targeting of RTEs residing within boundaries of protein-coding gene transcripts. Conclusions Reanalyzing published piRNA sequences and taking into account the features of individual RTE loci provide novel insight into the activity of RTEs during development. The obtained results are consistent with some degree of proportionality between what transcripts become substrates for Piwi protein complexes and the level by which the transcripts are present in the cell. A transition from active transcription of RTEs to passive co-transcription of RTE sequences residing within protein-coding transcripts appears to take place in postnatal development. Hence, the previously reported increase in piRNA targeting of SINEs in postnatal testis development

  1. Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data.

    Directory of Open Access Journals (Sweden)

    Per Unneberg

    Full Text Available Recent studies on chromosome conformation show that chromosomes colocalize in the nucleus, bringing together active genes in transcription factories. This spatial proximity of actively transcribing genes could provide a means for RNA interaction at the transcript level. We have screened public databases for chimeric EST and mRNA sequences with the intent of mapping transcription-induced interchromosomal interactions. We suggest that chimeric transcripts may be the result of close encounters of active genes, either as functional products or "noise" in the transcription process, and that they could be used as probes for chromosome interactions. We have found a total of 5,614 chimeric ESTs and 587 chimeric mRNAs that meet our selection criteria. Due to their higher quality, the mRNA findings are of particular interest and we hope that they may serve as food for thought for specialists in diverse areas of molecular biology.

  2. Anthocyanin biosynthesis in pears is regulated by a R2R3-MYB transcription factor PyMYB10.

    Science.gov (United States)

    Feng, Shouqian; Wang, Yanling; Yang, Song; Xu, Yuting; Chen, Xuesen

    2010-06-01

    Skin color is an important factor in pear breeding programs. The degree of red coloration is determined by the content and composition of anthocyanins. In plants, many MYB transcriptional factors are involved in regulating anthocyanin biosynthesis. In this study, an R2R3-MYB transcription factor gene, PyMYB10, was isolated from Asian pear (Pyrus pyrifolia) cv. 'Aoguan'. Sequence analysis suggested that the PyMYB10 gene was an ortholog of the MdMYB10 gene, which regulates anthocyanin biosynthesis in red-fleshed apple (Malus x domestica) cv. 'Red Field'. PyMYB10 was identified at the genomic level and had three exons, with its upstream sequence containing core sequences of cis-acting regulatory elements involved in light responsiveness. Fruit bagging showed that light could induce expression of PyMYB10 and anthocyanin biosynthesis. Quantitative real-time PCR revealed that PyMYB10 was predominantly expressed in pear skins, buds, and young leaves, and the level of transcription in buds was higher than in skin and young leaves. In ripening fruits, the transcription of PyMYB10 in the skin was positively correlated with genes in the anthocyanin pathway and with anthocyanin biosynthesis. In addition, the transcripts of PyMYB10 and of anthocyanin biosynthesis genes were more abundant in red-skinned pear cultivars than in blushed cultivars. Transgenic Arabidopsis plants overexpressing PyMYB10 exhibited ectopic pigmentation in immature seeds. The study suggested that PyMYB10 plays a role in regulating anthocyanin biosynthesis and the overexpression of PyMYB10 was sufficient to induce anthocyanin accumulation.

  3. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  4. Logo2PWM: a tool to convert sequence logo to position weight matrix.

    Science.gov (United States)

    Gao, Zhen; Liu, Lu; Ruan, Jianhua

    2017-10-03

    Position weight matrices (PWMs) and sequence logos are the most widely used representations of transcription factor binding sites (TFBSs) in biological sequences. The sequence logo, a graphical representation of a PWM, has been widely used in scientific publications and reports because it is easy to read, information-rich, and simple in format. Unlike the sequence logo, the PWM is a precise and compact digital form that can be readily used by a variety of motif analysis software. There are a few available tools to generate sequence logos from PWMs; however, no tool does the reverse. A tool to convert a sequence logo back to a PWM is needed to scan for a TFBS that is represented in logo format in a publication where the PWM is not provided or is hard to obtain. A major difficulty in developing such a tool is dealing with the diversity of sequence logo images. We propose logo2PWM for reconstructing PWMs from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources with different logo formats show that the correlation between the reconstructed PWMs and the original PWMs is consistently high, with a median correlation greater than 0.97. Because of its high recognition accuracy, its ease of use, and its availability as both a web-based service and a stand-alone application, we believe that logo2PWM can readily benefit the study of transcription by filling the gap between sequence logo and PWM.
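
    The numerical relationship between a PWM and a sequence logo can be illustrated with a short sketch. It assumes the common logo convention in which the stack height at a position is its information content (2 minus the Shannon entropy, in bits, with no small-sample correction) and each letter's height is its probability times that stack height. The function names are hypothetical, and this is not the image-recognition procedure of logo2PWM itself; it only shows why letter heights, once measured, are enough to recover a PWM column under that assumption.

```python
import math

def column_to_heights(probs):
    """Convert one PWM column (probabilities for A, C, G, T) to logo letter heights in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    info = 2.0 - entropy  # information content of the column (at most 2 bits for DNA)
    return [p * info for p in probs]

def heights_to_column(heights):
    """Recover a PWM column from letter heights: probability = letter height / stack height."""
    total = sum(heights)
    if total <= 1e-12:
        # A zero-height stack carries no information; assume a uniform column.
        return [0.25] * 4
    return [h / total for h in heights]

# Toy PWM (rows are positions, columns are A, C, G, T); values are illustrative only.
pwm = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.85, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]
logo = [column_to_heights(col) for col in pwm]
recovered = [heights_to_column(col) for col in logo]
```

    In this toy case the round trip returns the original columns, including the uniform column whose stack has zero height.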

  5. Genome-Wide Identification of the Target Genes of AP2-O, a Plasmodium AP2-Family Transcription Factor.

    Directory of Open Access Journals (Sweden)

    Izumi Kaneko

    2015-05-01

    Full Text Available Stage-specific transcription is a fundamental biological process in the life cycle of the Plasmodium parasite. Proteins containing the AP2 DNA-binding domain are responsible for stage-specific transcriptional regulation and belong to the only known family of transcription factors in Plasmodium parasites. Comprehensive identification of their target genes will advance our understanding of the molecular basis of stage-specific transcriptional regulation and stage-specific parasite development. AP2-O is an AP2 family transcription factor that is expressed in the mosquito midgut-invading stage, called the ookinete, and is essential for normal morphogenesis of this stage. In this study, we identified the genome-wide target genes of AP2-O by chromatin immunoprecipitation-sequencing and elucidated how this AP2 family transcription factor contributes to the formation of this motile stage. The analysis revealed that AP2-O binds specifically to the upstream genomic regions of more than 500 genes, suggesting that approximately 10% of the parasite genome is directly regulated by AP2-O. These genes are involved in distinct biological processes such as morphogenesis, locomotion, midgut penetration, protection against mosquito immunity and preparation for subsequent oocyst development. This direct and global regulation by AP2-O provides a model for gene regulation in Plasmodium parasites and may explain how these parasites manage to control their complex life cycle using a small number of sequence-specific AP2 transcription factors.

  6. De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

    Science.gov (United States)

    Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

    2011-02-10

    Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open

  7. Genome wide analysis of stress responsive WRKY transcription factors in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Shaiq Sultan

    2016-04-01

    Full Text Available WRKY transcription factors are a class of DNA-binding proteins that bind a specific sequence, C/TTGACT/C, known as the W-box, which is found in the promoters of the genes they regulate. From previous studies, 43 different stress-responsive WRKY transcription factors in Arabidopsis thaliana were identified and then categorized into three groups, viz., those responsive to abiotic stress, to biotic stress, and to both. A comprehensive genome-wide analysis of these WRKY genes, including chromosomal localization, gene structure analysis, multiple sequence alignment, phylogenetic analysis and promoter analysis, was carried out in this study to determine the functional homology in Arabidopsis. This analysis led to the classification of these WRKY family members into 3 major groups and subgroups and showed the evolutionary relationships among these groups on the basis of their functional WRKY domain, chromosomal localization and intron/exon structure. The proposed groups of these stress-responsive WRKY genes and the annotation based on their position on chromosomes can also be explored to determine their functional homology in other plant species in relation to different stresses. The results of the present study provide indispensable genomic information for the stress-responsive WRKY transcription factors in Arabidopsis and will pave the way to explain the precise role of various AtWRKYs in plant growth and development under stressed conditions.
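
    The W-box consensus quoted above, C/TTGACT/C, can be written as the regular expression [CT]TGAC[TC]. The following is a minimal sketch of a promoter scan on both strands; the function names and the toy sequence are hypothetical and are not taken from the study.

```python
import re

# Lookahead so that overlapping W-boxes are all reported.
W_BOX = re.compile(r"(?=([CT]TGAC[TC]))")
W_BOX_LENGTH = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_w_boxes(promoter):
    """Return (position, strand, match) tuples; positions are 0-based on the forward strand."""
    promoter = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(promoter)]
    n = len(promoter)
    for m in W_BOX.finditer(reverse_complement(promoter)):
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((n - m.start() - W_BOX_LENGTH, "-", m.group(1)))
    return sorted(hits)

toy_promoter = "AATTGACTGGGGTCAAGCC"  # made-up sequence for illustration
hits = find_w_boxes(toy_promoter)
# hits == [(2, '+', 'TTGACT'), (10, '-', 'TTGACC')]
```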

  8. Environmental contaminants and microRNA regulation: Transcription factors as regulators of toxicant-altered microRNA expression

    Energy Technology Data Exchange (ETDEWEB)

    Sollome, James; Martin, Elizabeth [Department of Environmental Science & Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill (United States); Sethupathy, Praveen [Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC (United States); Fry, Rebecca C., E-mail: rfry@unc.edu [Department of Environmental Science & Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill (United States); Curriculum in Toxicology, School of Medicine, University of North Carolina, Chapel Hill, NC (United States)

    2016-12-01

    MicroRNAs (miRNAs) regulate gene expression by binding mRNA and inhibiting translation and/or inducing degradation of the associated transcripts. Expression levels of miRNAs have been shown to be altered in response to environmental toxicants, thus impacting cellular function and influencing disease risk. Transcription factors (TFs) are known to be altered in response to environmental toxicants and play a critical role in the regulation of miRNA expression. To date, environmentally-responsive TFs that are important for regulating miRNAs remain understudied. In a state-of-the-art analysis, we utilized an in silico bioinformatic approach to characterize potential transcriptional regulators of environmentally-responsive miRNAs. Using the miRStart database, genomic sequences of promoter regions for all available human miRNAs (n = 847) were identified and promoter regions were defined as −1000/+500 base pairs from the transcription start site. Subsequently, the promoter region sequences of environmentally-responsive miRNAs (n = 128) were analyzed using enrichment analysis to determine overrepresented TF binding sites (TFBS). While most (56/73) TFs differed across environmental contaminants, a set of 17 TFs was enriched for promoter binding among miRNAs responsive to numerous environmental contaminants. Of these, one TF was common to miRNAs altered by the majority of environmental contaminants, namely SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A, member 3 (SMARCA3). These identified TFs represent candidate common transcriptional regulators of miRNAs perturbed by environmental toxicants. - Highlights: • Transcription factors that regulate environmentally-modulated miRNA expression are understudied. • Transcription factor binding sites (TFBS) located within DNA promoter regions of miRNAs were identified. • Specific transcription factors may serve as master regulators of environmentally-mediated microRNA expression.
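
    The promoter definition in this abstract (1000 bp upstream to 500 bp downstream of the transcription start site) depends on strand, since "upstream" follows the direction of transcription. Below is a minimal sketch of that coordinate arithmetic. The function name and the example coordinate are hypothetical, and the choice of 1-based inclusive coordinates with the TSS counted inside the window is an assumption, not something stated in the record.

```python
def promoter_window(tss, strand, upstream=1000, downstream=500):
    """Return (start, end) in 1-based inclusive genomic coordinates.

    Upstream and downstream are measured along the direction of transcription,
    so the window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        return max(1, tss - upstream), tss + downstream
    if strand == "-":
        return max(1, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")

# Illustrative TSS position only.
plus_region = promoter_window(50_000, "+")    # (49000, 50500)
minus_region = promoter_window(50_000, "-")   # (49500, 51000)
```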

  9. Environmental contaminants and microRNA regulation: Transcription factors as regulators of toxicant-altered microRNA expression

    International Nuclear Information System (INIS)

    Sollome, James; Martin, Elizabeth; Sethupathy, Praveen; Fry, Rebecca C.

    2016-01-01

    MicroRNAs (miRNAs) regulate gene expression by binding mRNA and inhibiting translation and/or inducing degradation of the associated transcripts. Expression levels of miRNAs have been shown to be altered in response to environmental toxicants, thus impacting cellular function and influencing disease risk. Transcription factors (TFs) are known to be altered in response to environmental toxicants and play a critical role in the regulation of miRNA expression. To date, environmentally-responsive TFs that are important for regulating miRNAs remain understudied. In a state-of-the-art analysis, we utilized an in silico bioinformatic approach to characterize potential transcriptional regulators of environmentally-responsive miRNAs. Using the miRStart database, genomic sequences of promoter regions for all available human miRNAs (n = 847) were identified and promoter regions were defined as −1000/+500 base pairs from the transcription start site. Subsequently, the promoter region sequences of environmentally-responsive miRNAs (n = 128) were analyzed using enrichment analysis to determine overrepresented TF binding sites (TFBS). While most (56/73) TFs differed across environmental contaminants, a set of 17 TFs was enriched for promoter binding among miRNAs responsive to numerous environmental contaminants. Of these, one TF was common to miRNAs altered by the majority of environmental contaminants, namely SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A, member 3 (SMARCA3). These identified TFs represent candidate common transcriptional regulators of miRNAs perturbed by environmental toxicants. - Highlights: • Transcription factors that regulate environmentally-modulated miRNA expression are understudied. • Transcription factor binding sites (TFBS) located within DNA promoter regions of miRNAs were identified. • Specific transcription factors may serve as master regulators of environmentally-mediated microRNA expression.

  10. [Study on quality evaluation of sequence and SSR information in transcriptome of Astragalus membranaceus].

    Science.gov (United States)

    Chang, Yue; Yang, Song; Liu, Zhen-Peng; Ren, Wei-Chao; Liu, Jie; Ma, Wei

    2016-04-01

    In this study, 454/Roche GS FLX sequencing technology was used to obtain transcriptome data for Astragalus membranaceus. The 454 Sequencing System Software was applied to assemble the transcriptome de novo. Using the MISA tool, 9 893 unigenes of A. membranaceus were screened and their SSR loci were analyzed. The average read length was 413 bp, about 86% of the reads were incorporated into the assembly, and the N50 was 1 205 bp; unigenes were counted across the whole transcriptome. A total of 1 729 SSR loci were found in the A. membranaceus transcriptome; the occurrence frequency of SSR was 9.24%, the frequency of SSR in the whole transcriptome was 13.42%, and the average length of SSR was reported as 7.97 kb. One hundred and twenty-seven kinds of core repeat sequences were found; the dominant type was the TG/AC dinucleotide, which accounted for 4.25% of the total SSR loci. Transcriptome sequencing revealed the overall expression profile and yielded a large number of unigene sequences; SSR loci in A. membranaceus are frequent and diverse in type, and gene polymorphism is high. Copyright © by the Chinese Pharmaceutical Association.
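
    The N50 statistic quoted in this abstract is a standard assembly summary: the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch follows; the function name and the toy lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

toy_lengths = [2000, 1500, 1200, 800, 500, 300, 200]  # made-up unigene lengths
toy_n50 = n50(toy_lengths)  # 1500: 2000 + 1500 = 3500 covers half of 6500
```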

  11. The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

    International Nuclear Information System (INIS)

    Nylund, Stian; Karlsen, Marius; Nylund, Are

    2008-01-01

    The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae

  12. Triptolide inhibits transcription of hTERT through down-regulation of transcription factor specificity protein 1 in primary effusion lymphoma cells

    Energy Technology Data Exchange (ETDEWEB)

    Long, Cong; Wang, Jingchao [Department of Pathogen Biology, School of Basic Medical Sciences, Wuhan University, Wuhan, 430071 (China); Guo, Wei [Department of Pathology and Physiology, School of Basic Medical Sciences, Wuhan University, Wuhan, 430071 (China); Wang, Huan; Wang, Chao; Liu, Yu [Department of Pathogen Biology, School of Basic Medical Sciences, Wuhan University, Wuhan, 430071 (China); Sun, Xiaoping, E-mail: xsun6@whu.edu.cn [Department of Pathogen Biology, School of Basic Medical Sciences, Wuhan University, Wuhan, 430071 (China); State Key Laboratory of Virology, Wuhan University, Wuhan, 430072 (China)

    2016-01-01

    Primary effusion lymphoma (PEL) is a rare and aggressive non-Hodgkin's lymphoma. Human telomerase reverse transcriptase (hTERT), a key component responsible for the regulation of telomerase activity, plays important roles in cellular immortalization and cancer development. Triptolide purified from Tripterygium extracts displays a broad-spectrum bioactivity profile, including immunosuppressive, anti-inflammatory, and anti-tumor activities. In this study, we investigated whether triptolide reduces hTERT expression and suppresses its activity in PEL cells. The mRNA and protein levels of hTERT were examined by real-time PCR and Western blotting, respectively. The activity of the hTERT promoter was determined by dual luciferase reporter assay. Our results demonstrated that triptolide decreased expression of hTERT at both mRNA and protein levels. Further gene sequence analysis indicated that the activity of the hTERT promoter was suppressed by triptolide. Triptolide also reduced the half-life of hTERT. Additionally, triptolide inhibited the expression of transcription factor specificity protein 1 (Sp1) in PEL cells. Furthermore, knock-down of Sp1 using specific shRNAs resulted in down-regulation of hTERT transcription and protein expression levels. Inhibition of Sp1 by specific shRNAs enhanced triptolide-induced cell growth inhibition and apoptosis. Collectively, our results demonstrate that the inhibitory effect of triptolide on hTERT transcription is possibly mediated by inhibition of transcription factor Sp1 in PEL cells. - Highlights: • Triptolide reduces expression of hTERT by decreasing its transcription level. • Triptolide reduces promoter activity and stability of hTERT. • Triptolide down-regulates expression of Sp1. • Specific Sp1 shRNAs inhibit transcription and protein expression of hTERT. • Triptolide and Sp1 shRNA2 induce cell proliferation inhibition and apoptosis.

  13. Discovery of parvovirus-related sequences in an unexpected broad range of animals.

    Science.gov (United States)

    François, S; Filloux, D; Roumagnac, P; Bigot, D; Gayral, P; Martin, D P; Froissart, R; Ogliastro, M

    2016-09-07

    Our knowledge of the genetic diversity and host ranges of viruses is fragmentary. This is particularly true for the Parvoviridae family. Genetic diversity studies of single stranded DNA viruses within this family have been largely focused on arthropod- and vertebrate-infecting species that cause diseases of humans and our domesticated animals: a focus that has biased our perception of parvovirus diversity. While metagenomics approaches could help rectify this bias, so too could transcriptomics studies. Large amounts of transcriptomic data are available for a diverse array of animal species and whenever this data has inadvertently been gathered from virus-infected individuals, it could contain detectable viral transcripts. We therefore performed a systematic search for parvovirus-related sequences (PRSs) within publicly available transcript, genome and protein databases and eleven new transcriptome datasets. This revealed 463 PRSs in the transcript databases of 118 animals. At least 41 of these PRSs are likely integrated within animal genomes in that they were also found within genomic sequence databases. Besides illuminating the ubiquity of parvoviruses, the number of parvoviral sequences discovered within public databases revealed numerous previously unknown parvovirus-host combinations; particularly in invertebrates. Our findings suggest that the host-ranges of extant parvoviruses might span the entire animal kingdom.

  14. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  15. A community resource for high-throughput quantitative RT-PCR analysis of transcription factor gene expression in Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Redman Julia C

    2008-07-01

    Full Text Available Abstract Background Medicago truncatula is a model legume species that is currently the focus of an international genome sequencing effort. Although several different oligonucleotide and cDNA arrays have been produced for genome-wide transcript analysis of this species, intrinsic limitations in the sensitivity of hybridization-based technologies mean that transcripts of genes expressed at low levels cannot be measured accurately with these tools. Amongst such genes are many encoding transcription factors (TFs), which are arguably the most important class of regulatory proteins. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is the most sensitive method currently available for transcript quantification, and one that can be scaled up to analyze transcripts of thousands of genes in parallel. Thus, qRT-PCR is an ideal method to tackle the problem of TF transcript quantification in Medicago and other plants. Results We established a bioinformatics pipeline to identify putative TF genes in Medicago truncatula and to design gene-specific oligonucleotide primers for qRT-PCR analysis of TF transcripts. We validated the efficacy and gene-specificity of over 1000 TF primer pairs and utilized these to identify sets of organ-enhanced TF genes that may play important roles in organ development or differentiation in this species. This community resource will be developed further as more genome sequence becomes available, with the ultimate goal of producing validated, gene-specific primers for all Medicago TF genes. Conclusion High-throughput qRT-PCR using a 384-well plate format enables rapid, flexible, and sensitive quantification of all predicted Medicago transcription factor mRNAs. This resource has been utilized recently by several groups in Europe, Australia, and the USA, and we expect that it will become the 'gold-standard' for TF transcript profiling in Medicago truncatula.

  16. Principal component analysis for predicting transcription-factor binding motifs from array-derived data

    Directory of Open Access Journals (Sweden)

    Vincenti Matthew P

    2005-11-01

    Full Text Available Abstract Background The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and to evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD to a promoter matrix defined from regulatory DNA sequences, we derived a novel method to predict the critical set of TFBMs. Results The promoter matrix was defined to establish a quantitative relationship between the IL-1-driven mRNA alteration and genomic DNA sequences of the IL-1 responsive genes. The matrix was decomposed with SVD, and the effects of 8 potential TFBMs (5'-CAGGC-3', 5'-CGCCC-3', 5'-CCGCC-3', 5'-ATGGG-3', 5'-GGGAA-3', 5'-CGTCC-3', 5'-AAAGG-3', and 5'-ACCCA-3') were predicted from a pool of 512 random DNA sequences. The prediction included matches to the core binding motifs of biologically known TFBMs such as AP2, SP1, EGR1, KROX, GC-BOX, ABI4, ETF, E2F, SRF, STAT, IK-1, PPARγ, STAF, ROAZ, and NFκB, and their significance was evaluated numerically using Monte Carlo simulation and a genetic algorithm. Conclusion The described SVD-based prediction is an analytical method to provide a set of potential TFBMs involved in transcriptional regulation. The results would be useful to evaluate analytically the contribution of individual DNA sequences.
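
    The abstract does not spell out how the promoter matrix is constructed. As a rough sketch of one plausible construction, the code below assumes a genes-by-motifs matrix whose entries are the number of occurrences of each 5-bp motif on either strand of a gene's regulatory sequence. The eight motifs are those listed in the abstract; the function names and toy promoter sequences are hypothetical. A matrix of this shape is the kind of object to which an SVD routine from a linear algebra library could then be applied.

```python
MOTIFS = ["CAGGC", "CGCCC", "CCGCC", "ATGGG", "GGGAA", "CGTCC", "AAAGG", "ACCCA"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))

def promoter_matrix(promoters, motifs=MOTIFS):
    """Build a genes x motifs count matrix (assumed layout), scanning both strands."""
    matrix = []
    for seq in promoters:
        seq = seq.upper()
        rc = reverse_complement(seq)
        matrix.append([count_overlapping(seq, m) + count_overlapping(rc, m) for m in motifs])
    return matrix

# Two made-up regulatory sequences, one row per gene.
toy_promoters = ["TTCAGGCAAAGGTT", "GGCGGGAATT"]
H = promoter_matrix(toy_promoters)
# H == [[1, 0, 0, 0, 0, 0, 1, 0],
#       [0, 0, 1, 0, 1, 0, 0, 0]]
```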

  17. RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.

    Science.gov (United States)

    Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei

    2014-07-07

    Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening property (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next-generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions, characterized using RNA-Seq. A total of 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-QPCR. RNA-Seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.

  18. Transcription-associated processes cause DNA double-strand breaks and translocations in neural stem/progenitor cells.

    Science.gov (United States)

    Schwer, Bjoern; Wei, Pei-Chi; Chang, Amelia N; Kao, Jennifer; Du, Zhou; Meyers, Robin M; Alt, Frederick W

    2016-02-23

    High-throughput, genome-wide translocation sequencing (HTGTS) studies of activated B cells have revealed that DNA double-strand breaks (DSBs) capable of translocating to defined bait DSBs are enriched around the transcription start sites (TSSs) of active genes. We used the HTGTS approach to investigate whether a similar phenomenon occurs in primary neural stem/progenitor cells (NSPCs). We report that breakpoint junctions indeed are enriched around TSSs that were determined to be active by global run-on sequencing analyses of NSPCs. Comparative analyses of transcription profiles in NSPCs and B cells revealed that the great majority of TSS-proximal junctions occurred in genes commonly expressed in both cell types, possibly because this common set has higher transcription levels on average than genes transcribed in only one or the other cell type. In the latter context, among all actively transcribed genes containing translocation junctions in NSPCs, those with junctions located within 2 kb of the TSS show a significantly higher transcription rate on average than genes with junctions in the gene body located at distances greater than 2 kb from the TSS. Finally, analysis of repair junction signatures of TSS-associated translocations in wild-type versus classical nonhomologous end-joining (C-NHEJ)-deficient NSPCs reveals that both C-NHEJ and alternative end-joining pathways can generate translocations by joining TSS-proximal DSBs to DSBs on other chromosomes. Our studies show that the generation of transcription-associated DSBs is conserved across divergent cell types.

  19. Transcriptome sequencing revealed significant alteration of cortical promoter usage and splicing in schizophrenia.

    Directory of Open Access Journals (Sweden)

    Jing Qin Wu

    Full Text Available While hybridization-based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide unbiased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression. The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22) from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many genes previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR<0.05). Both types of transcriptional isoforms were exemplified by reads aligned to the neurodevelopmentally significant doublecortin-like kinase 1 (DCLK1) gene. This study provided the first deep and unbiased analysis of schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia.

  20. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not previously been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array with next-generation sequencing to profile viral integration sites across the whole genome. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty-four human genome sequences flanking the long terminal repeats (LTRs) of pLVTHM fragment sequences matched with an identity of at least 98% and a minimum length of 50 bp in both cell lines. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from CpG islands and from transcription start sites and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.
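
    The matching criteria mentioned in this abstract (identity of at least 98% over at least 50 bp) amount to a simple filter on alignment records. The sketch below is hypothetical throughout: the record fields, the function names, and the rule that identical chromosome and position pairs count as one unique integration site are assumptions for illustration, since the abstract does not describe how reads were collapsed into sites.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    read_id: str
    chrom: str
    position: int        # genomic coordinate of the vector/genome junction
    aligned_length: int  # aligned bases of flanking human sequence
    matches: int         # identical bases within the alignment

    @property
    def identity(self):
        return self.matches / self.aligned_length

def passes_filter(aln, min_identity=0.98, min_length=50):
    """Apply the identity and length thresholds quoted in the abstract."""
    return aln.aligned_length >= min_length and aln.identity >= min_identity

def unique_sites(alignments):
    """Collapse passing alignments into sorted, unique (chrom, position) pairs."""
    return sorted({(a.chrom, a.position) for a in alignments if passes_filter(a)})

# Made-up alignment records.
toy = [
    Alignment("r1", "chr1", 1000, 120, 119),  # passes (99.2% identity)
    Alignment("r2", "chr1", 1000, 80, 80),    # passes, same site as r1
    Alignment("r3", "chr2", 500, 45, 45),     # too short
    Alignment("r4", "chr3", 42, 100, 95),     # identity too low
    Alignment("r5", "chrX", 7, 60, 60),       # passes
]
sites = unique_sites(toy)
# sites == [('chr1', 1000), ('chrX', 7)]
```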

  1. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences

    International Nuclear Information System (INIS)

    Klimov, Eugene; Vinokourova, Svetlana; Moisjak, Elena; Rakhmanaliev, Elian; Kobseva, Vera; Laimins, Laimonis; Kisseljov, Fjodor; Sulimova, Galina

    2002-01-01

    In cervical tumours the integration of human papilloma viruses (HPV) transcripts often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data using a variety of techniques has demonstrated that HPV integration occurred without obvious specificity into the human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences. Amplification of the INTs was done by the APOT technique. The APOT products were sequenced according to standard protocols. The analysis of the sequences was performed using the BLASTN program and public databases. To localise the INTs, PCR-based screening of the GeneBridge4-RH-panel was used. Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and the RH-mapping techniques, mapping studies showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with high levels of expression. Integration of HPV genomes occurs into different human chromosomes but into regions that contain highly transcribed genes. One interpretation of these studies is that integration of HPV occurs into decondensed regions, which are more accessible for integration of foreign DNA.

  2. Transcript Analysis and Regulative Events during Flower Development in Olive (Olea europaea L.).

    Directory of Open Access Journals (Sweden)

    Fiammetta Alagna

    Full Text Available The identification and characterization of transcripts involved in flower organ development, plant reproduction and metabolism represent key steps in plant phenotypic and physiological pathways, and may generate high-quality transcript variants useful for the development of functional markers. This study was aimed at obtaining an extensive characterization of the olive flower transcripts, by providing sound information on the candidate MADS-box genes related to the ABC model of flower development and on the putative genetic and molecular determinants of ovary abortion and pollen-pistil interaction. The overall sequence data, obtained by pyrosequencing of four cDNA libraries from flowers at different developmental stages of three olive varieties with distinct reproductive features (Leccino, Frantoio and Dolce Agogia), included approximately 465,000 ESTs, which gave rise to more than 14,600 contigs and approximately 92,000 singletons. As many as 56,700 unigenes were successfully annotated and provided gene ontology insights into the structural organization and putative molecular function of sequenced transcripts and deduced proteins in the context of their corresponding biological processes. Differentially expressed genes with potential regulatory roles in biosynthetic pathways and metabolic networks during flower development were identified. The gene expression studies allowed us to select the candidate genes that play well-known molecular functions in a number of biosynthetic pathways and specific biological processes that affect olive reproduction. A sound understanding of gene functions and regulatory networks that characterize the olive flower is provided.

  3. Transcript Analysis and Regulative Events during Flower Development in Olive (Olea europaea L.).

    Science.gov (United States)

    Alagna, Fiammetta; Cirilli, Marco; Galla, Giulio; Carbone, Fabrizio; Daddiego, Loretta; Facella, Paolo; Lopez, Loredana; Colao, Chiara; Mariotti, Roberto; Cultrera, Nicolò; Rossi, Martina; Barcaccia, Gianni; Baldoni, Luciana; Muleo, Rosario; Perrotta, Gaetano

    2016-01-01

    The identification and characterization of transcripts involved in flower organ development, plant reproduction and metabolism represent key steps in plant phenotypic and physiological pathways, and may generate high-quality transcript variants useful for the development of functional markers. This study was aimed at obtaining an extensive characterization of the olive flower transcripts, by providing sound information on the candidate MADS-box genes related to the ABC model of flower development and on the putative genetic and molecular determinants of ovary abortion and pollen-pistil interaction. The overall sequence data, obtained by pyrosequencing of four cDNA libraries from flowers at different developmental stages of three olive varieties with distinct reproductive features (Leccino, Frantoio and Dolce Agogia), included approximately 465,000 ESTs, which gave rise to more than 14,600 contigs and approximately 92,000 singletons. As many as 56,700 unigenes were successfully annotated and provided gene ontology insights into the structural organization and putative molecular function of sequenced transcripts and deduced proteins in the context of their corresponding biological processes. Differentially expressed genes with potential regulatory roles in biosynthetic pathways and metabolic networks during flower development were identified. The gene expression studies allowed us to select the candidate genes that play well-known molecular functions in a number of biosynthetic pathways and specific biological processes that affect olive reproduction. A sound understanding of gene functions and regulatory networks that characterize the olive flower is provided.

  4. Co-transcriptomic Analysis by RNA Sequencing to Simultaneously Measure Regulated Gene Expression in Host and Bacterial Pathogen

    KAUST Repository

    Ravasi, Timothy; Mavromatis, Charalampos Harris; Bokil, Nilesh J.; Schembri, Mark A.; Sweet, Matthew J.

    2016-01-01

    Intramacrophage pathogens subvert antimicrobial defence pathways using various mechanisms, including the targeting of host TLR-mediated transcriptional responses. Conversely, TLR-inducible host defence mechanisms subject intramacrophage pathogens to stress, thus altering pathogen gene expression programs. Important biological insights can thus be gained through the analysis of gene expression changes in both the host and the pathogen during an infection. Traditionally, research methods have involved the use of qPCR, microarrays and/or RNA sequencing to identify transcriptional changes in either the host or the pathogen. Here we describe the application of RNA sequencing using samples obtained from in vitro infection assays to simultaneously quantify both host and bacterial pathogen gene expression changes, as well as general approaches that can be undertaken to interpret the RNA sequencing data that is generated. These methods can be used to provide insights into host TLR-regulated transcriptional responses to microbial challenge, as well as pathogen subversion mechanisms against such responses.

  5. Co-transcriptomic Analysis by RNA Sequencing to Simultaneously Measure Regulated Gene Expression in Host and Bacterial Pathogen

    KAUST Repository

    Ravasi, Timothy

    2016-01-24

    Intramacrophage pathogens subvert antimicrobial defence pathways using various mechanisms, including the targeting of host TLR-mediated transcriptional responses. Conversely, TLR-inducible host defence mechanisms subject intramacrophage pathogens to stress, thus altering pathogen gene expression programs. Important biological insights can thus be gained through the analysis of gene expression changes in both the host and the pathogen during an infection. Traditionally, research methods have involved the use of qPCR, microarrays and/or RNA sequencing to identify transcriptional changes in either the host or the pathogen. Here we describe the application of RNA sequencing using samples obtained from in vitro infection assays to simultaneously quantify both host and bacterial pathogen gene expression changes, as well as general approaches that can be undertaken to interpret the RNA sequencing data that is generated. These methods can be used to provide insights into host TLR-regulated transcriptional responses to microbial challenge, as well as pathogen subversion mechanisms against such responses.

  6. Electrostatic study of Alanine mutational effects on transcription: application to GATA-3:DNA interaction complex.

    Science.gov (United States)

    El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges

    2015-01-01

    Protein-DNA interaction is of fundamental importance in molecular biology, playing roles in functions as diverse as DNA transcription, DNA structure formation, and DNA repair. Protein-DNA association is also important in medicine; understanding Protein-DNA binding kinetics can assist in identifying disease root causes, which can contribute to drug development. In this perspective, this work focuses on the transcription process by the GATA Transcription Factor (TF). GATA TF binds to a DNA promoter region represented by the 'G,A,T,A' nucleotide sequence, and initiates transcription of target genes. When proper regulation fails due to mutations in the GATA TF protein sequence or in the DNA promoter sequence (weak promoter), deregulation of the target genes might lead to various disorders. In this study, we aim to understand the electrostatic mechanism behind GATA TF and DNA promoter interactions, in order to predict Protein-DNA binding in the presence of mutations, while elaborating on non-covalent binding kinetics. To generate a family of mutants for the GATA:DNA complex, we replaced every charged amino acid, one at a time, with a neutral amino acid, Alanine (Ala). We then applied Poisson-Boltzmann electrostatic calculations feeding into free energy calculations, for each mutation. These calculations delineate the contribution to binding from each Ala-replaced amino acid in the GATA:DNA interaction. After analyzing the obtained data in view of a two-step model, we are able to identify potential key amino acids in binding. Finally, we applied the model to the GATA-3:DNA (crystal structure with PDB-ID: 3DFV) binding complex and validated it against experimental results from the literature.
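
    The mutant-generation step described in this abstract (replace every charged amino acid, one at a time, with alanine) can be sketched in a few lines of Python. This is only an illustration: the function name `alanine_scan`, the choice to treat D, E, K and R as the charged residues (histidine left out), and the toy sequence are assumptions, and the Poisson-Boltzmann and free energy calculations themselves are not reproduced here.

```python
# Assumption: aspartate, glutamate, lysine and arginine count as charged;
# histidine is treated as neutral in this sketch.
CHARGED = frozenset("DEKR")


def alanine_scan(sequence, charged=CHARGED):
    """Return (label, mutant_sequence) pairs, one per charged residue.

    Each mutant differs from the parent sequence at exactly one position,
    where the charged residue is replaced by alanine ("A").
    """
    mutants = []
    for i, residue in enumerate(sequence):
        if residue in charged:
            mutant = sequence[:i] + "A" + sequence[i + 1:]
            # Conventional mutation label: original residue, 1-based position, A.
            mutants.append((f"{residue}{i + 1}A", mutant))
    return mutants


if __name__ == "__main__":
    toy_sequence = "MKDERT"  # hypothetical peptide, not the GATA-3 sequence
    for label, mutant in alanine_scan(toy_sequence):
        print(label, mutant)
```

    Each mutant sequence would then be the input to a separate electrostatic calculation, so that the change in binding free energy can be attributed to a single residue.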

  7. Human GW182 Paralogs Are the Central Organizers for RNA-Mediated Control of Transcription.

    Science.gov (United States)

    Hicks, Jessica A; Li, Liande; Matsui, Masayuki; Chu, Yongjun; Volkov, Oleg; Johnson, Krystal C; Corey, David R

    2017-08-15

    In the cytoplasm, small RNAs can control mammalian translation by regulating the stability of mRNA. In the nucleus, small RNAs can also control transcription and splicing. The mechanisms for RNA-mediated nuclear regulation are not understood and remain controversial, hindering the effective application of nuclear RNAi and investigation of its natural regulatory roles. Here, we reveal that the human GW182 paralogs TNRC6A/B/C are central organizing factors critical to RNA-mediated transcriptional activation. Mass spectrometry of purified nuclear lysates followed by experimental validation demonstrates that TNRC6A interacts with proteins involved in protein degradation, RNAi, the CCR4-NOT complex, the mediator complex, and histone-modifying complexes. Functional analysis implicates TNRC6A, NAT10, MED14, and WDR5 in RNA-mediated transcriptional activation. These findings describe protein complexes capable of bridging RNA-mediated sequence-specific recognition of noncoding RNA transcripts with the regulation of gene transcription. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Nuclear Matrix protein SMAR1 represses HIV-1 LTR mediated transcription through chromatin remodeling

    International Nuclear Information System (INIS)

    Sreenath, Kadreppa; Pavithra, Lakshminarasimhan; Singh, Sandeep; Sinha, Surajit; Dash, Prasanta K.; Siddappa, Nagadenahalli B.; Ranga, Udaykumar; Mitra, Debashis; Chattopadhyay, Samit

    2010-01-01

    Nuclear Matrix and MARs have been implicated in the transcriptional regulation of host as well as viral genes but their precise role in HIV-1 transcription remains unclear. Here, we show that > 98% of HIV sequences contain a consensus MAR element in their promoter. We show that SMAR1 binds to the LTR MAR and reinforces transcriptional silencing by tethering the LTR MAR to the nuclear matrix. The SMAR1-associated HDAC1-mSin3 corepressor complex is dislodged from the LTR upon cellular activation by PMA/TNFα, leading to an increase in the acetylation and a reduction in the trimethylation of histones, associated with the recruitment of RNA Polymerase II on the LTR. Overexpression of SMAR1 led to a reduction in LTR-mediated transcription, in both a Tat-dependent and a Tat-independent manner, resulting in decreased virion production. These results demonstrate the role of SMAR1 in regulating viral transcription by alternative compartmentalization of the LTR between the nuclear matrix and chromatin.

  9. Single-Cell RNA-Seq Reveals Transcriptional Heterogeneity in Latent and Reactivated HIV-Infected Cells.

    Science.gov (United States)

    Golumbeanu, Monica; Cristinelli, Sara; Rato, Sylvie; Munoz, Miguel; Cavassini, Matthias; Beerenwinkel, Niko; Ciuffi, Angela

    2018-04-24

    Despite effective treatment, HIV can persist in latent reservoirs, which represent a major obstacle toward HIV eradication. Targeting and reactivating latent cells is challenging due to the heterogeneous nature of HIV-infected cells. Here, we used a primary model of HIV latency and single-cell RNA sequencing to characterize transcriptional heterogeneity during HIV latency and reactivation. Our analysis identified transcriptional programs leading to successful reactivation of HIV expression. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  10. A Transcription Activator-Like Effector (TALE) Toolbox for Genome Engineering

    Science.gov (United States)

    Sanjana, Neville E.; Cong, Le; Zhou, Yang; Cunniff, Margaret M.; Feng, Guoping; Zhang, Feng

    2013-01-01

    Transcription activator-like effectors (TALEs) are a class of naturally occurring DNA binding proteins found in the plant pathogen Xanthomonas sp. The DNA binding domain of each TALE consists of tandem 34-amino acid repeat modules that can be rearranged according to a simple cipher to target new DNA sequences. Customized TALEs can be used for a wide variety of genome engineering applications, including transcriptional modulation and genome editing. Here we describe a toolbox for rapid construction of custom TALE transcription factors (TALE-TFs) and nucleases (TALENs) using a hierarchical ligation procedure. This toolbox facilitates affordable and rapid construction of custom TALE-TFs and TALENs within one week and can be easily scaled up to construct TALEs for multiple targets in parallel. We also provide details for testing the activity in mammalian cells of custom TALE-TFs and TALENs using, respectively, qRT-PCR and Surveyor nuclease. The TALE toolbox described here will enable a broad range of biological applications. PMID:22222791

  11. Experimental Incubations Elicit Profound Changes in Community Transcription in OMZ Bacterioplankton

    Science.gov (United States)

    Stewart, Frank J.; Dalsgaard, Tage; Young, Curtis R.; Thamdrup, Bo; Revsbech, Niels Peter; Ulloa, Osvaldo; Canfield, Don E.; DeLong, Edward F.

    2012-01-01

    Sequencing of microbial community RNA (metatranscriptome) is a useful approach for assessing gene expression in microorganisms from the natural environment. This method has revealed transcriptional patterns in situ, but can also be used to detect transcriptional cascades in microcosms following experimental perturbation. Unambiguously identifying differential transcription between control and experimental treatments requires constraining effects that are simply due to sampling and bottle enclosure. These effects remain largely uncharacterized for “challenging” microbial samples, such as those from anoxic regions that require special handling to maintain in situ conditions. Here, we demonstrate substantial changes in microbial transcription induced by sample collection and incubation in experimental bioreactors. Microbial communities were sampled from the water column of a marine oxygen minimum zone by a pump system that introduced minimal oxygen contamination and subsequently incubated in bioreactors under near in situ oxygen and temperature conditions. Relative to the source water, experimental samples became dominated by transcripts suggestive of cell stress, including chaperone, protease, and RNA degradation genes from diverse taxa, with strong representation from SAR11-like alphaproteobacteria. In tandem, transcripts matching facultative anaerobic gammaproteobacteria of the Alteromonadales (e.g., Colwellia) increased 4–13 fold up to 43% of coding transcripts, and encoded a diverse gene set suggestive of protein synthesis and cell growth. We interpret these patterns as taxon-specific responses to combined environmental changes in the bioreactors, including shifts in substrate or oxygen availability, and minor temperature and pressure changes during sampling with the pump system. Whether such changes confound analysis of transcriptional patterns may vary based on the design of the experiment, the taxonomic composition of the source community, and on the

  12. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  13. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

    2012-01-01

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole

  14. The specificity and flexibility of L1 reverse transcription priming at imperfect T-tracts.

    Directory of Open Access Journals (Sweden)

    Clément Monot

    2013-05-01

    Full Text Available L1 retrotransposons have a prominent role in reshaping mammalian genomes. To replicate, the L1 ribonucleoprotein particle (RNP) first uses its endonuclease (EN) to nick the genomic DNA. The newly generated DNA end is subsequently used as a primer to initiate reverse transcription within the L1 RNA poly(A) tail, a process known as target-primed reverse transcription (TPRT). Prior studies demonstrated that most L1 insertions occur into sequences related to the L1 EN consensus sequence (degenerate 5'-TTTT/A-3' sites) and frequently preceded by imperfect T-tracts. However, it is currently unclear whether--and to which degree--the liberated 3'-hydroxyl extremity on the genomic DNA needs to be accessible and complementary to the poly(A) tail of the L1 RNA for efficient priming of reverse transcription. Here, we employed a direct assay for the initiation of L1 reverse transcription to define the molecular rules that guide this process. First, efficient priming is detected with as few as 4 matching nucleotides at the primer 3' end. Second, L1 RNP can tolerate terminal mismatches if they are compensated within the 10 last bases of the primer by an increased number of matching nucleotides. All terminal mismatches are not equally detrimental to DNA extension, a C being extended at higher levels than an A or a G. Third, efficient priming in the context of duplex DNA requires a 3' overhang. This suggests the possible existence of additional DNA processing steps, which generate a single-stranded 3' end to allow L1 reverse transcription. Based on these data we propose that the specificity of L1 reverse transcription initiation contributes, together with the specificity of the initial EN cleavage, to the distribution of new L1 insertions within the human genome.

  15. The specificity and flexibility of L1 reverse transcription priming at imperfect T-tracts.

    Science.gov (United States)

    Monot, Clément; Kuciak, Monika; Viollet, Sébastien; Mir, Ashfaq Ali; Gabus, Caroline; Darlix, Jean-Luc; Cristofari, Gaël

    2013-05-01

    L1 retrotransposons have a prominent role in reshaping mammalian genomes. To replicate, the L1 ribonucleoprotein particle (RNP) first uses its endonuclease (EN) to nick the genomic DNA. The newly generated DNA end is subsequently used as a primer to initiate reverse transcription within the L1 RNA poly(A) tail, a process known as target-primed reverse transcription (TPRT). Prior studies demonstrated that most L1 insertions occur into sequences related to the L1 EN consensus sequence (degenerate 5'-TTTT/A-3' sites) and frequently preceded by imperfect T-tracts. However, it is currently unclear whether--and to which degree--the liberated 3'-hydroxyl extremity on the genomic DNA needs to be accessible and complementary to the poly(A) tail of the L1 RNA for efficient priming of reverse transcription. Here, we employed a direct assay for the initiation of L1 reverse transcription to define the molecular rules that guide this process. First, efficient priming is detected with as few as 4 matching nucleotides at the primer 3' end. Second, L1 RNP can tolerate terminal mismatches if they are compensated within the 10 last bases of the primer by an increased number of matching nucleotides. All terminal mismatches are not equally detrimental to DNA extension, a C being extended at higher levels than an A or a G. Third, efficient priming in the context of duplex DNA requires a 3' overhang. This suggests the possible existence of additional DNA processing steps, which generate a single-stranded 3' end to allow L1 reverse transcription. Based on these data we propose that the specificity of L1 reverse transcription initiation contributes, together with the specificity of the initial EN cleavage, to the distribution of new L1 insertions within the human genome.

  16. Identification of transcripts related to high egg production in the chicken hypothalamus and pituitary gland.

    Science.gov (United States)

    Shiue, Yow-Ling; Chen, Lih-Ren; Chen, Chih-Feng; Chen, Yi-Ling; Ju, Jhy-Phen; Chao, Ching-Hsien; Lin, Yuan-Ping; Kuo, Yu-Ming; Tang, Pin-Chi; Lee, Yen-Pai

    2006-09-15

    To identify transcripts related to high egg production expressed specifically in the hypothalamus and pituitary gland of the chicken, two subtracted cDNA libraries were constructed. Two divergently selected strains of Taiwan Country Chickens (TCCs), B (sire line) and L2 (dam line) were used; they had originated from a single population and were further subjected (since 1982) to selection for egg production to 40 wk of age and body weight/comb size, respectively. A total of 324 and 370 clones were identified from the L2-B (L2-subtract-B) and the B-L2 subtracted cDNA libraries, respectively. After sequencing and annotation, 175 and 136 transcripts that represented 53 known and 65 unknown non-redundant sequences were characterized in the L2-B subtracted cDNA library. Quantitative reverse-transcription (RT)-PCR was used to screen the mRNA expression levels of 32 randomly selected transcripts in another 78 laying hens from five different strains. These strains included the two original strains (B and L2) used to construct the subtracted cDNA libraries and an additional three commercial strains, i.e., Black- and Red-feather TCCs and Single-Comb White Leghorn (WL) layer. The mRNA expression levels of 16 transcripts were significantly higher in the L2 than in the B strain, whereas the mRNA expression levels of nine transcripts, BDH, NCAM1, PCDHA@, PGDS, PLAG1, PRL, SAR1A, SCG2 and STMN2, were significantly higher in two high egg production strains, L2 and Single-Comb WL; this indicated their usefulness as molecular markers of high egg production.

  17. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

    Science.gov (United States)

    Liu, Ruolin; Dickerson, Julie

    2017-11-01

    We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
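
    The general idea of using an EM algorithm to share ambiguous reads among transcripts can be illustrated with a toy Python sketch. This is not Strawberry's implementation: it works on individual reads rather than splicing-graph nodes, it ignores transcript length and sequencing bias, and the function name `em_abundance` and its inputs are assumptions made for illustration.

```python
def em_abundance(compatibility, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances from ambiguous read assignments.

    compatibility: one collection of transcript indices per read, listing the
    transcripts that read could have come from (reads with no compatible
    transcript are ignored).
    Returns a list of abundances that sums to 1.
    """
    reads = [tuple(ts) for ts in compatibility if ts]
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting guess
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        for ts in reads:
            total = sum(theta[t] for t in ts)
            for t in ts:
                expected[t] += theta[t] / total
        # M-step: renormalise the expected counts into abundances.
        n_reads = sum(expected)
        theta = [count / n_reads for count in expected]
    return theta


if __name__ == "__main__":
    # 6 reads unique to transcript 0, 2 unique to transcript 1, 4 shared.
    reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 4
    print(em_abundance(reads, n_transcripts=2))
```

    In this toy input the shared reads end up split three to one in favour of transcript 0, because the uniquely assigned reads already favour it in that ratio.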

  18. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    Science.gov (United States)

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  19. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Cen Wan

    2017-10-01

    Full Text Available Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  20. High-resolution detection of DNA binding sites of the global transcriptional regulator GlxR in Corynebacterium glutamicum

    DEFF Research Database (Denmark)

    Jungwirth, Britta; Sala, Claudia; Kohl, Thomas A

    2013-01-01

    The transcriptional regulator GlxR has been characterized as a global hub within the gene-regulatory network of Corynebacterium glutamicum. Chromatin immunoprecipitation with a specific anti-GlxR antibody and subsequent high-throughput sequencing (ChIP-seq) was applied to C. glutamicum to get new... mapping of these data on the genome sequence of C. glutamicum, 107 enriched DNA fragments were detected from cells grown with glucose as carbon source. GlxR binding sites were identified in the sequence of 79 enriched DNA fragments, of which 21 sites were not previously reported. Electrophoretic mobility... of the 6C non-coding RNA gene and to non-canonical DNA binding sites within protein-coding regions. The present study underlines the dynamics within the GlxR regulon by identifying in vivo targets during growth on glucose and contributes to the expansion of knowledge of this important transcriptional...

  1. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools (robust to input noise and distortion, able to exploit long-range contextual information) that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...

  2. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    Science.gov (United States)

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
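
    The combination described in this abstract (a negative binomial likelihood plus a penalty that favours few transcripts) can be illustrated with a toy sketch. This is not MITIE's implementation: exhaustive enumeration stands in for Mixed Integer Programming, and the function names, candidate transcripts, abundance grid, dispersion and penalty values are all hypothetical.

```python
import itertools
import math


def nb_neg_log_lik(count, mean, dispersion):
    """Negative log-likelihood of one read count under a negative binomial
    with variance = mean + dispersion * mean**2."""
    r = 1.0 / dispersion
    mean = max(mean, 1e-9)  # keep the logarithms finite for uncovered segments
    p = r / (r + mean)
    log_pmf = (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
               + r * math.log(p) + count * math.log(1.0 - p))
    return -log_pmf


def best_transcript_set(segment_counts, candidates, abundances, penalty,
                        dispersion=0.1):
    """Try every subset of candidate transcripts and every abundance level.

    candidates maps a (hypothetical) transcript name to a 0/1 tuple that says
    which segments the transcript covers. The cost is the summed negative
    log-likelihood plus `penalty` for every transcript used.
    """
    names = sorted(candidates)
    best_cost, best_choice = float("inf"), {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            for levels in itertools.product(abundances, repeat=size):
                expected = [
                    sum(a * candidates[t][s] for t, a in zip(subset, levels))
                    for s in range(len(segment_counts))
                ]
                cost = sum(nb_neg_log_lik(c, m, dispersion)
                           for c, m in zip(segment_counts, expected))
                cost += penalty * size
                if cost < best_cost:
                    best_cost, best_choice = cost, dict(zip(subset, levels))
    return best_cost, best_choice


# Toy data: read counts on three segments and three candidate transcripts.
toy_counts = [30, 10, 30]
toy_candidates = {"t1": (1, 1, 1), "t2": (1, 0, 1), "t3": (1, 1, 0)}
print(best_transcript_set(toy_counts, toy_candidates, [10, 20, 30], penalty=1.0))
```

    Raising the penalty in this sketch trades goodness of fit for a smaller transcript set, which is the role the abstract assigns to regularization.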

  3. Hybrid incompatibility arises in a sequence-based bioenergetic model of transcription factor binding.

    Science.gov (United States)

    Tulchinsky, Alexander Y; Johnson, Norman A; Watt, Ward B; Porter, Adam H

    2014-11-01

    Postzygotic isolation between incipient species results from the accumulation of incompatibilities that arise as a consequence of genetic divergence. When phenotypes are determined by regulatory interactions, hybrid incompatibility can evolve even as a consequence of parallel adaptation in parental populations because interacting genes can produce the same phenotype through incompatible allelic combinations. We explore the evolutionary conditions that promote and constrain hybrid incompatibility in regulatory networks using a bioenergetic model (combining thermodynamics and kinetics) of transcriptional regulation, considering the bioenergetic basis of molecular interactions between transcription factors (TFs) and their binding sites. The bioenergetic parameters consider the free energy of formation of the bond between the TF and its binding site and the availability of TFs in the intracellular environment. Together these determine fractional occupancy of the TF on the promoter site, the degree of subsequent gene expression and, in diploids, the degree of dominance among allelic interactions. This results in a sigmoid genotype-phenotype map and fitness landscape, with the details of the shape determining the degree of bioenergetic evolutionary constraint on hybrid incompatibility. Using individual-based simulations, we subjected two allopatric populations to parallel directional or stabilizing selection. Misregulation of hybrid gene expression occurred under either type of selection, although it evolved faster under directional selection. Under directional selection, the extent of hybrid incompatibility increased with the slope of the genotype-phenotype map near the derived parental expression level. Under stabilizing selection, hybrid incompatibility arose from compensatory mutations and was greater when the bioenergetic properties of the interaction caused the space of nearly neutral genotypes around the stable expression level to be wide. F2's showed higher
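
    The link between binding free energy, TF availability and a sigmoid genotype-phenotype map can be sketched with a generic two-state occupancy formula. This is a textbook form assumed for illustration, not necessarily the equations used by the authors; the function names, the per-mismatch energy and the TF count are hypothetical.

```python
import math


def fractional_occupancy(n_tf, delta_g):
    """Two-state occupancy of a binding site.

    n_tf: availability of the TF (relative units, assumed).
    delta_g: binding free energy in units of kT; more negative means tighter.
    """
    return n_tf / (n_tf + math.exp(delta_g))


def occupancy_by_mismatch(n_tf, dg_perfect, dg_per_mismatch, max_mismatches):
    """Occupancy as a function of the number of TF/site mismatches, assuming
    each mismatch adds the same (hypothetical) free-energy cost."""
    return [
        fractional_occupancy(n_tf, dg_perfect + m * dg_per_mismatch)
        for m in range(max_mismatches + 1)
    ]


# Occupancy falls along a sigmoid curve as mismatches accumulate.
for mismatches, theta in enumerate(occupancy_by_mismatch(100.0, -2.0, 1.5, 8)):
    print(mismatches, round(theta, 3))
```

    In this sketch the curve is flat when binding is very strong or very weak and steep in between, which is one way a map of this kind can leave a wide band of nearly neutral genotypes.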

  4. Regulation of zebrafish CYP3A65 transcription by AHR2

    International Nuclear Information System (INIS)

    Chang, Chin-Teng; Chung, Hsin-Yu; Su, Hsiao-Ting; Tseng, Hua-Pin; Tzou, Wen-Shyong; Hu, Chin-Hwa

    2013-01-01

    CYP3A proteins are the most abundant CYPs in the liver and intestines, and they play a pivotal role in drug metabolism. In mammals, CYP3A genes are induced by various xenobiotics through processes mediated by PXR. We previously identified zebrafish CYP3A65 as a CYP3A ortholog that is constitutively expressed in gastrointestinal tissues, and is upregulated by treatment with dexamethasone, rifampicin or tetrachlorodibenzo-p-dioxin (TCDD). However, the underlying mechanism of TCDD-mediated CYP3A65 transcription is unclear. Here we generated two transgenic zebrafish, Tg(CYP3A65S:EGFP) and Tg(CYP3A65L:EGFP), which contain 2.1 and 5.4 kb 5′ flanking sequences, respectively, of the CYP3A65 gene upstream of EGFP. Both transgenic lines express EGFP in larval gastrointestinal tissues in a pattern similar to that of the endogenous CYP3A65 gene. Moreover, EGFP expression can be significantly induced by TCDD exposure during the larval stage. In addition, EGFP expression can be stimulated by kynurenine, a putative AHR ligand produced during tryptophan metabolism. AHRE elements in the upstream regulatory region of the CYP3A65 gene are indispensable for basal and TCDD-induced transcription. Furthermore, the AHR2 DNA and ligand-binding domains are required to mediate effective CYP3A65 transcription. AHRE sequences are present in the promoters of many teleost CYP3 genes, but not of mammalian CYP3 genes, suggesting that AHR/AHR2-mediated transcription is likely a common regulatory mechanism for teleost CYP3 genes. It may also reflect the different environments that terrestrial and aquatic organisms encounter. - Highlights: • Tg(CYP3A65:EGFP) and CYP3A65 exhibit identical expression patterns. • CYP3A65 can be significantly induced by TCDD or kynurenine. • The AHRE elements are required to mediate CYP3A65 transcription. • The AHR2 DNA and ligand-binding domains are required for CYP3A65 transcription. • AHRE elements are present in many teleost CYP3 genes, but not in

  5. Regulation of zebrafish CYP3A65 transcription by AHR2

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Chin-Teng; Chung, Hsin-Yu; Su, Hsiao-Ting; Tseng, Hua-Pin [Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan (China); Tzou, Wen-Shyong [Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan (China); Center of Excellence for Marine Bioenvironment and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan (China); Hu, Chin-Hwa, E-mail: chhu@mail.ntou.edu.tw [Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan (China); Center of Excellence for Marine Bioenvironment and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan (China)

    2013-07-15

    CYP3A proteins are the most abundant CYPs in the liver and intestines, and they play a pivotal role in drug metabolism. In mammals, CYP3A genes are induced by various xenobiotics through processes mediated by PXR. We previously identified zebrafish CYP3A65 as a CYP3A ortholog that is constitutively expressed in gastrointestinal tissues, and is upregulated by treatment with dexamethasone, rifampicin or tetrachlorodibenzo-p-dioxin (TCDD). However, the underlying mechanism of TCDD-mediated CYP3A65 transcription is unclear. Here we generated two transgenic zebrafish, Tg(CYP3A65S:EGFP) and Tg(CYP3A65L:EGFP), which contain 2.1 and 5.4 kb 5′ flanking sequences, respectively, of the CYP3A65 gene upstream of EGFP. Both transgenic lines express EGFP in larval gastrointestinal tissues in a pattern similar to that of the endogenous CYP3A65 gene. Moreover, EGFP expression can be significantly induced by TCDD exposure during the larval stage. In addition, EGFP expression can be stimulated by kynurenine, a putative AHR ligand produced during tryptophan metabolism. AHRE elements in the upstream regulatory region of the CYP3A65 gene are indispensable for basal and TCDD-induced transcription. Furthermore, the AHR2 DNA and ligand-binding domains are required to mediate effective CYP3A65 transcription. AHRE sequences are present in the promoters of many teleost CYP3 genes, but not of mammalian CYP3 genes, suggesting that AHR/AHR2-mediated transcription is likely a common regulatory mechanism for teleost CYP3 genes. It may also reflect the different environments that terrestrial and aquatic organisms encounter. - Highlights: • Tg(CYP3A65:EGFP) and CYP3A65 exhibit identical expression patterns. • CYP3A65 can be significantly induced by TCDD or kynurenine. • The AHRE elements are required to mediate CYP3A65 transcription. • The AHR2 DNA and ligand-binding domains are required for CYP3A65 transcription. • AHRE elements are present in many teleost CYP3 genes, but not in

  6. Applications of nanotechnology, next generation sequencing and microarrays in biomedical research.

    Science.gov (United States)

    Elingaramil, Sauli; Li, Xiaolong; He, Nongyue

    2013-07-01

    Next-generation sequencing technologies, microarrays and advances in bionanotechnology have had an enormous impact on research within a short time frame. This impact appears certain to increase further as many biomedical institutions are now acquiring these prevailing new technologies. Beyond conventional sampling of genome content, wide-ranging applications are rapidly evolving for next-generation sequencing, microarrays and nanotechnology. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing and discovery of transcription factor binding sites, noncoding RNA expression profiling and molecular diagnostics. This paper thus discusses current applications of nanotechnology, next-generation sequencing technologies and microarrays in biomedical research and highlights the transforming potential these technologies offer.

  7. Large-scale Identification of Expressed Sequence Tags (ESTs) from Nicotiana tabacum by Normalized cDNA Library Sequencing

    Directory of Open Access Journals (Sweden)

    Alvarez S Perez

    2014-12-01

    Full Text Available An expressed sequence tag (EST) resource for tobacco plants (Nicotiana tabacum) was established using high-throughput sequencing of randomly selected clones from one cDNA library representing a range of plant organs (leaf, stem, root and root base). Over 5000 ESTs were generated from the 3’ ends of 8000 clones, analyzed by BLAST searches and categorized functionally. All annotated ESTs were classified into 18 functional categories; unique transcripts involved in energy were the largest group, accounting for 831 (32.32%) of the annotated ESTs. After excluding 2450 non-significant tentative unique transcripts (TUTs), 100 unique sequences (1.67% of total TUTs) were identified from the N. tabacum database. In the array result, two genes strongly related to the tobacco mosaic virus (TMV) were obtained: one basic form of pathogenesis-related protein 1 precursor (TBT012G08) and ubiquitin (TBT087G01). Both were found in the variety Hongda. Some other important genes were classified into two groups: one implicated in plant development, such as genes related to photosynthetic processes (chlorophyll a-b binding protein, photosystem I, ferredoxin I and III, ATP synthase), and a further group including genes related to plant stress response (ubiquitin, ubiquitin-like protein SMT3, glycine-rich RNA binding protein, histones and metallothionein). The interesting finding in this study is that two of these genes (ubiquitin-like protein SMT3 and metallothionein) have never been reported before in N. tabacum. The array results were confirmed using quantitative PCR.

  8. Structural basis for sequence-specific recognition of DNA by TAL effectors

    KAUST Repository

    Deng, Dong; Yan, Chuangye; Pan, Xiaojing; Mahfouz, Magdy M.; Wang, Jiawei; Zhu, Jiankang; Shi, Yi Gong; Yan, Nieng

    2012-01-01

    TAL (transcription activator-like) effectors, secreted by phytopathogenic bacteria, recognize host DNA sequences through a central domain of tandem repeats. Each repeat comprises 33 to 35 conserved amino acids and targets a specific base pair

  9. G = MAT: linking transcription factor expression and DNA binding data.

    Science.gov (United States)

    Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

    2011-01-31

    Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
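
    One way to read the linear model named in the title is as a matrix product G = M · A · T. The abstract itself only states that the model is linear and combines sequence information with expression data, so the matrix shapes and meanings below are assumptions for illustration, and all values are made up.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def predict_expression(motif_matrix, association, tf_expression):
    """Predicted gene expression under the assumed model G = M * A * T.

    motif_matrix (M): genes x motifs, motif occurrence in each promoter.
    association (A):  motifs x TFs, strength of each motif/TF association.
    tf_expression (T): TFs x samples, expression of each TF in each sample.
    """
    return matmul(matmul(motif_matrix, association), tf_expression)


# Hypothetical toy data: 3 genes, 2 motifs, 2 TFs, 2 samples.
M = [[1, 0], [0, 1], [1, 1]]
A = [[2, 0], [0, -1]]
T = [[1, 3], [2, 0]]
for gene_row in predict_expression(M, A, T):
    print(gene_row)
```

    Under this reading, parameter estimation means choosing A so that the predicted matrix matches the measured expression, and large entries of A flag putative TF-motif associations.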

  10. G = MAT: Linking Transcription Factor Expression and DNA Binding Data

    Science.gov (United States)

    Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

    2011-01-01

    Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/. PMID:21297945

  11. Sequence mining and transcript profiling to explore cyst nematode parasitism

    Directory of Open Access Journals (Sweden)

    Recknor Justin

    2009-01-01

    Full Text Available Abstract Background Cyst nematodes are devastating plant parasites that become sedentary within plant roots and induce the transformation of normal plant cells into elaborate feeding cells with the help of secreted effectors, the parasitism proteins. These proteins are the translation products of parasitism genes and are secreted molecular tools that allow cyst nematodes to infect plants. Results We present here the expression patterns of all previously described parasitism genes of the soybean cyst nematode, Heterodera glycines, in all major life stages except the adult male. These insights were gained by analyzing our gene expression dataset from experiments using the Affymetrix Soybean Genome Array GeneChip, which contains probeset sequences for 6,860 genes derived from preparasitic and parasitic H. glycines life stages. Targeting the identification of additional H. glycines parasitism-associated genes, we isolated 633 genes encoding secretory proteins using algorithms to predict secretory signal peptides. Furthermore, because some of the known H. glycines parasitism proteins have strongest similarity to proteins of plants and microbes, we searched for predicted protein sequences that showed their highest similarities to plant or microbial proteins and identified 156 H. glycines genes, some of which also contained a signal peptide. Analyses of the expression profiles of these genes allowed the formulation of hypotheses about potential roles in parasitism. This is the first study combining sequence analyses of a substantial EST dataset with microarray expression data of all major life stages (except adult males) for the identification and characterization of putative parasitism-associated proteins in any parasitic nematode. Conclusion We have established an expression atlas for all known H. glycines parasitism genes. Furthermore, in an effort to identify additional H. glycines genes with putative functions in parasitism, we have reduced the

  12. Sequence conservation and combinatorial complexity of Drosophila neural precursor cell enhancers

    Directory of Open Access Journals (Sweden)

    Kuzin Alexander

    2008-08-01

    Full Text Available Abstract Background The presence of highly conserved sequences within cis-regulatory regions can serve as a valuable starting point for elucidating the basis of enhancer function. This study focuses on regulation of gene expression during the early events of Drosophila neural development. We describe the use of EvoPrinter and cis-Decoder, a suite of interrelated phylogenetic footprinting and alignment programs, to characterize highly conserved sequences that are shared among co-regulating enhancers. Results Analysis of in vivo characterized enhancers that drive neural precursor gene expression has revealed that they contain clusters of highly conserved sequence blocks (CSBs) made up of shorter shared sequence elements which are present in different combinations and orientations within the different co-regulating enhancers; these elements contain either known consensus transcription factor binding sites or consist of novel sequences that have not been functionally characterized. The CSBs of co-regulated enhancers share a large number of sequence elements, suggesting that a diverse repertoire of transcription factors may interact in a highly combinatorial fashion to coordinately regulate gene expression. We have used information gained from our comparative analysis to discover an enhancer that directs expression of the nervy gene in neural precursor cells of the CNS and PNS. Conclusion The combined use of EvoPrinter and cis-Decoder has yielded important insights into the combinatorial appearance of fundamental sequence elements required for neural enhancer function. Each of the 30 enhancers examined conformed to a pattern of highly conserved blocks of sequences containing shared constituent elements. These data establish a basis for further analysis and understanding of neural enhancer function.
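
    The idea of short elements shared among the conserved blocks of different enhancers, in either orientation, can be sketched as a k-mer comparison. This is a simplified illustration and not the algorithm used by EvoPrinter or cis-Decoder; the function names, the element length and the example sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(seq, k):
    """Set of k-mers in seq, each stored as the alphabetically smaller of the
    k-mer and its reverse complement, so that orientation is ignored."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def shared_elements(blocks_by_enhancer, k, min_enhancers=2):
    """Map each canonical k-mer to the number of enhancers whose conserved
    blocks contain it, keeping those found in at least min_enhancers."""
    counts = {}
    for blocks in blocks_by_enhancer.values():
        seen = set()
        for block in blocks:
            seen |= canonical_kmers(block, k)
        for kmer in seen:
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: n for kmer, n in counts.items() if n >= min_enhancers}


# Hypothetical conserved blocks: the same element on opposite strands.
example_blocks = {
    "enhancer_a": ["TTGATAAGCC"],
    "enhancer_b": ["AACTTATCGG"],
}
print(shared_elements(example_blocks, k=6))
```

    Storing each k-mer in a canonical orientation is what lets the sketch treat an element and its reverse complement as the same shared element.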

  13. Transcriptome Sequencing Revealed Significant Alteration of Cortical Promoter Usage and Splicing in Schizophrenia

    Science.gov (United States)

    Wu, Jing Qin; Wang, Xi; Beveridge, Natalie J.; Tooney, Paul A.; Scott, Rodney J.; Carr, Vaughan J.; Cairns, Murray J.

    2012-01-01

    Background While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide unbiased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression. Methodology/Principal Findings The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22) from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR... schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia. PMID:22558445

  14. Transcription Profiling Demonstrates Epigenetic Control of Non-retroviral RNA Virus-Derived Elements in the Human Genome

    Directory of Open Access Journals (Sweden)

    Kozue Sofuku

    2015-09-01

    Full Text Available Endogenous bornavirus-like nucleoprotein elements (EBLNs) are DNA sequences in vertebrate genomes formed by the retrotransposon-mediated integration of ancient bornavirus sequence. Thus, EBLNs evidence a mechanism of retrotransposon-mediated RNA-to-DNA information flow from environment to animals. Although EBLNs are non-transposable, they share some features with retrotransposons. Here, to test whether hosts control the expression of EBLNs similarly to retrotransposons, we profiled the transcription of all Homo sapiens EBLNs (hsEBLN-1 to hsEBLN-7). We could detect transcription of all hsEBLNs in at least one tissue. Among them, hsEBLN-1 is transcribed almost exclusively in the testis. In most tissues, expression from the hsEBLN-1 locus is silenced epigenetically. Finally, we showed the possibility that hsEBLN-1 integration at this locus affects the expression of a neighboring gene. Our results suggest that hosts regulate the expression of endogenous non-retroviral virus elements similarly to how they regulate the expression of retrotransposons, possibly contributing new transcripts and regulatory complexity to the human genome.

  15. Integrated mRNA and microRNA transcriptome sequencing characterizes sequence variants and mRNA–microRNA regulatory network in nasopharyngeal carcinoma model systems

    Directory of Open Access Journals (Sweden)

    Carol Ying-Ying Szeto

    2014-01-01

    Full Text Available Nasopharyngeal carcinoma (NPC) is a prevalent malignancy in Southeast Asia among the Chinese population. Aberrant regulation of transcripts has been implicated in many types of cancers including NPC. Herein, we characterized mRNA and miRNA transcriptomes by RNA sequencing (RNASeq) of NPC model systems. Matched total mRNA and small RNA of undifferentiated Epstein–Barr virus (EBV)-positive NPC xenograft X666 and its derived cell line C666, well-differentiated NPC cell line HK1, and the immortalized nasopharyngeal epithelial cell line NP460 were sequenced by Solexa technology. We found 2812 genes and 149 miRNAs (human and EBV) to be differentially expressed in NP460, HK1, C666 and X666 with RNASeq; 533 miRNA–mRNA target pairs were inversely regulated in the three NPC cell lines compared to NP460. Integrated mRNA/miRNA expression profiling and pathway analysis show extracellular matrix organization, Beta-1 integrin cell surface interactions, and the PI3K/AKT, EGFR, ErbB, and Wnt pathways were potentially deregulated in NPC. Real-time quantitative PCR was performed on selected mRNA/miRNAs in order to validate their expression. Transcript sequence variants such as short insertions and deletions (INDEL), single nucleotide variant (SNV), and isomiRs were characterized in the NPC model systems. A novel TP53 transcript variant was identified in NP460, HK1, and C666. Three previously reported novel EBV-encoded BART miRNAs and their isomiRs were also detected. Meta-analysis of a model system to a clinical system aids the choice of different cell lines in NPC studies. This comprehensive characterization of mRNA and miRNA transcriptomes in NPC cell lines and the xenograft provides insights on miRNA regulation of mRNA and valuable resources on transcript variation and regulation in NPC, which are potentially useful for mechanistic and preclinical studies.
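
    The notion of inversely regulated miRNA–mRNA target pairs can be sketched as a simple filter on fold changes. The abstract does not give the criteria the authors used, so the threshold, the function name and the identifiers below are hypothetical and serve only to illustrate the idea.

```python
def inversely_regulated_pairs(mirna_lfc, mrna_lfc, target_pairs, min_abs_lfc=1.0):
    """Return predicted (miRNA, gene) pairs whose log2 fold changes relative
    to a control have opposite signs and both exceed min_abs_lfc in size."""
    selected = []
    for mirna, gene in target_pairs:
        m = mirna_lfc.get(mirna)
        g = mrna_lfc.get(gene)
        if m is None or g is None:
            continue  # no measurement for one member of the pair
        if abs(m) >= min_abs_lfc and abs(g) >= min_abs_lfc and m * g < 0:
            selected.append((mirna, gene))
    return selected


# Hypothetical log2 fold changes (cancer line versus control line).
mirna_changes = {"miR-A": 2.4, "miR-B": -1.8, "miR-C": 0.2}
mrna_changes = {"GENE1": -1.6, "GENE2": -2.0, "GENE3": 3.1}
predicted_targets = [("miR-A", "GENE1"), ("miR-B", "GENE2"),
                     ("miR-B", "GENE3"), ("miR-C", "GENE1")]
print(inversely_regulated_pairs(mirna_changes, mrna_changes, predicted_targets))
```

    A pair passes this filter when the miRNA goes up while its predicted target goes down, or the reverse, which matches the usual expectation that miRNAs repress their targets.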

  16. Circuit-wide Transcriptional Profiling Reveals Brain Region-Specific Gene Networks Regulating Depression Susceptibility.

    Science.gov (United States)

    Bagot, Rosemary C; Cates, Hannah M; Purushothaman, Immanuel; Lorsch, Zachary S; Walker, Deena M; Wang, Junshi; Huang, Xiaojie; Schlüter, Oliver M; Maze, Ian; Peña, Catherine J; Heller, Elizabeth A; Issler, Orna; Wang, Minghui; Song, Won-Min; Stein, Jason L; Liu, Xiaochuan; Doyle, Marie A; Scobie, Kimberly N; Sun, Hao Sheng; Neve, Rachael L; Geschwind, Daniel; Dong, Yan; Shen, Li; Zhang, Bin; Nestler, Eric J

    2016-06-01

    Depression is a complex, heterogeneous disorder and a leading contributor to the global burden of disease. Most previous research has focused on individual brain regions and genes contributing to depression. However, emerging evidence in humans and animal models suggests that dysregulated circuit function and gene expression across multiple brain regions drive depressive phenotypes. Here, we performed RNA sequencing on four brain regions from control animals and those susceptible or resilient to chronic social defeat stress at multiple time points. We employed an integrative network biology approach to identify transcriptional networks and key driver genes that regulate susceptibility to depressive-like symptoms. Further, we validated in vivo several key drivers and their associated transcriptional networks that regulate depression susceptibility and confirmed their functional significance at the levels of gene transcription, synaptic regulation, and behavior. Our study reveals novel transcriptional networks that control stress susceptibility and offers fundamentally new leads for antidepressant drug discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. An upstream activation element exerting differential transcriptional activation on an archaeal promoter

    DEFF Research Database (Denmark)

    Peng, Nan; Xia, Qiu; Chen, Zhengjun

    2009-01-01

    S gene encoding an arabinose binding protein was characterized using a Sulfolobus islandicus reporter gene system. The minimal active araS promoter (P(araS)) was found to be 59 nucleotides long and harboured four promoter elements: an ara-box, an upstream transcription factor B-responsive element (BRE...), a TATA-box and a proximal promoter element, each of which contained important nucleotides that either greatly decreased or completely abolished promoter activity upon mutagenesis. The basal araS promoter was virtually inactive due to an intrinsically weak BRE element, and the upstream activating sequence... (UAS) ara-box activated the basal promoter by recruiting transcription factor B to its BRE. While this UAS ensured a general expression from an inactive or weak basal promoter in the presence of other tested carbon resources, it exhibited a strong arabinose-responsive transcriptional activation. To our...

  18. Transcriptional analysis of phloem-associated cells of potato.

    Science.gov (United States)

    Lin, Tian; Lashbrook, Coralie C; Cho, Sung Ki; Butler, Nathaniel M; Sharma, Pooja; Muppirala, Usha; Severin, Andrew J; Hannapel, David J

    2015-09-03

    Numerous signal molecules, including proteins and mRNAs, are transported through the architecture of plants via the vascular system. As the connection between leaves and other organs, the petiole and stem are especially important in their transport function, which is carried out by the phloem and xylem, especially by the sieve elements in the phloem system. The phloem is an important conduit for transporting photosynthate and signal molecules like metabolites, proteins, small RNAs, and full-length mRNAs. Phloem sap has been used as an unadulterated source to profile phloem proteins and RNAs, but unfortunately, pure phloem sap cannot be obtained in most plant species. Here we make use of laser capture microdissection (LCM) and RNA-seq for an in-depth transcriptional profile of phloem-associated cells of both petioles and stems of potato. To expedite our analysis, we have taken advantage of the potato genome that has recently been fully sequenced and annotated. Of the 27 k transcripts that we assembled, approximately 15 k were present in phloem-associated cells of petiole and stem with greater than ten reads. Among these genes, roughly 10 k are affected by photoperiod. Several RNAs from this day length-regulated group are also abundant in phloem cells of petioles and encode proteins involved in signaling or transcriptional control. Approximately 22 % of the transcripts in phloem cells contained at least one binding motif for Pumilio, Nova, or polypyrimidine tract-binding proteins in their downstream sequences. Highlighting the predominance of binding processes identified in the gene ontology analysis of active genes from phloem cells, 78 % of the 464 RNA-binding proteins present in the potato genome were detected in our phloem transcriptome. As a reasonable alternative when phloem sap collection is not possible, LCM can be used to isolate RNA from specific cell types, and along with RNA-seq, provides practical access to expression profiles of
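
    Counting the share of transcripts that carry at least one RNA-binding-protein motif in their downstream sequence can be sketched as a pattern scan. The abstract does not list the motif definitions used in the study; the sketch assumes the commonly cited Pumilio element UGUANAUA (written as DNA) as its only pattern, and the function name and sequences are hypothetical.

```python
import re

# Assumed pattern: the Pumilio element UGUANAUA, written in the DNA alphabet.
# Patterns for other binding proteins could be added to this dictionary.
MOTIF_PATTERNS = {"Pumilio": re.compile(r"TGTA[ACGT]ATA")}


def fraction_with_motif(downstream_seqs, patterns=MOTIF_PATTERNS):
    """Fraction of transcripts whose downstream sequence contains at least
    one match to any of the given compiled patterns."""
    if not downstream_seqs:
        return 0.0
    hits = sum(
        1 for seq in downstream_seqs.values()
        if any(pattern.search(seq) for pattern in patterns.values())
    )
    return hits / len(downstream_seqs)


# Hypothetical downstream sequences keyed by transcript name.
example_seqs = {
    "transcript_1": "AAATGTACATAGG",
    "transcript_2": "CCCCCCCCCCCC",
}
print(fraction_with_motif(example_seqs))
```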

  19. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  20. The WRKY Transcription Factor Genes in Lotus japonicus

    OpenAIRE

    Song, Hui; Wang, Pengfei; Nan, Zhibiao; Wang, Xingjun

    2014-01-01

    WRKY transcription factor genes play critical roles in plant growth and development, as well as stress responses. WRKY genes have been examined in various higher plants, but they have not been characterized in Lotus japonicus. The recent release of the L. japonicus whole genome sequence provides an opportunity for a genome wide analysis of WRKY genes in this species. In this study, we identified 61 WRKY genes in the L. japonicus genome. Based on the WRKY protein structure, L. japonicus WRKY (...

  1. Long-range transcriptional control of an operon necessary for virulence-critical ESX-1 secretion in Mycobacterium tuberculosis.

    Science.gov (United States)

    Hunt, Debbie M; Sweeney, Nathan P; Mori, Luisa; Whalan, Rachael H; Comas, Iñaki; Norman, Laura; Cortes, Teresa; Arnvig, Kristine B; Davis, Elaine O; Stapleton, Melanie R; Green, Jeffrey; Buxton, Roger S

    2012-05-01

    The ESX-1 secretion system of Mycobacterium tuberculosis has to be precisely regulated since the secreted proteins, although required for a successful virulent infection, are highly antigenic and their continued secretion would alert the immune system to the infection. The transcription of a five-gene operon containing espACD-Rv3613c-Rv3612c, which is required for ESX-1 secretion and is essential for virulence, was shown to be positively regulated by the EspR transcription factor. Thus, transcription from the start site, found to be located 67 bp upstream of espA, was dependent upon EspR enhancer-like sequences far upstream (between 884 and 1,004 bp), which we term the espA activating region (EAR). The EAR contains one of the known binding sites for EspR, providing the first in vivo evidence that transcriptional activation at the espA promoter occurs by EspR binding to the EAR and looping out DNA between this site and the promoter. Regulation of transcription of this operon thus takes place over long regions of the chromosome. This regulation may differ in some members of the M. tuberculosis complex, including Mycobacterium bovis, since deletions of the intergenic region have removed the upstream sequence containing the EAR, resulting in lowered espA expression. Consequent differences in expression of ESX-1 in these bacteria may contribute to their various pathologies and host ranges. The virulence-critical nature of this operon means that transcription factors controlling its expression are possible drug targets.

  2. DELLA-induced early transcriptional changes during etiolated development in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Javier Gallego-Bartolomé

    Full Text Available The hormones gibberellins (GAs) control a wide variety of processes in plants, including stress and developmental responses. This task largely relies on the activity of the DELLA proteins, nuclear-localized transcriptional regulators that do not seem to have DNA binding capacity. The identification of early target genes of DELLA action is key not only to understand how GAs regulate physiological responses, but also to get clues about the molecular mechanisms by which DELLAs regulate gene expression. Here, we have investigated the global, early transcriptional response triggered by the Arabidopsis DELLA protein GAI during skotomorphogenesis, a developmental program tightly regulated by GAs. Our results show that the induction of GAI activity has an almost immediate effect on gene expression. Although this transcriptional regulation is largely mediated by the PIFs and HY5 transcription factors based on target meta-analysis, additional evidence points to other transcription factors that would be directly involved in DELLA regulation of gene expression. First, we have identified cis elements recognized by Dofs and type-B ARRs among the sequences enriched in the promoters of GAI targets; and second, an enrichment in additional cis elements appeared when this analysis was extended to a dataset of early targets of the DELLA protein RGA: CArG boxes, bound by MADS-box proteins, and the E-box CACATG that links the activity of DELLAs to circadian transcriptional regulation. Finally, Gene Ontology analysis highlights the impact of DELLA regulation upon the homeostasis of the GA, auxin, and ethylene pathways, as well as upon pre-existing transcriptional networks.

  3. Reconstruction of the core and extended regulons of global transcription factors.

    Directory of Open Access Journals (Sweden)

    Yann S Dufour

    2010-07-01

    Full Text Available The processes underlying the evolution of regulatory networks are unclear. To address this question, we used a comparative genomics approach that takes advantage of the large number of sequenced bacterial genomes to predict conserved and variable members of transcriptional regulatory networks across phylogenetically related organisms. Specifically, we developed a computational method to predict the conserved regulons of transcription factors across alpha-proteobacteria. We focused on the CRP/FNR super-family of transcription factors because it contains several well-characterized members, such as FNR, FixK, and DNR. While FNR, FixK, and DNR are each proposed to regulate different aspects of anaerobic metabolism, they are predicted to recognize very similar DNA target sequences, and they occur in various combinations among individual alpha-proteobacterial species. In this study, the composition of the respective FNR, FixK, or DNR conserved regulons across 87 alpha-proteobacterial species was predicted by comparing the phylogenetic profiles of the regulators with the profiles of putative target genes. The utility of our predictions was evaluated by experimentally characterizing the FnrL regulon (a FNR-type regulator) in the alpha-proteobacterium Rhodobacter sphaeroides. Our results show that this approach correctly predicted many regulon members, provided new insights into the biological functions of the respective regulons for these regulators, and suggested models for the evolution of the corresponding transcriptional networks. Our findings also predict that, at least for the FNR-type regulators, there is a core set of target genes conserved across many species. In addition, the members of the so-called extended regulons for the FNR-type regulators vary even among closely related species, possibly reflecting species-specific adaptation to environmental and other factors. The comparative genomics approach we developed is readily applicable to other

  4. AKT phosphorylates H3-threonine 45 to facilitate termination of gene transcription in response to DNA damage

    OpenAIRE

    Lee, Jong-Hyuk; Kang, Byung-Hee; Jang, Hyonchol; Kim, Tae Wan; Choi, Jinmi; Kwak, Sojung; Han, Jungwon; Cho, Eun-Jung; Youn, Hong-Duk

    2015-01-01

    Post-translational modifications of core histones affect various cellular processes, primarily through transcription. However, their relationship with the termination of transcription has remained largely unknown. In this study, we show that DNA damage-activated AKT phosphorylates threonine 45 of core histone H3 (H3-T45). By genome-wide chromatin immunoprecipitation sequencing (ChIP-seq) analysis, H3-T45 phosphorylation was distributed throughout DNA damage-responsive gene loci, particularly ...

  5. Global transcriptional landscape and promoter mapping of the gut commensal Bifidobacterium breve UCC2003.

    Science.gov (United States)

    Bottacini, Francesca; Zomer, Aldert; Milani, Christian; Ferrario, Chiara; Lugli, Gabriele Andrea; Egan, Muireann; Ventura, Marco; van Sinderen, Douwe

    2017-12-28

    Bifidobacterium breve represents a common member of the infant gut microbiota and its presence in the gut has been associated with host well-being. For this reason it is relevant to investigate and understand the molecular mechanisms underlying the establishment, persistence and activities of this gut commensal in the host environment. The assessment of vegetative promoters in the bifidobacterial prototype Bifidobacterium breve UCC2003 was performed employing a combination of RNA tiling array analysis and cDNA sequencing. Canonical -10 (TATAAT) and -35 (TTGACA) sequences were identified upstream of transcribed genes or operons, where deviations from this consensus correspond to transcription level variations. A Random Forest analysis assigned the -10 region of B. breve promoters as the element most impacting on the level of transcription, followed by the spacer length and the 5'-UTR length of transcripts. Furthermore, our transcriptome study also identified rho-independent termination as the most common and effective termination signal of highly and moderately transcribed operons in B. breve. The present study allowed us to identify genes and operons that are actively transcribed in this organism during logarithmic growth, and link promoter elements with levels of transcription of essential genes in this organism. As homologs of many of our identified genes are present across the whole genus Bifidobacterium, our dataset constitutes a transcriptomic reference to be used for future investigations of gene expression in members of this genus.
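
    The consensus elements named in this abstract lend themselves to a small illustration. The following is a minimal sketch, not the authors' pipeline: it scans a DNA string for a -35 box (TTGACA) followed by a -10 box (TATAAT), tolerating a few mismatches. The function names, the spacer range of 16 to 19 bp, the mismatch limit, and the toy sequence are all assumptions made for illustration.

```python
MINUS_35 = "TTGACA"  # -35 consensus quoted in the abstract
MINUS_10 = "TATAAT"  # -10 consensus quoted in the abstract


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, spacer_range=(16, 19), max_mismatches=2):
    """Return candidate promoters as a list of dicts.

    spacer_range and max_mismatches are illustrative assumptions,
    not values taken from the study.
    """
    seq = seq.upper()
    box = len(MINUS_35)
    hits = []
    for i in range(len(seq) - box + 1):
        d35 = hamming(seq[i:i + box], MINUS_35)
        if d35 > max_mismatches:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + box + spacer
            candidate = seq[j:j + box]
            if len(candidate) < box:
                break  # ran off the end of the sequence
            d10 = hamming(candidate, MINUS_10)
            if d10 <= max_mismatches:
                hits.append({
                    "minus35_start": i,
                    "minus10_start": j,
                    "spacer": spacer,
                    "mismatches": d35 + d10,
                })
    return hits


if __name__ == "__main__":
    # Toy sequence: exact -35 box, 17 bp spacer, exact -10 box.
    toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GG"
    for hit in find_promoters(toy, max_mismatches=0):
        print(hit)
```

    Recording the mismatch count and spacer length per hit mirrors the kind of features the abstract relates to transcription level, although the study itself used a Random Forest analysis rather than a simple scan like this.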

  6. Systematic identification of cis-regulatory sequences active in mouse and human embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Marica Grskovic

    2007-08-01

    Full Text Available Understanding the transcriptional regulation of pluripotent cells is of fundamental interest and will greatly inform efforts aimed at directing differentiation of embryonic stem (ES) cells or reprogramming somatic cells. We first analyzed the transcriptional profiles of mouse ES cells and primordial germ cells and identified genes upregulated in pluripotent cells both in vitro and in vivo. These genes are enriched for roles in transcription, chromatin remodeling, cell cycle, and DNA repair. We developed a novel computational algorithm, CompMoby, which combines analyses of sequences both aligned and non-aligned between different genomes with a probabilistic segmentation model to systematically predict short DNA motifs that regulate gene expression. CompMoby was used to identify conserved overrepresented motifs in genes upregulated in pluripotent cells. We show that the motifs are preferentially active in undifferentiated mouse ES and embryonic germ cells in a sequence-specific manner, and that they can act as enhancers in the context of an endogenous promoter. Importantly, the activity of the motifs is conserved in human ES cells. We further show that the transcription factor NF-Y specifically binds to one of the motifs, is differentially expressed during ES cell differentiation, and is required for ES cell proliferation. This study provides novel insights into the transcriptional regulatory networks of pluripotent cells. Our results suggest that this systematic approach can be broadly applied to understanding transcriptional networks in mammalian species.

  7. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  8. Computational prediction of miRNA genes from small RNA sequencing data

    Directory of Open Access Journals (Sweden)

    Wenjing Kang

    2015-01-01

    Full Text Available Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now that rare transcripts present in cells at single copies can be detected, more advanced computational tools are needed to accurately annotate and profile them. miRNAs are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.

  9. A combinatorial approach to synthetic transcription factor-promoter combinations for yeast strain engineering

    DEFF Research Database (Denmark)

    Dossani, Zain Y.; Apel, Amanda Reider; Szmidt-Middleton, Heather

    2018-01-01

    regions, we have built a library of hybrid promoters that are regulated by a synthetic transcription factor. The hybrid promoters consist of native S. cerevisiae promoters, in which the operator regions have been replaced with sequences that are recognized by the bacterial LexA DNA binding protein....... Correspondingly, the synthetic transcription factor (TF) consists of the DNA binding domain of the LexA protein, fused with the human estrogen binding domain and the viral activator domain, VP16. The resulting system with a bacterial DNA binding domain avoids the transcription of native S. cerevisiae genes...... levels, using the same synthetic TF and a given estradiol. This set of promoters, in combination with our synthetic TF, has the potential to regulate numerous genes or pathways simultaneously, to multiple desired levels, in a single strain....

  10. Analysis of prostate-specific antigen transcripts in chimpanzees, cynomolgus monkeys, baboons, and African green monkeys.

    Directory of Open Access Journals (Sweden)

    James N Mubiru

    Full Text Available The function of prostate-specific antigen (PSA) is to liquefy the semen coagulum so that the released sperm can fuse with the ovum. Fifteen spliced variants of the PSA gene have been reported in humans, but little is known about alternative splicing in nonhuman primates. Positive selection has been reported in sex- and reproductive-related genes from sea urchins to Drosophila to humans; however, there are few studies of adaptive evolution of the PSA gene. Here, using polymerase chain reaction (PCR) product cloning and sequencing, we study PSA transcript variant heterogeneity in the prostates of chimpanzees (Pan troglodytes), cynomolgus monkeys (Macaca fascicularis), baboons (Papio hamadryas anubis), and African green monkeys (Chlorocebus aethiops). Six PSA variants were identified in the chimpanzee prostate, but only two variants were found in cynomolgus monkeys, baboons, and African green monkeys. In the chimpanzee the full-length transcript is expressed at the same magnitude as the transcripts that retain intron 3. We have found previously unidentified splice variants of the PSA gene, some of which might be linked to disease conditions. Selection on the PSA gene was studied in 11 primate species by computational methods using the sequences reported here for African green monkey, cynomolgus monkey, baboon, and chimpanzee and other sequences available in public databases. A codon-based analysis (dN/dS) of the PSA gene identified potential adaptive evolution at five residue sites (Arg45, Lys70, Gln144, Pro189, and Thr203).

  11. Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level.

    Directory of Open Access Journals (Sweden)

    Elizabeth A Rach

    2011-01-01

    Full Text Available The application of deep sequencing to map 5' capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: "focused" promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and "dispersed" promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5' capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.

  12. Raalin, a transcript enriched in the honey bee brain, is a remnant of genomic rearrangement in Hymenoptera.

    Science.gov (United States)

    Tirosh, Y; Morpurgo, N; Cohen, M; Linial, M; Bloch, G

    2012-06-01

    We identified a predicted compact cysteine-rich sequence in the honey bee genome that we called 'Raalin'. Raalin transcripts are enriched in the brain of adult honey bee workers and drones, with only minimum expression in other tissues or in pre-adult stages. Open-reading frame (ORF) homologues of Raalin were identified in the transcriptomes of fruit flies, mosquitoes and moths. The Raalin-like gene from Drosophila melanogaster encodes for a short secreted protein that is maximally expressed in the adult brain with negligible expression in other tissues or pre-imaginal stages. Raalin-like sequences have also been found in the recently sequenced genomes of six ant species, but not in the jewel wasp Nasonia vitripennis. As in the honey bee, the Raalin-like sequences of ants do not have an ORF. A comparison of the genome region containing Raalin in the genomes of bees, ants and the wasp provides evolutionary support for an extensive genome rearrangement in this sequence. Our analyses identify a new family of ancient cysteine-rich short sequences in insects in which insertions and genome rearrangements may have disrupted this locus in the branch leading to the Hymenoptera. The regulated expression of this transcript suggests that it has a brain-specific function. © 2012 The Authors. Insect Molecular Biology © 2012 The Royal Entomological Society.

  13. The Mediator complex and transcription regulation

    Science.gov (United States)

    Poss, Zachary C.; Ebmeier, Christopher C.

    2013-01-01

    The Mediator complex is a multi-subunit assembly that appears to be required for regulating expression of most RNA polymerase II (pol II) transcripts, which include protein-coding and most non-coding RNA genes. Mediator and pol II function within the pre-initiation complex (PIC), which consists of Mediator, pol II, TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH and is approximately 4.0 MDa in size. Mediator serves as a central scaffold within the PIC and helps regulate pol II activity in ways that remain poorly understood. Mediator is also generally targeted by sequence-specific, DNA-binding transcription factors (TFs) that work to control gene expression programs in response to developmental or environmental cues. At a basic level, Mediator functions by relaying signals from TFs directly to the pol II enzyme, thereby facilitating TF-dependent regulation of gene expression. Thus, Mediator is essential for converting biological inputs (communicated by TFs) to physiological responses (via changes in gene expression). In this review, we summarize an expansive body of research on the Mediator complex, with an emphasis on yeast and mammalian complexes. We focus on the basics that underlie Mediator function, such as its structure and subunit composition, and describe its broad regulatory influence on gene expression, ranging from chromatin architecture to transcription initiation and elongation, to mRNA processing. We also describe factors that influence Mediator structure and activity, including TFs, non-coding RNAs and the CDK8 module. PMID:24088064

  14. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Full Text Available Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.
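
    As an illustration of what an additive single-SNP model of the kind mentioned above might look like, the sketch below regresses log-transformed expression on allele dosage (0, 1, or 2 copies of one allele) and reports the slope and the fraction of expression variance explained. This is a generic least-squares sketch, not the models used in the study; the function name, the choice of log base 2, and the toy values are assumptions.

```python
import math


def additive_snp_fit(dosages, expression):
    """Fit log2(expression) = intercept + slope * dosage by least squares.

    dosages: allele counts per individual (0, 1 or 2).
    expression: positive expression values for the same individuals.
    Returns (slope, r_squared). Log base 2 is an assumption; the
    abstract only says the expression values were log-transformed.
    """
    if len(dosages) != len(expression) or not dosages:
        raise ValueError("inputs must be non-empty and of equal length")
    y = [math.log2(v) for v in expression]
    n = len(y)
    mean_x = sum(dosages) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(dosages, y))
    if sxx == 0 or syy == 0:
        raise ValueError("no variance in genotype or expression")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained
    return slope, r_squared


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    toy_dosages = [0, 1, 2, 0, 1, 2]
    toy_expression = [1.0, 2.0, 4.0, 1.0, 2.0, 4.0]
    print(additive_snp_fit(toy_dosages, toy_expression))
```

    In this framing, the r_squared value plays the role of the "proportion of gene expression variation explained by cis sequence effects" for a single SNP.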

  15. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    Science.gov (United States)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  16. Sequence-selective targeting of duplex DNA by peptide nucleic acids

    DEFF Research Database (Denmark)

    Nielsen, Peter E

    2010-01-01

    Sequence-selective gene targeting constitutes an attractive drug-discovery approach for genetic therapy, with the aim of reducing or enhancing the activity of specific genes at the transcriptional level, or as part of a methodology for targeted gene repair. The pseudopeptide DNA mimic peptide...

  17. Interferon-induced transcription of a gene encoding a 15-kDa protein depends on an upstream enhancer element

    International Nuclear Information System (INIS)

    Reich, N.; Evans, B.; Levy, D.; Fahey, D.; Knight, E. Jr.; Darnell, J.E. Jr.

    1987-01-01

    A human gene encoding an interferon-induced 15-kDa protein has been isolated from a genomic library. The gene appears to be single-copy and is composed of two exons, the first of which contains the ATG translation initiation codon. In vitro nuclear run-on assays showed that the transcription rate of the gene is stimulated after interferon treatment. To analyze transcriptional regulatory sequences, the authors constructed recombinant plasmids for use in transient transfection assays of HeLa cells. Constructs containing 115 nucleotides 5' to the transcription initiation site were found to be fully inducible by interferon. Assays of deletion mutants identified a critical element for interferon induction located between -115 and -96, just upstream of the CCAAT box. Moreover, a DNA fragment including this region can confer interferon inducibility on a heterologous promoter (thymidine kinase) when cloned in either orientation upstream of the gene or downstream of the gene. These are properties characteristic of an enhancer element that is active only after treatment with interferon. This regulatory sequence may be shared by a group of interferon-induced genes, since a very similar sequence is present within the functional region near the RNA start site of another interferon-induced gene

  18. Survey and evaluation of mutations in the human KLF1 transcription unit.

    Science.gov (United States)

    Gnanapragasam, Merlin Nithya; Crispino, John D; Ali, Abdullah M; Weinberg, Rona; Hoffman, Ronald; Raza, Azra; Bieker, James J

    2018-04-26

    Erythroid Krüppel-like Factor (EKLF/KLF1) is an erythroid-enriched transcription factor that plays a global role in all aspects of erythropoiesis, including cell cycle control and differentiation. We queried whether its mutation might play a role in red cell malignancies by genomic sequencing of the KLF1 transcription unit in cell lines, erythroid neoplasms, dysplastic disorders, and leukemia. In addition, we queried published databases from a number of varied sources. In all cases we only found changes in commonly notated SNPs. Our results suggest that if there are mutations in KLF1 associated with erythroid malignancies, they are exceedingly rare.

  19. Analysis of transcript and protein overlap in a human osteosarcoma cell line

    Directory of Open Access Journals (Sweden)

    Emanuelsson Olof

    2010-12-01

    Full Text Available Abstract Background An interesting field of research in genomics and proteomics is to compare the overlap between the transcriptome and the proteome. Recently, the tools to analyse gene and protein expression on a whole-genome scale have been improved, including the availability of the new generation sequencing instruments and high-throughput antibody-based methods to analyze the presence and localization of proteins. In this study, we used massive transcriptome sequencing (RNA-seq) to investigate the transcriptome of a human osteosarcoma cell line and compared the expression levels with in situ protein data obtained from antibody-based immunohistochemistry (IHC) and immunofluorescence microscopy (IF). Results A large-scale analysis based on 2749 genes was performed, corresponding to approximately 13% of the protein coding genes in the human genome. We found both RNA and protein for a large fraction of the analyzed genes, with 60% of the analyzed human genes detected by all three methods. Only 34 genes (1.2%) were not detected on the transcriptional or protein level with any method. Our data suggest that the majority of the human genes are expressed at detectable transcript or protein levels in this cell line. Since the reliability of antibodies depends on possible cross-reactivity, we compared the RNA and protein data using antibodies with different reliability scores based on various criteria, including Western blot analysis. Gene products detected in all three platforms generally have good antibody validation scores, while those detected only by antibodies, but not by RNA sequencing, generally consist of more low-scoring antibodies. Conclusion This suggests that some antibodies are staining the cells in an unspecific manner, and that assessment of transcript presence by RNA-seq can provide guidance for validation of the corresponding antibodies.

  20. G = MAT: linking transcription factor expression and DNA binding data.

    Directory of Open Access Journals (Sweden)

    Konstantin Tretyakov

    Full Text Available Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
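
    The abstract describes a linear model that combines sequence information with expression data but does not spell out its form. The sketch below shows one plausible reading of the title, offered purely as an assumption: G (genes x samples) is approximated by the product M A T, where M (genes x motifs) records motif presence, T (transcription factors x samples) holds factor expression, and A (motifs x transcription factors) holds the association weights to be estimated. Only the forward prediction is shown; the parameter estimation methods mentioned in the abstract are not reproduced, and all names and numbers are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def predict_expression(M, A, T):
    """Forward model G = M A T under the assumed matrix roles.

    M: genes x motifs (motif presence per gene)
    A: motifs x factors (association weights)
    T: factors x samples (transcription factor expression)
    """
    return matmul(matmul(M, A), T)


if __name__ == "__main__":
    # Toy matrices, invented for illustration only.
    M = [[1, 0],
         [1, 1]]
    A = [[2, 0],
         [0, -1]]
    T = [[1, 2, 3],
         [1, 0, 1]]
    for row in predict_expression(M, A, T):
        print(row)
```

    Under this reading, a large entry in A would indicate a putative association between a motif and a transcription factor, which is the kind of output the abstract says the method is meant to discover.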

  1. The transcriptional programme of the androgen receptor (AR) in prostate cancer.

    Science.gov (United States)

    Lamb, Alastair D; Massie, Charlie E; Neal, David E

    2014-03-01

    The androgen receptor (AR) is essential for normal prostate and prostate cancer cell growth. AR transcriptional activity is almost always maintained even in hormone relapsed prostate cancer (HRPC) in the absence of normal levels of circulating testosterone. Current molecular techniques, such as chromatin-immunoprecipitation sequencing (ChIP-seq), have permitted identification of direct AR-binding sites in cell lines and human tissue with a distinct coordinate network evident in HRPC. The effectiveness of novel agents, such as abiraterone acetate (suppresses adrenal androgens) or enzalutamide (MDV3100, potent AR antagonist), in treating advanced prostate cancer underlines the on-going critical role of the AR throughout all stages of the disease. Persistent AR activity in advanced disease regulates cell cycle activity, steroid biosynthesis and anabolic metabolism in conjunction with regulatory co-factors, such as the E2F family, c-Myc and signal transducer and activator of transcription (STAT) transcription factors. Further treatment approaches must target these other factors. © 2013 The Authors. BJU International © 2013 BJU International.

  2. The prophages of Lactobacillus johnsonii NCC 533: comparative genomics and transcription analysis

    International Nuclear Information System (INIS)

    Ventura, Marco; Canchaya, Carlos; Pridmore, R. David; Bruessow, Harald

    2004-01-01

    Two non-inducible, but apparently complete prophages were identified in the genome of the sequenced Lactobacillus johnsonii strain NCC 533. The 38- and 40-kb-long prophages Lj928 and Lj965 represent distinct lineages of Sfi11-like pac-site Siphoviridae unrelated at the DNA sequence level. The deduced structural proteins from Lj928 demonstrated aa sequence identity with Lactococcus lactis phage TP901-1, while Lj965 shared sequence links with Streptococcus thermophilus phage O1205. With the exception of tRNA genes, inserted between DNA replication and DNA packaging genes, the transcription of the prophage was restricted to the genome segments near both attachment sites. Transcribed genes unrelated to phage functions were inserted between the phage repressor and integrase genes; one group of genes shared sequence relatedness with a mobile DNA element in Staphylococcus aureus. A short, but highly transcribed region was located between the phage lysin and right attachment site; it lacked a protein-encoding function in one prophage

  3. Role of the hinge region of glucocorticoid receptor for HEXIM1-mediated transcriptional repression

    International Nuclear Information System (INIS)

    Yoshikawa, Noritada; Shimizu, Noriaki; Sano, Motoaki; Ohnuma, Kei; Iwata, Satoshi; Hosono, Osamu; Fukuda, Keiichi; Morimoto, Chikao

    2008-01-01

    We previously reported that HEXIM1 (hexamethylene bisacetamide-inducible protein 1), which suppresses transcription elongation via sequestration of positive transcription elongation factor b (P-TEFb) using 7SK RNA as a scaffold, directly associates with glucocorticoid receptor (GR) to suppress glucocorticoid-inducible gene activation. Here, we revealed that the hinge region of GR is essential for its interaction with HEXIM1, and that oxosteroid receptors including GR show sequence homology in their hinge region and interact with HEXIM1, whereas the other members of nuclear receptors do not. We also showed that HEXIM1 suppresses GR-mediated transcription in two ways: sequestration of P-TEFb by HEXIM1 and direct interaction between GR and HEXIM1. In contrast, peroxisome proliferator-activated receptor γ-dependent gene expression is negatively modulated by HEXIM1 solely via sequestration of P-TEFb. We, therefore, conclude that HEXIM1 may act as a gene-selective transcriptional regulator via direct interaction with certain transcriptional regulators including GR and contribute to fine-tuning of, for example, glucocorticoid-mediated biological responses

  4. Specificity and transcriptional activity of microbiota associated with low and high microbial abundance sponges from the Red Sea

    KAUST Repository

    Moitinho-Silva, Lucas

    2013-08-20

    Marine sponges are generally classified as high microbial abundance (HMA) and low microbial abundance (LMA) species. Here, 16S rRNA amplicon sequencing was applied to investigate the diversity, specificity and transcriptional activity of microbes associated with an LMA sponge (Stylissa carteri), an HMA sponge (Xestospongia testudinaria) and sea water collected from the central Saudi Arabia coast of the Red Sea. Altogether, 887 068 denoised sequences were obtained, of which 806 661 sequences remained after quality control. This resulted in 1477 operational taxonomic units (OTUs) that were assigned to 27 microbial phyla. The microbial composition of S. carteri was more similar to that of sea water than to that of X. testudinaria, which is consistent with the observation that the sequence data set of S. carteri contained many more possibly sea water sequences (~24%) than the X. testudinaria data set (~6%). The most abundant OTUs were shared between all three sources (S. carteri, X. testudinaria, sea water), while rare OTUs were unique to any given source. Despite this high degree of overlap, each sponge species contained its own specific microbiota. The X. testudinaria-specific bacterial taxa were similar to those already described for this species. A set of S. carteri-specific bacterial taxa related to Proteobacteria and Nitrospira was identified, which are likely permanently associated with S. carteri. The transcriptional activity of sponge-associated microorganisms correlated well with their abundance. Quantitative PCR revealed the presence of Poribacteria, representing typical sponge symbionts, in both sponge species and in sea water; however, low transcriptional activity in sea water suggested that Poribacteria are not active outside the host context. © 2013 John Wiley & Sons Ltd.

  5. Transcriptional regulation by competing transcription factor modules.

    Directory of Open Access Journals (Sweden)

    Rutger Hermsen

    2006-12-01

    Full Text Available Gene regulatory networks lie at the heart of cellular computation. In these networks, intracellular and extracellular signals are integrated by transcription factors, which control the expression of transcription units by binding to cis-regulatory regions on the DNA. The designs of both eukaryotic and prokaryotic cis-regulatory regions are usually highly complex. They frequently consist of both repetitive and overlapping transcription factor binding sites. To unravel the design principles of these promoter architectures, we have designed in silico prokaryotic transcriptional logic gates with predefined input-output relations using an evolutionary algorithm. The resulting cis-regulatory designs are often composed of modules that consist of tandem arrays of binding sites to which the transcription factors bind cooperatively. Moreover, these modules often overlap with each other, leading to competition between them. Our analysis thus identifies a new signal integration motif that is based upon the interplay between intramodular cooperativity and intermodular competition. We show that this signal integration mechanism drastically enhances the capacity of cis-regulatory domains to integrate signals. Our results provide a possible explanation for the complexity of promoter architectures and could be used for the rational design of synthetic gene circuits.

  6. VirF-Independent Regulation of Shigella virB Transcription is Mediated by the Small RNA RyhB

    Science.gov (United States)

    Broach, William H.; Egan, Nicholas; Wing, Helen J.; Payne, Shelley M.; Murphy, Erin R.

    2012-01-01

    Infection of the human host by Shigella species requires the coordinated production of specific Shigella virulence factors, a process mediated largely by the VirF/VirB regulatory cascade. VirF promotes the transcription of virB, a gene encoding the transcriptional activator of several virulence-associated genes. This study reveals that transcription of virB is also regulated by the small RNA RyhB, and importantly, that this regulation is not achieved indirectly via modulation of VirF activity. These data are the first to demonstrate that the regulation of virB transcription can be uncoupled from the master regulator VirF. It is also established that efficient RyhB-dependent regulation of transcription is facilitated by specific nucleic acid sequences within virB. This study not only reveals RyhB-dependent regulation of virB transcription as a novel point of control in the central regulatory circuit modulating Shigella virulence, but also highlights the versatility of RyhB in controlling bacterial gene expression. PMID:22701677

  7. Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury

    Directory of Open Access Journals (Sweden)

    Monesi Nadia

    2007-07-01

    Full Text Available Abstract Background The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~14,000), indicating that mechanisms acting on generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of this is estimated as missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Results Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. In agreement with the proposal that this locus is co-regulated in response to microorganism infection, we show

  8. Transcription and chromatin determinants of de novo DNA methylation timing in oocytes.

    Science.gov (United States)

    Gahurova, Lenka; Tomizawa, Shin-Ichi; Smallwood, Sébastien A; Stewart-Morgan, Kathleen R; Saadeh, Heba; Kim, Jeesun; Andrews, Simon R; Chen, Taiping; Kelsey, Gavin

    2017-01-01

    Gametogenesis in mammals entails profound re-patterning of the epigenome. In the female germline, DNA methylation is acquired late in oogenesis from an essentially unmethylated baseline and is established largely as a consequence of transcription events. Molecular and functional studies have shown that imprinted genes become methylated at different times during oocyte growth; however, little is known about the kinetics of methylation gain genome wide and the reasons for asynchrony in methylation at imprinted loci. Given the predominant role of transcription, we sought to investigate whether transcription timing is rate limiting for de novo methylation and determines the asynchrony of methylation events. Therefore, we generated genome-wide methylation and transcriptome maps of size-selected, growing oocytes to capture the onset and progression of methylation. We find that most sequence elements, including most classes of transposable elements, acquire methylation at similar rates overall. However, methylation of CpG islands (CGIs) is delayed compared with the genome average and there are reproducible differences amongst CGIs in onset of methylation. Although more highly transcribed genes acquire methylation earlier, the major transitions in the oocyte transcriptome occur well before the de novo methylation phase, indicating that transcription is generally not rate limiting in conferring permissiveness to DNA methylation. Instead, CGI methylation timing negatively correlates with enrichment for histone 3 lysine 4 (H3K4) methylation and dependence on the H3K4 demethylases KDM1A and KDM1B, implicating chromatin remodelling as a major determinant of methylation timing. We also identified differential enrichment of transcription factor binding motifs in CGIs acquiring methylation early or late in oocyte growth. By combining these parameters into multiple regression models, we were able to account for about a fifth of the variation in methylation timing of CGIs. 
Finally

  9. TIP48/Reptin and H2A.Z requirement for initiating chromatin remodeling in estrogen-activated transcription.

    Directory of Open Access Journals (Sweden)

    Mathieu Dalvai

    2013-04-01

    Full Text Available Histone variants, including histone H2A.Z, are incorporated into specific genomic sites and participate in transcription regulation. The role of H2A.Z at these sites remains poorly characterized. Our study investigates changes in the chromatin environment at the Cyclin D1 gene (CCND1) during transcriptional initiation in response to estradiol in estrogen receptor positive mammary tumour cells. We show that H2A.Z is present at the transcription start-site and downstream enhancer sequences of CCND1 when the gene is poorly transcribed. Stimulation of CCND1 expression required release of H2A.Z concomitantly from both these DNA elements. The AAA+ family members TIP48/reptin and the histone variant H2A.Z are required to remodel the chromatin environment at CCND1 as a prerequisite for binding of the estrogen receptor (ERα) in the presence of hormone. TIP48 promotes acetylation and exchange of H2A.Z, which triggers a dissociation of the CCND1 3' enhancer from the promoter, thereby releasing a repressive intragenic loop. This release then enables the estrogen receptor to bind to the CCND1 promoter. Our findings provide new insight into the priming of chromatin required for transcription factor access to their target sequence. Dynamic release of gene loops could be a rapid means to remodel chromatin and to stimulate transcription in response to hormones.

  10. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome.

    Science.gov (United States)

    Dresch, Jacqueline M; Zellers, Rowan G; Bork, Daniel K; Drewell, Robert A

    2016-01-01

    A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.
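
    The idea of dependencies between nonadjacent nucleotides in a binding site can be made concrete with a small sketch. It computes the mutual information between two columns of a set of aligned binding-site sequences; a value near zero is what a position-independent model implicitly assumes. This is a generic illustration, not the MARZ algorithm or any of the 32 models referred to in the abstract, and the four aligned sites are invented toy data.

```python
from collections import Counter
from math import log2

def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)          # marginal counts at position i
    col_j = Counter(s[j] for s in sites)          # marginal counts at position j
    joint = Counter((s[i], s[j]) for s in sites)  # joint counts for the pair
    return sum(
        (c / n) * log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in joint.items()
    )

# Toy alignment: positions 0 and 3 always change together, position 1 is constant.
toy_sites = ["AAGT", "AAGT", "CAGG", "CAGG"]
mi_coupled = mutual_information(toy_sites, 0, 3)      # nonadjacent, fully dependent
mi_independent = mutual_information(toy_sites, 0, 1)  # no dependency
```

    In the toy alignment the nonadjacent pair (0, 3) carries one full bit of shared information, while the pair (0, 1) carries none. With real binding-site collections, small sample sizes inflate such estimates, so any threshold would need to be chosen with care.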

  11. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    Science.gov (United States)

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
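
    The hexanucleotide survey described here can be sketched as follows. The function reports, for each hexamer, the fraction of sequences that contain it in the window 15–30 nt upstream of the poly(A) site. The function name, the coordinate convention (a 0-based index of the poly(A) site within each sequence) and the two toy sequences are assumptions for illustration and do not reproduce the authors' pipeline or data.

```python
from collections import Counter

def hexamer_fractions(sequences, polya_index, upstream=(15, 30), k=6):
    """Fraction of sequences containing each k-mer in the upstream window.

    polya_index is the assumed 0-based position of the poly(A) site; the
    window covers the bases 15 to 30 nt upstream of it by default.
    """
    near, far = upstream
    if polya_index < far:
        raise ValueError("poly(A) site lies too close to the sequence start")
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")               # report in RNA alphabet
        window = rna[polya_index - far:polya_index - near]
        # A set counts each hexamer at most once per transcript.
        counts.update({window[i:i + k] for i in range(len(window) - k + 1)})
    return {kmer: c / len(sequences) for kmer, c in counts.items()}

# Toy sequences (not A. oryzae data); the poly(A) site is placed at index 40.
toy = ["C" * 12 + "AATGAA" + "C" * 22 + "TTTT", "C" * 44]
fractions = hexamer_fractions(toy, polya_index=40)
```

    Counting each hexamer once per transcript matches the way the abstract reports AAUGAA as a share of all transcripts rather than as a raw occurrence count.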

  12. Synthesis of in vitro ColE1 transcripts with 5'-terminal ribonucleotides that exhibit noncomplementarity with the DNA template

    International Nuclear Information System (INIS)

    Parker, R.C.

    1986-01-01

    A region that forms the S1 nuclease site in ColE1 DNA is shown to code for an in vitro transcript, called S1 RNA-B, which contains a 5'-terminal GTP residue that exhibits noncomplementarity with the template's DNA sequence. The synthesis of S1 RNA-B initiates four bases upstream from the start point for S1 RNA-C. The initial four bases in S1 RNA-B and S1 RNA-C are identical. The relative synthesis of S1 RNA-B to S1 RNA-C is sensitive to the concentration of GTP, a substrate that is required for elongation past the +4 position in S1 RNA-C. Dinucleotides that are expected to only initiate synthesis of S1 RNA-C yield two transcripts that appear to initiate from the S1 RNA-C and S1 RNA-B start sites. In vitro studies involving other ColE1 transcripts, RNA-B and RNA-C, provide similar observations concerning the noncomplementary initiation phenomenon. A model involving transcriptional slippage is suggested to explain the noncomplementary initiation phenomenon. The model proposes that the cycling reaction of Escherichia coli RNA polymerase produces tetranucleotides that are transposed to nearby upstream sequences for priming transcription.

  13. X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila.

    Science.gov (United States)

    Larschan, Erica; Bishop, Eric P; Kharchenko, Peter V; Core, Leighton J; Lis, John T; Park, Peter J; Kuroda, Mitzi I

    2011-03-03

    The evolution of sex chromosomes has resulted in numerous species in which females inherit two X chromosomes but males have a single X, thus requiring dosage compensation. MSL (Male-specific lethal) complex increases transcription on the single X chromosome of Drosophila males to equalize expression of X-linked genes between the sexes. The biochemical mechanisms used for dosage compensation must function over a wide dynamic range of transcription levels and differential expression patterns. It has been proposed that the MSL complex regulates transcriptional elongation to control dosage compensation, a model subsequently supported by mapping of the MSL complex and MSL-dependent histone 4 lysine 16 acetylation to the bodies of X-linked genes in males, with a bias towards 3' ends. However, experimental analysis of MSL function at the mechanistic level has been challenging owing to the small magnitude of the chromosome-wide effect and the lack of an in vitro system for biochemical analysis. Here we use global run-on sequencing (GRO-seq) to examine the specific effect of the MSL complex on RNA Polymerase II (RNAP II) on a genome-wide level. Results indicate that the MSL complex enhances transcription by facilitating the progression of RNAP II across the bodies of active X-linked genes. Improving transcriptional output downstream of typical gene-specific controls may explain how dosage compensation can be imposed on the diverse set of genes along an entire chromosome.

  14. Brownian dynamics simulations of sequence-dependent duplex denaturation in dynamically superhelical DNA

    Science.gov (United States)

    Mielke, Steven P.; Grønbech-Jensen, Niels; Krishnan, V. V.; Fink, William H.; Benham, Craig J.

    2005-09-01

    The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.

  15. Nuclear factor ETF specifically stimulates transcription from promoters without a TATA box.

    Science.gov (United States)

    Kageyama, R; Merlino, G T; Pastan, I

    1989-09-15

    Transcription factor ETF stimulates the expression of the epidermal growth factor receptor (EGFR) gene which does not have a TATA box in the promoter region. Here, we show that ETF recognizes various GC-rich sequences including stretches of deoxycytidine or deoxyguanosine residues and GC boxes with similar affinities. ETF also binds to TATA boxes but with a lower affinity. ETF stimulated in vitro transcription from several promoters without TATA boxes but had little or no effect on TATA box-containing promoters even though they had strong ETF-binding sites. These inactive ETF-binding sites became functional when placed upstream of the EGFR promoter whose own ETF-binding sites were removed. Furthermore, when a TATA box was introduced into the EGFR promoter, the responsiveness to ETF was abolished. These results indicate that ETF is a specific transcription factor for promoters which do not contain TATA elements.

  16. Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

    Science.gov (United States)

    Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

    2018-06-03

    Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.

  17. Transcript structure and domain display: a customizable transcript visualization tool.

    Science.gov (United States)

    Watanabe, Kenneth A; Ma, Kaiwang; Homayouni, Arielle; Rushton, Paul J; Shen, Qingxi J

    2016-07-01

    Transcript Structure and Domain Display (TSDD) is a publicly available, web-based program that provides publication quality images of transcript structures and domains. TSDD is capable of producing transcript structures from GFF/GFF3 and BED files. Alternatively, the GFF files of several model organisms have been pre-loaded so that users only need to enter the locus IDs of the transcripts to be displayed. Visualization of transcripts provides many benefits to researchers, ranging from evolutionary analysis of DNA-binding domains to predictive function modeling. TSDD is freely available for non-commercial users at http://shenlab.sols.unlv.edu/shenlab/software/TSD/transcript_display.html. Contact: jeffery.shen@unlv.nevada.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
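
    To show what deriving a transcript structure from a GFF3 file involves, the sketch below groups exon features by their parent transcript. It relies only on the standard nine tab-separated GFF3 columns and the `Parent` attribute; the function name and the toy records are hypothetical, and the code is not taken from TSDD.

```python
from collections import defaultdict

def exons_by_transcript(gff3_text):
    """Map each transcript ID to its sorted (start, end, strand) exon tuples."""
    structures = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):      # skip blanks, pragmas, comments
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":   # keep well-formed exon rows only
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                structures[parent].append((int(cols[3]), int(cols[4]), cols[6]))
    return {tid: sorted(exons) for tid, exons in structures.items()}

# Toy GFF3 records (hypothetical transcript "tx1" with two exons).
example = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "toy", "mRNA", "100", "900", ".", "+", ".", "ID=tx1"]),
    "\t".join(["chr1", "toy", "exon", "500", "900", ".", "+", ".", "Parent=tx1"]),
    "\t".join(["chr1", "toy", "exon", "100", "300", ".", "+", ".", "Parent=tx1"]),
])
structures = exons_by_transcript(example)
```

    The resulting exon coordinates per transcript are the minimum a drawing routine would need; domain annotations and BED input would require additional parsing not shown here.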

  18. An enhanceosome containing the Jun B/Fra-2 heterodimer and the HMG-I(Y) architectural protein controls HPV 18 transcription.

    Science.gov (United States)

    Bouallaga, I; Massicard, S; Yaniv, M; Thierry, F

    2000-11-01

    Recent studies have reported new mechanisms that mediate the transcriptional synergy of strong tissue-specific enhancers, involving the cooperative assembly of higher-order nucleoprotein complexes called enhanceosomes. Here we show that the HPV18 enhancer, which controls the epithelial-specific transcription of the E6 and E7 transforming genes, exhibits characteristic features of these structures. We used deletion experiments to show that a core enhancer element cooperates, in a specific helical phasing, with distant essential factors binding to the ends of the enhancer. This core sequence, binding a Jun B/Fra-2 heterodimer, cooperatively recruits the architectural protein HMG-I(Y) in a nucleoprotein complex, where they interact with each other. Therefore, in HeLa cells, HPV18 transcription seems to depend upon the assembly of an enhanceosome containing multiple cellular factors recruited by a core sequence interacting with AP1 and HMG-I(Y).

  19. Construction and characterization of a forward subtracted library of blue mussels Mytilus edulis for the identification of gene transcription signatures and biomarkers of styrene exposure

    International Nuclear Information System (INIS)

    Diaz de Cerio, O.; Hands, E.; Humble, J.; Cajaraville, M.P.; Craft, J.A.; Cancio, I.

    2013-01-01

    Highlights: ► Transcription responses in blue mussels exposed to styrene have been studied by SSH. ► 1440 Clones were obtained from which 287 were sequenced. ► Immune system, cancer-related and ribosomal genes identified as upregulated genes. ► Chitin and β-1-3-glucan metabolism genes highly represented in subtracted library. -- Abstract: Transcriptional profiling can elucidate adaptive/toxicity pathways participating in achieving homeostasis or leading to pathogenesis in marine biota exposed to chemical substances. With the aim of analyzing transcriptional responses in the mussel Mytilus edulis exposed to the corrosive and putatively carcinogenic hydrocarbon styrene (3–5 ppm, 3 days), a forward subtracted (SSH) cDNA library was produced. Female mussels were selected and digestive gland mRNA was isolated. A library with 1440 clones was produced and a total of 287 clones were sequenced, 53% being identified through BlastN analysis against Mytibase and DeepSeaVent databases. Those genes included GO terms such as ‘response to drugs’, ‘immune defense’ and ‘cell proliferation’. Furthermore, sequences related to chitin and beta-1-3-glucan metabolism were also up-regulated by styrene. Many of the obtained sequences could not be annotated constituting new mussel sequences. In conclusion, this SSH study reveals novel sequences useful to generate molecular biomarkers of styrene exposure in mussels

  20. Multiple promoters and alternative splicing: Hoxa5 transcriptional complexity in the mouse embryo.

    Directory of Open Access Journals (Sweden)

    Yan Coulombe

    2010-05-01

    Full Text Available The genomic organization of Hox clusters is fundamental for the precise spatio-temporal regulation and the function of each Hox gene, and hence for correct embryo patterning. Multiple overlapping transcriptional units exist at the Hoxa5 locus reflecting the complexity of Hox clustering: a major form of 1.8 kb corresponding to the two characterized exons of the gene and polyadenylated RNA species of 5.0, 9.5 and 11.0 kb. This transcriptional intricacy raises the question of the involvement of the larger transcripts in Hox function and regulation. We have undertaken the molecular characterization of the Hoxa5 larger transcripts. They initiate from two highly conserved distal promoters, one corresponding to the putative Hoxa6 promoter, and a second located near Hoxa7. Alternative splicing is also involved in the generation of the different transcripts. No functional polyadenylation sequence was found at the Hoxa6 locus and all larger transcripts use the polyadenylation site of the Hoxa5 gene. Some larger transcripts are potential Hoxa6/Hoxa5 bicistronic units. However, even though all transcripts could produce the genuine 270 a.a. HOXA5 protein, only the 1.8 kb form is translated into the protein, indicative of its essential role in Hoxa5 gene function. The Hoxa6 mutation disrupts the larger transcripts without major phenotypic impact on axial specification in their expression domain. However, Hoxa5-like skeletal anomalies are observed in Hoxa6 mutants and these defects can be explained by the loss of expression of the 1.8 kb transcript. Our data raise the possibility that the larger transcripts may be involved in Hoxa5 gene regulation. Our observation that the Hoxa5 larger transcripts possess a developmentally regulated expression, combined with the increasing body of data on the role of long noncoding RNAs in transcriptional regulation, suggests that the Hoxa5 larger transcripts may participate in the control of Hox gene expression.