novo protein-coding gene: Topics by WorldWideScience.org

Sample records for novo protein-coding gene

De novo origin of human protein-coding genes.

Directory of Open Access Journals (Sweden)

Dong-Dong Wu

2011-11-01

The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA-seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare; thus, there should be greater appreciation of the importance of the de novo origination of genes.
De Novo Origin of Human Protein-Coding Genes

Science.gov (United States)

Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping

2011-01-01

The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA-seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare; thus, there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831
Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs.

Directory of Open Access Journals (Sweden)

Chen Xie

2012-09-01

Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNA. How these genes originated has not been well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA-Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. By comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profiles were largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
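The parsimony argument can be illustrated with Fitch's small-parsimony algorithm. The following is a minimal, hypothetical sketch and not the authors' pipeline. It assumes a rooted three-species tree, ((human, chimpanzee), macaque), and scores two yes/no characters for one locus, "transcribed" and "protein-coding". The scoring uses an assumed pattern in which the locus is transcribed in all three species but is coding only in human. The function name and state labels are illustrative.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony: return the candidate state set at the root
    and the minimum number of state changes needed on the tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical species tree with macaque as the outgroup.
    tree = (("human", "chimpanzee"), "macaque")
    # Assumed pattern: transcribed everywhere, protein-coding only in human.
    transcribed = {"human": "yes", "chimpanzee": "yes", "macaque": "yes"}
    coding = {"human": "yes", "chimpanzee": "no", "macaque": "no"}
    print("transcribed:", fitch(tree, transcribed))
    print("coding:", fitch(tree, coding))
```

Under these assumptions, the ancestor is inferred to be transcribed but non-coding. A single gain of coding potential then occurs on the human branch, which matches the "transcription first, coding later" ordering described in the abstract.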
A human-specific de novo protein-coding gene associated with human brain functions.

Directory of Open Access Journals (Sweden)

Chuan-Yun Li

2010-03-01

To understand whether any human-specific new genes may be associated with human brain functions, we computationally screened the genetic vulnerability factors identified through Genome-Wide Association Studies and linkage analyses of nicotine addiction and found one human-specific de novo protein-coding gene, FLJ33706 (alternative gene symbol C20orf203). Cross-species analysis revealed interesting evolutionary paths by which this gene originated from noncoding DNA sequences. Insertion of repeat elements, especially Alu elements, contributed to the formation of the first coding exon and six standard splice junctions on the branch leading to humans and chimpanzees. Two subsequent substitutions in the human lineage then removed two stop codons and created an open reading frame of 194 amino acids. We experimentally verified FLJ33706's mRNA and protein expression in the brain. Real-time PCR in multiple tissues demonstrated that FLJ33706 was most abundantly expressed in brain. Human polymorphism data suggested that FLJ33706 encodes a protein under purifying selection. A specifically designed antibody detected its protein expression across human cortex, cerebellum and midbrain. Immunohistochemistry of normal human brain cortex revealed the localization of FLJ33706 protein in neurons. Elevated expression of FLJ33706 was detected in Alzheimer's brain samples, suggesting a role for this novel gene in the human-specific pathogenesis of Alzheimer's disease. FLJ33706 provided the strongest evidence so far that human-specific de novo genes can have protein-coding potential and differential protein expression, and be involved in human brain functions.
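A toy sequence can show how a substitution that removes an in-frame stop codon lengthens an open reading frame. This is a minimal sketch under stated assumptions:

- It scans only the forward strand with the standard genetic code.
- An ORF must start with ATG and end with an in-frame stop codon.
- It uses an invented 18-nt sequence rather than the FLJ33706 locus.

The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length (in amino acids, stop excluded) of the longest ATG-initiated,
    stop-terminated ORF on the forward strand of `seq`."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in-frame codons from this start until the first stop codon.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                best = max(best, (pos - start) // 3)
                break
    return best


def substitute(seq: str, index: int, base: str) -> str:
    """Return a copy of `seq` with a single-nucleotide substitution."""
    return seq[:index] + base + seq[index + 1:]


if __name__ == "__main__":
    ancestral = "ATGAAATAGCCCGGGTAA"        # premature TAG at codon 3
    derived = substitute(ancestral, 6, "C")  # TAG -> CAG removes the stop
    print(longest_orf_aa(ancestral), longest_orf_aa(derived))  # 2 5
```

In the toy case, one substitution turns a 2-residue ORF into a 5-residue ORF that reads through to the downstream TAA. The abstract describes the same effect for two stop codons on the human lineage.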
De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences.

Directory of Open Access Journals (Sweden)

Josephine A Reinhardt

How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 million years before the evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval. This suggests that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes, and the structural complexity of de novo genes steadily increases over evolutionary time. These genes are transcribed at a higher level in males than in females and are most strongly expressed in testes. Even so, RNAi experiments show that most of them are essential in both sexes during metamorphosis. This lethality suggests that protein-coding de novo genes in Drosophila quickly become functionally important.
Foldability of a Natural De Novo Evolved Protein.

Science.gov (United States)

Bungard, Dixie; Copple, Jacob S; Yan, Jing; Chhun, Jimmy J; Kumirov, Vlad K; Foy, Scott G; Masel, Joanna; Wysocki, Vicki H; Cordes, Matthew H J

2017-11-07

The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo. Copyright © 2017 Elsevier Ltd. All rights reserved.
A dual origin of the Xist gene from a protein-coding gene and a set of transposable elements.

Directory of Open Access Journals (Sweden)

Eugeny A Elisaphenko

2008-06-01

X-chromosome inactivation, which occurs in female eutherian mammals, is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously, it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to the protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements that gave rise to simple tandem repeats. The Xist promoter region and four of the ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons, including those with detectable simple tandem repeats, show similarity to different transposable elements. Integration of mobile elements into Xist has accompanied the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally, we show that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA.
Origins of De Novo Genes in Human and Chimpanzee.

Science.gov (United States)

Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar

2015-12-01

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. To obtain a comprehensive view of this process, we performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse). We then compared the assembled transcripts and the corresponding syntenic genomic regions. This resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes that have evidence of protein translation. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
NCYM, a Cis-antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3β resulting in the stabilization of MYCN in human neuroblastomas.

Directory of Open Access Journals (Sweden)

Yusuke Suenaga

2014-01-01

The rearrangement of pre-existing genes has long been thought of as the major mode of new gene generation. Recently, de novo gene birth from non-genic DNA was found to be an alternative mechanism to generate novel protein-coding genes. However, its functional role in human disease remains largely unknown. Here we show that NCYM, a cis-antisense gene of the MYCN oncogene initially thought to be a large non-coding RNA, encodes a de novo evolved protein regulating the pathogenesis of human cancers, particularly neuroblastoma. The NCYM gene is evolutionarily conserved only in the taxonomic group containing humans and chimpanzees. In primary human neuroblastomas, NCYM is 100% co-amplified and co-expressed with MYCN, and NCYM mRNA expression is associated with poor clinical outcome. MYCN directly transactivates both NCYM and MYCN mRNA, whereas NCYM stabilizes MYCN protein by inhibiting the activity of GSK3β, a kinase that promotes MYCN degradation. Neuroblastomas in MYCN/NCYM double transgenic mice, unlike those in MYCN transgenic mice, were frequently accompanied by distant metastases. This behavior is reminiscent of human neuroblastomas with MYCN amplification. The NCYM protein also interacts with GSK3β, thereby stabilizing the MYCN protein in the tumors of the MYCN/NCYM double transgenic mice. Thus, these results suggest that GSK3β inhibition by NCYM stabilizes the MYCN protein both in vitro and in vivo. Furthermore, the survival of MYCN transgenic mice bearing neuroblastoma was improved by treatment with NVP-BEZ235, a dual PI3K/mTOR inhibitor shown to destabilize MYCN via GSK3β activation. In contrast, tumors arising in MYCN/NCYM double transgenic mice showed chemoresistance to the drug. Collectively, our results show that NCYM is the first de novo evolved protein known to act as an oncopromoting factor in human cancer, and suggest that de novo evolved proteins may functionally characterize human disease.
On the Origin of De Novo Genes in Arabidopsis thaliana Populations.

Science.gov (United States)

Li, Zi-Wen; Chen, Xi; Wu, Qiong; Hagmann, Jörg; Han, Ting-Shen; Zou, Yu-Pan; Ge, Song; Guo, Ya-Long

2016-08-03

De novo genes, which originate from ancestral nongenic sequences, are one of the most important sources of protein-coding genes. This origination process is crucial for the adaptation of organisms. However, how de novo genes arise and become fixed in a population or species remains largely unknown. Here, we identified 782 de novo genes from the model plant Arabidopsis thaliana and divided them into three types based on the availability of translational evidence, transcriptional evidence, and neither transcriptional nor translational evidence for their origin. Importantly, by integrating multiple types of omics data, including data from genomes, epigenomes, transcriptomes, and translatomes, we found that epigenetic modifications (DNA methylation and histone modification) play an important role in the origination process of de novo genes. Intriguingly, using the transcriptomes and methylomes from the same population of 84 accessions, we found that de novo genes that are transcribed in approximately half of the total accessions within the population are highly methylated, with lower levels of transcription than those transcribed at other frequencies within the population. We hypothesized that, during the origin of de novo gene alleles, those neutralized to low expression states via DNA methylation have relatively high probabilities of spreading and becoming fixed in a population. Our results highlight the process underlying the origin of de novo genes at the population level, as well as the importance of DNA methylation in this process. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

Science.gov (United States)

Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

2015-12-11

High-quality and complete gene models are the basis of whole-genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced solely on the basis of short reads, but its annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to 12 giant panda tissues to globally improve genome assembly completeness and to detect novel expressed transcripts. We used a transcriptome reconstruction strategy that combined reference-based and de novo methods. The de novo assembled transcripts effectively improved several aspects of genome assembly completeness in the transcribed regions:

- genome scaffolding
- the detection of small assembly errors
- the extension of scaffold/contig boundaries
- gap closure

Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction. These roles might relate to the development and evolution of the black-and-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates.

Directory of Open Access Journals (Sweden)

Jia-Yu Chen

2015-07-01

While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaques revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not unexpected under neutral evolution. They generally have longer theoretical lifespans than their current age because their GC-rich sequences keep ORFs stable, with a lower chance of nonsense mutations. Interestingly, the emergence and retention of these de novo genes are likely driven by neutral forces. Nevertheless, a population genetics study of 67 human individuals and 82 macaques revealed signatures of purifying selection on these genes specifically in the human population. This indicates that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts.
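The link between GC-rich sequence and a lower chance of nonsense mutations follows from the composition of the stop codons: TAA, TAG and TGA are all A/T-rich. The minimal sketch below makes two assumptions: the standard genetic code, and an equal probability for every single-nucleotide substitution (real mutation spectra are not uniform). For each sense codon, it counts how many of the nine single-nucleotide neighbours are stop codons. The function names and toy sequences are hypothetical, and this is not the authors' lifespan model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def stop_neighbours(codon: str) -> int:
    """Number of the 9 single-nucleotide substitutions of `codon` that yield a stop codon."""
    count = 0
    for i in range(3):
        for base in BASES:
            if base != codon[i] and codon[:i] + base + codon[i + 1:] in STOP_CODONS:
                count += 1
    return count


def nonsense_fraction(orf: str) -> float:
    """Fraction of all possible point substitutions in the sense codons of `orf`
    that create an in-frame stop codon (uniform substitution model)."""
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    if not sense:
        return 0.0
    return sum(stop_neighbours(c) for c in sense) / (9 * len(sense))


if __name__ == "__main__":
    print(nonsense_fraction("GCCGGCGCC"))  # GC-rich toy ORF: 0.0
    print(nonsense_fraction("TTATACAAA"))  # AT-rich toy ORF: 5/27, about 0.185
```

Under this simplified model, a GC-rich reading frame offers fewer one-step routes to a stop codon. That is consistent with the longer theoretical lifespans described in the abstract. The toy sequences are not taken from the study.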
De novo origin of VCY2 from autosome to Y-transposed amplicon.

Directory of Open Access Journals (Sweden)

Peng-Rong Cao

The formation of new genes is a primary driving force of evolution in all organisms. The de novo evolution of new genes from non-protein-coding genomic regions is emerging as an important additional mechanism for novel gene creation. Y chromosomes underlie sex determination in mammals and contain genes that are required for male-specific functions. In this study, we searched for Y chromosome de novo genes derived from non-protein-coding sequences. The Y chromosome orphan gene variable charge, Y-linked (VCY2), is an autosome-derived gene. It has sequence similarity to large autosomal fragments but lacks an autosomal protein-coding homolog. VCY2 is located in an amplicon containing long DNA fragments that were transposed from autosomes to the Y chromosome before the ape-monkey split. We confirmed that VCY2 cannot be encoded by autosomes because multiple disablers disrupt the open reading frame, such as the absence of start or stop codons and the presence of premature stop codons. Similar observations were made for homologs in the autosomes of the chimpanzee, gorilla, rhesus macaque, baboon and the out-group marmoset. This suggests that a non-protein-coding ancestral VCY2, common to apes and monkeys, predated the transposition event. Furthermore, while protein-coding orthologs are absent, a putative non-protein-coding VCY2 with conserved disablers was identified in the male-specific region of the rhesus macaque Y chromosome. This finding implies that VCY2 might not have acquired its protein-coding ability before the ape-monkey split. VCY2 encodes a protein with testis-specific expression and is involved in the pathologic process of male infertility, and the acquisition of this gene might improve male fertility. This is the first evidence that de novo genes can be generated from transposed autosomal non-protein-coding segments, and this evidence provides novel insights into the evolutionary history of the Y
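The disabler check described above (missing start or stop codons, premature stop codons) can be sketched as a simple scan of a homologous sequence. This minimal sketch rests on several assumptions:

- The homolog has already been aligned to the VCY2 reading frame; the alignment step is not shown.
- The function name is hypothetical.
- The length-not-a-multiple-of-3 check is an extra assumption that goes beyond the disablers named in the abstract.

This is not the authors' code.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_disablers(seq: str) -> List[str]:
    """List ORF disablers in `seq`, assumed to be aligned to the reading frame
    of the coding copy (position 0 = first base of the expected start codon)."""
    seq = seq.upper()
    disablers = []
    if len(seq) % 3:
        # Assumed extra check: a length that is not a codon multiple hints at an indel.
        disablers.append(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "ATG":
        disablers.append("missing start codon")
    # Any stop codon before the final codon truncates the ORF.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            disablers.append(f"premature stop {codon} at codon {index + 1}")
    if not codons or codons[-1] not in STOP_CODONS:
        disablers.append("missing stop codon")
    return disablers


if __name__ == "__main__":
    print(find_disablers("ATGAAACCCTAA"))     # []
    print(find_disablers("ATAAAATGACCCGGG"))  # start, premature stop, stop
```

An empty list means the aligned sequence could, in principle, encode an intact ORF. Any entry marks a position where the homolog would fail to encode the protein.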
Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes.

Science.gov (United States)

Hsu, Jacob Shujui; Kwan, Johnny S H; Pan, Zhicheng; Garcia-Barcelo, Maria-Mercè; Sham, Pak Chung; Li, Miaoxin

2016-10-15

Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, identifying disease-causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilizes this information. We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with an autosomal dominant (AD) inheritance mode are more sensitive to haploinsufficiency and de novo mutation. Autosomal recessive (AR) genes, by contrast, have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. On this basis, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigns three pathogenicity scores, one per inheritance mode (AD, AR and XL), to each annotated protein-coding gene (N = 18 859). This inheritance-mode-specific framework achieved higher accuracy (area under the curve = 0.84) in XL mode. The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods, including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes associated with different disease inheritance modes tend to have distinct characteristics. ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available at https://github.com/jacobhsu35/ISPP.git. Contact: mxli@hku.hk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author
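The reported area under the curve (AUC = 0.84) has a direct interpretation. It is the probability that a randomly chosen disease gene receives a higher score than a randomly chosen non-disease gene, with ties counted as one half. A minimal sketch of this rank-based (Mann-Whitney) definition follows. The scores are invented, and the function is a hypothetical illustration, not part of ISPP or KGGSeq.

```python
from typing import Sequence


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    disease_gene_scores = [0.9, 0.8, 0.4]   # hypothetical scores
    other_gene_scores = [0.7, 0.3]          # hypothetical scores
    print(roc_auc(disease_gene_scores, other_gene_scores))  # 5/6, about 0.833
```

The all-pairs loop keeps the definition explicit. A sort-based rank computation gives the same value more efficiently for large gene lists.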
Promoter Analysis Reveals Globally Differential Regulation of Human Long Non-Coding RNA and Protein-Coding Genes

KAUST Repository

Alam, Tanvir

2014-10-02

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
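A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is one common way to ask whether a transcription factor's binding sites favour lncRNA promoters over protein-coding promoters. The abstract does not say which test the authors used, so the sketch below is an assumed, generic approach with hypothetical names and toy counts. Among N promoters, K carry the motif and n are lncRNA promoters; k of those lncRNA promoters carry the motif.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: all promoters tested; K: promoters carrying the motif;
    n: lncRNA promoters; k: lncRNA promoters carrying the motif.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of seeing k or more motif-bearing lncRNA promoters.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy example: 10 promoters, 5 with the motif, 5 lncRNA promoters,
    # and all 5 lncRNA promoters carry the motif.
    print(enrichment_p_value(5, 5, 5, 10))  # 1/252, about 0.004
```

When many transcription factors are tested, the resulting p-values would also need a multiple-testing correction.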
Origins of gene, genetic code, protein and life

Indian Academy of Sciences (India)

Unknown

...have concluded that newly-born genes are products of nonstop frames (NSF) ... research to determine tertiary structures of proteins such ... the present earth, is favourable for new genes to arise, if ... NGG) in the universal genetic code table, cannot satisfy ... which has been proposed to explain the development of life on ...
Genes from scratch--the evolutionary fate of de novo genes.

Science.gov (United States)

Schlötterer, Christian

2015-04-01

Although considered an extremely unlikely event, many genes emerge from previously noncoding genomic regions. This review covers the entire life cycle of such de novo genes. Two competing hypotheses about the process of de novo gene birth are discussed, as well as the high death rate of de novo genes. Despite this high death rate, some de novo genes are retained and remain functional, even in distantly related species, through their integration into gene networks. Further studies combining gene expression with ribosome profiling in multiple populations across different species will be instrumental for an improved understanding of the evolutionary processes operating on de novo genes. Copyright © 2015 The Author. Published by Elsevier Ltd. All rights reserved.
Promoter Analysis Reveals Globally Differential Regulation of Human Long Non-Coding RNA and Protein-Coding Genes

KAUST Repository

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; Brown, James B.; Lipovich, Leonard; Bajic, Vladimir B.

2014-01-01

Codon usage and expression level of human mitochondrial 13 protein coding genes across six continents.

Science.gov (United States)

Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi

2017-12-02

The study of codon usage coupled with phylogenetic analysis is an important tool for understanding the genetic and evolutionary relationships of a gene. The 13 protein-coding genes of human mitochondria are involved in the electron transport chain that generates the cell's energy currency, ATP. However, no work has yet been reported on the codon usage of the mitochondrial protein-coding genes across six continents. To understand the patterns of codon usage in mitochondrial genes across six different continents, we analyzed the protein-coding genes with bioinformatic methods. Codon usage bias was low, as revealed by high effective number of codons (ENC) values. Correlation analysis between codon usage and GC3 showed that all codons ending in G/C were positively correlated with GC3, whereas A/T-ending codons were negatively correlated, except in the ND4L and ND5 genes. The neutrality plot revealed that natural selection might have played a major role in the codon usage bias of the ATP6, COI, COIII, CYB, ND4 and ND4L genes. Mutation pressure, by contrast, might have played a dominant role in that of the ATP8, COII, ND1, ND2, ND3, ND5 and ND6 genes. Phylogenetic analysis indicated that the evolutionary relationships of each of the 13 protein-coding genes of human mitochondria differed across the six continents. It further suggested that geographical distance was an important factor in the origin and evolution of these genes. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
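The ENC-plot and neutrality-plot analyses rest on quantities that can be computed directly from a coding sequence. The minimal sketch below, with hypothetical function names, computes two of them:

- GC content at the first, second and third codon positions.
- Wright's expected ENC under mutation pressure alone: ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the third-position GC content.

As a simplification, it uses all third positions rather than the synonymous-site GC3s of the original formulation. It does not compute the observed ENC.

```python
from typing import Tuple


def gc12_gc3(cds: str) -> Tuple[float, float]:
    """Return (GC12, GC3): mean GC at codon positions 1-2, and GC at position 3."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    gc = [0, 0, 0]
    for codon in codons:
        for pos, base in enumerate(codon):
            if base in "GC":
                gc[pos] += 1
    gc1, gc2, gc3 = (count / len(codons) for count in gc)
    return (gc1 + gc2) / 2, gc3


def expected_enc(s: float) -> float:
    """Wright's expected ENC when codon bias reflects GC3 mutation pressure only."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


if __name__ == "__main__":
    gc12, gc3 = gc12_gc3("GATGCC")  # toy sequence: GC12 = 0.75, GC3 = 0.5
    print(gc12, gc3, expected_enc(gc3))  # 0.75 0.5 60.5
```

In an ENC plot, genes lying well below the expected curve suggest influences beyond mutation pressure. In a neutrality plot, the slope of GC12 regressed on GC3 is commonly read as follows: a slope near 1 points to mutation pressure, and a slope near 0 points to selection.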
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

Science.gov (United States)

Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

2014-12-01

Proteogenomics uses mass spectrometry (MS) to refine the annotation of protein-coding genes and to discover genes in a genome. We carried out a comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) using publicly available proteomics data. Our aim was to improve annotation for methylotrophs, organisms capable of growing on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, we discovered 29 new genes and revised 66 annotated gene models in the ME-AM1 genome. One such novel gene, identified with 75 peptides, lacks homologs in other methylobacteria. It has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating a potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved in two ways. We discovered a short gene in the methylotrophy gene island, and we redefined a gene important for the synthesis of pyrroloquinoline quinone, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the genome of the model methylotroph ME-AM1. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
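The core proteogenomic step locates MS-identified peptides in the genome independently of the existing annotation. It can be sketched as an exact search against a six-frame translation. This is a simplified, hypothetical illustration: real pipelines search spectra against a six-frame database with a search engine and control false discovery rates. The sketch assumes the standard codon assignments. NCBI translation table 11, used for bacteria, has the same amino-acid assignments and differs only in its start codons. The sketch also assumes exact peptide matches. The genome and peptides are toy inputs.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_hits(genome: str, peptide: str) -> List[Tuple[str, int, int]]:
    """Return (strand, frame, amino-acid offset) for every exact match of `peptide`."""
    genome = genome.upper()
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement)):
        for frame in range(3):
            protein = translate(seq[frame:])
            start = protein.find(peptide)
            while start != -1:
                hits.append((strand, frame, start))
                start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGAAACCCTAA"
    print(six_frame_hits(toy_genome, "MKP"))   # [('+', 0, 0)]
    print(six_frame_hits(toy_genome, "LGFH"))  # [('-', 0, 0)]
```

A peptide that maps to a frame with no annotated gene is the kind of evidence that supports a new or revised gene model.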

Non-Protein Coding RNAs

CERN Document Server

Walter, Nils G; Batey, Robert T

2009-01-01

This book assembles chapters from experts in the biophysics of RNA to provide a broadly accessible snapshot of the current status of this rapidly expanding field. The 2006 Nobel Prize in Physiology or Medicine was awarded to the discoverers of RNA interference, highlighting just one example of a large number of non-protein coding RNAs. Because non-protein coding RNAs outnumber protein coding genes in mammals and other higher eukaryotes, it is now thought that the complexity of organisms is correlated with the fraction of their genome that encodes non-protein coding RNAs. Non-protein coding RNAs are found to largely direct essential biological processes as diverse as the following:

- cell differentiation
- suppression of infecting viruses and parasitic transposons
- higher-level organization of eukaryotic chromosomes
- gene expression itself

The biophysical study of these RNAs employs X-ray crystallography, NMR, ensemble and single molecule fluorescence spectroscopy, optical tweezers, cryo-electron microscopy, and ot...
mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

Science.gov (United States)

Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

2013-08-15

Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
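mPUMA works on both the DNA and the translated amino acid sequence of protein-coding barcodes and reports OTU abundance. The sketch below is not mPUMA's code; it only illustrates those two basic operations under stated assumptions: frame-1 translation with the standard genetic code, and conversion of raw read counts per OTU into relative abundances. The OTU names used with it are hypothetical.

```python
# Hypothetical sketch (not mPUMA's implementation): translate a protein-coding
# barcode in frame 1 and convert per-OTU read counts to relative abundances.
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate frame 1; '*' marks stops, 'X' marks codons with ambiguous
    bases, and a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per OTU into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    return {otu: n / total for otu, n in counts.items()}
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, and counts of 30 and 10 reads for two OTUs become 0.75 and 0.25.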
Revisiting the missing protein-coding gene catalog of the domestic dog

Directory of Open Access Journals (Sweden)

Galibert Francis

2009-02-01

Full Text Available Abstract Background Among mammals for which there is high sequence coverage, the whole-genome assembly of the dog is unique in that it predicts a low number of protein-coding genes, ~19,000, compared with the over 20,000 reported for other mammalian species. Of particular interest are the more than 400 genes annotated in primate and rodent genomes but missing in dog. Results Using over 14,000 orthologous genes among human, chimpanzee, mouse, rat and dog, we built multiple pairwise synteny maps to infer short orthologous intervals that were targeted for characterizing the missing canine genes. Based on gene prediction and a functionality test using the ratio of replacement to silent nucleotide substitution rates (dN/dS), we provide compelling structural and functional evidence for the identification of 232 new protein-coding genes in the canine genome and 69 gene losses, characterized as undetected genes or pseudogenes. Gene loss phyletic pattern analysis using ten species from chicken to human allowed us to characterize 28 canine-specific gene losses that have functional orthologs continuously from chicken or marsupials through human, and 10 genes that arose specifically in the evolutionary lineage leading to rodents and primates. Conclusion This study demonstrates the central role of comparative genomics for refining gene catalogs and exploring the evolutionary history of gene repertoires, particularly as applied for the characterization of species-specific gene gains and losses.
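The dN/dS functionality test compares the rate of replacement (nonsynonymous) substitutions with that of silent (synonymous) ones. The sketch below is a simplified Nei-Gojobori-style estimator written for illustration, not the authors' method. It assumes gap-free, in-frame aligned coding sequences without stop codons, counts single-base changes that lead to a stop codon as nonsynonymous sites, averages multi-change codons over pathways that avoid intermediate stops, and applies the Jukes-Cantor correction.

```python
# Simplified Nei-Gojobori-style dN/dS for two aligned, gap-free, in-frame
# coding sequences without stop codons (assumptions). Simplification:
# single-base changes that give a stop codon count as nonsynonymous sites.
import math
from itertools import permutations

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def syn_sites(codon: str) -> float:
    """Synonymous sites of a codon: each synonymous single-base change adds 1/3."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences averaged over mutational
    pathways from c1 to c2, preferring pathways without intermediate stops."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    all_paths, valid_paths = [], []
    for order in permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        all_paths.append((sd, nd))
        if valid:
            valid_paths.append((sd, nd))
    paths = valid_paths or all_paths
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("divergence too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (dN, dS, dN/dS) for two aligned coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or nonsynonymous sites")
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        omega = math.inf if d_n > 0 else math.nan
    else:
        omega = d_n / d_s
    return d_n, d_s, omega
```

A ratio well below 1 indicates purifying selection on the protein, which is the kind of evidence used to argue that a predicted gene is functional rather than a pseudogene.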
Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes.

Science.gov (United States)

Kumar, Dhirendra; Mondal, Anupam Kumar; Kutum, Rintu; Dash, Debasis

2016-01-01

Sustained innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unevenly distributed along the phylogenetic tree: a few phyla contain the majority, the rest only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations is erroneous. Proteogenomics uses protein-level experimental observations to annotate protein-coding genes on a genome-wide scale. Its benefits include the discovery and correction of gene annotations regardless of their phylogenetic conservation. This allows not only the detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy-specific. Such genes are more likely to be encountered in rare phyla, which comprise only a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein-coding genes. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
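A common first step in proteogenomics is to search peptides against a six-frame translation of the genome, so that peptides falling outside annotated genes can point to missed or mis-annotated genes. The sketch below is a simplification for illustration only: real pipelines score spectra against such databases, whereas this code just matches already-identified peptide strings exactly. All names and any genome or peptides passed in are hypothetical.

```python
# Hypothetical sketch: six-frame translation of a genome and exact matching
# of MS-identified peptides (a simplification of spectrum-level searching).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string ('X' for ambiguous codons)."""
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(genome: str) -> dict[tuple[str, int], str]:
    """Map (strand, offset) to the translated frame; '-' frames are read
    from the reverse complement."""
    genome = genome.upper()
    reverse = genome.translate(COMPLEMENT)[::-1]
    return {
        (strand, offset): translate(seq[offset:])
        for strand, seq in (("+", genome), ("-", reverse))
        for offset in range(3)
    }


def map_peptides(peptides: list[str], genome: str) -> dict[str, list[tuple[str, int, int]]]:
    """For each peptide, list (strand, offset, nucleotide start) of every
    exact match; starts on '-' are positions in the reverse complement."""
    frames = six_frames(genome)
    hits = {}
    for peptide in peptides:
        hits[peptide] = []
        for (strand, offset), protein in frames.items():
            start = protein.find(peptide)
            while start != -1:
                hits[peptide].append((strand, offset, offset + 3 * start))
                start = protein.find(peptide, start + 1)
    return hits
```

Comparing the returned coordinates with existing gene models is what would separate confirmations of known genes from candidate novel or revised ones.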
Non-coding RNAs and epigenome: de novo DNA methylation, allelic exclusion and X-inactivation

Directory of Open Access Journals (Sweden)

V. A. Halytskiy

2013-12-01

Full Text Available Non-coding RNAs are a widespread class of cellular RNAs. They participate in many important cellular processes: signaling, posttranscriptional silencing, protein biosynthesis, splicing, maintenance of genome stability, telomere lengthening and X-inactivation. Nevertheless, the activity of these RNAs is not restricted to the posttranscriptional sphere but also covers processes that change or maintain epigenetic information. Non-coding RNAs can bind directly to DNA targets and cause their repression through recruitment of DNA methyltransferases as well as chromatin-modifying enzymes. Such events constitute the molecular mechanism of RNA-dependent DNA methylation. It is possible that RNA-DNA interaction is a universal mechanism triggering de novo DNA methylation. Allelic exclusion can also be based on this mechanism. This phenomenon takes place when a non-coding RNA whose precursor is transcribed from one allele triggers DNA methylation in all other alleles present in the cell. Note that miRNA-mediated transcriptional silencing resembles allelic exclusion, because both the miRNA gene and the genes that this miRNA can target contain elements with the same sequences. It can be assumed that RNA-dependent DNA methylation and allelic exclusion originated to counteract the activity of mobile genetic elements. Thinning and deregulation of the cellular non-coding RNA pattern probably allow reactivation of silent mobile genetic elements, resulting in genome instability that leads to ageing and carcinogenesis. In the course of X-inactivation, DNA methylation and subsequent heterochromatinization of the X chromosome can be triggered by direct hybridization of the 5′-end of the large non-coding RNA Xist with DNA targets in remote regions of the X chromosome.
Emerging putative associations between non-coding RNAs and protein-coding genes in Neuropathic Pain. Added value from re-using microarray data.

Directory of Open Access Journals (Sweden)

Enrico Capobianco

2016-10-01

Full Text Available Regeneration of injured nerves is likely to occur in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at the transcriptome level. Due to the limited availability of human neuropathic pain data, we identified the most comprehensive time-course gene expression profile for sciatic nerve injury, studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and sciatic nerve (SN). We developed a methodology that identifies differentially expressed bioentities starting from microarray probes and re-purposes them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and to perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighboring protein-coding genes, with the neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parent genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful for inference at the interactomic scale. Last, network paths were annotated to assess relevance to neuropathic pain. We found significant differential expression in long intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNAs (31 asRNAs in SN and 12 in DRG) and pseudogenes (456 in SN, 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known associations with neurodegeneration and/or neurogenesis processes. While modules of the olfactory receptors were clearly
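The contextual analysis step defines putative targets as protein-coding genes within a fixed genomic distance of an ncRNA locus. The abstract does not state the distance used, so the window below is a free parameter. This is a hypothetical sketch of that neighborhood rule, with made-up locus names and coordinates, not the authors' code.

```python
# Hypothetical sketch of the neighborhood rule: protein-coding genes within a
# fixed genomic distance of an ncRNA locus. The window size is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def neighboring_genes(ncrna: Locus, genes: list[Locus], window: int) -> list[str]:
    """Names of genes on the same chromosome that overlap the ncRNA locus
    extended by `window` bp on each side."""
    lo, hi = ncrna.start - window, ncrna.end + window
    return [g.name for g in genes
            if g.chrom == ncrna.chrom and g.end >= lo and g.start <= hi]
```

Genes overlapping the ncRNA itself are included as neighbors; whether to keep or drop them is a design choice the source does not settle.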
Adaptive Evolution Coupled with Retrotransposon Exaptation Allowed for the Generation of a Human-Protein-Specific Coding Gene That Promotes Cancer Cell Proliferation and Metastasis in Both Haematological Malignancies and Solid Tumours: The Extraordinary Case of MYEOV Gene

Directory of Open Access Journals (Sweden)

Spyros I. Papamichos

2015-01-01

Full Text Available The incidence of cancer is higher in humans than in chimpanzees. However, previous analyses have documented that numerous human cancer-related genes are highly conserved in chimpanzee. To date, whether the human genome includes species-specific cancer-related genes that could contribute to a higher cancer susceptibility remains obscure. This study focuses on MYEOV, an oncogene encoding two protein isoforms that has been reported to be causally involved in promoting cancer cell proliferation and metastasis in both haematological malignancies and solid tumours. First, we document via stringent in silico analysis that MYEOV arose de novo in Catarrhini. We show that the MYEOV short-isoform start codon was acquired after the Catarrhini/Platyrrhini divergence. Throughout the course of Catarrhini evolution, MYEOV acquired a gradually elongated translatable open reading frame (ORF), a gradually shortened translation-regulatory upstream ORF, and alternatively spliced mRNA variants. A point mutation in the human lineage allowed the acquisition of the MYEOV long-isoform start codon. Second, we demonstrate the important contribution of exonized transposable elements to the creation of the MYEOV gene structure. Third, we highlight that the initial part of the MYEOV long-isoform coding DNA sequence was under positive selection during Catarrhini evolution. MYEOV represents a primate orphan gene that acquired, via ORF expansion, a human-protein-specific coding potential.
A study on climatic adaptation of dipteran mitochondrial protein coding genes

Directory of Open Access Journals (Sweden)

Debajyoti Kabiraj

2017-10-01

Full Text Available Diptera, the true flies, are common in nature, and their habitats span the whole world, including Antarctica and the polar regions. The number of documented dipteran species is high, thought to be about 14% of all animal species on Earth [1]. Most studies of Diptera have focused on taxa of economic and medical importance, such as the fruit flies Ceratitis capitata and Bactrocera spp. (Tephritidae), which are serious agricultural pests; the blowflies (Calliphoridae) and oestrid flies (Oestridae), which can cause myiasis; the Anopheles mosquitoes (Culicidae), which are vectors of malaria; and leaf-miners (Agromyzidae), which are vegetable and horticultural pests [2]. The insect mitochondrial genome consists of 13 protein-coding genes, 22 tRNAs and 2 rRNAs. The mitochondrion, a remnant of an alpha-proteobacterium, is responsible for both energy production and thermoregulation of the cell through a bi-genomic system; thus, adaptation to different climatic conditions might have been compensated by complementary changes in both genomes [3,4]. In this study, we collected complete mitochondrial genomes and occurrence data for 113 dipteran insects from different databases and a literature survey. Our understanding of the genetic basis of climatic adaptation in Diptera is limited to basic information on the occurrence locations of these species and the mitochondrial genetic factors underlying changes in conspicuous phenotypes. To examine this hypothesis, we performed nucleotide substitution analysis of the 13 mitochondrial protein-coding genes, individually and combined, with different software, for monophyletic as well as paraphyletic groups of dipteran species. Moreover, we calculated the codon adaptation index for all dipteran mitochondrial protein-coding genes. Following this work, we classified our sample organisms according to their location data from GBIF (https
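The codon adaptation index (CAI), as defined by Sharp and Li (1987), is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set divided by the count of the most used synonymous codon. The sketch below is illustrative and not the authors' software. It assumes NCBI genetic code 5 (invertebrate mitochondrial), which insect mtDNA uses, and a 0.5 pseudocount for codons absent from the reference set.

```python
# Hypothetical sketch of the codon adaptation index with NCBI genetic code 5
# (invertebrate mitochondrial). The 0.5 pseudocount is an assumed convention.
import math
from collections import Counter, defaultdict

BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), STANDARD))
CODE.update({"AGA": "S", "AGG": "S", "ATA": "M", "TGA": "W"})  # table 5 changes


def codons(seq):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """w = codon count / count of the most frequent synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    synonyms = defaultdict(list)
    for codon, aa in CODE.items():
        synonyms[aa].append(codon)
    w = {}
    for aa, group in synonyms.items():
        if aa == "*" or len(group) == 1:
            continue  # stop codons and single-codon amino acids are uninformative
        top = max(max(counts[c] for c in group), 0.5)
        for c in group:
            w[c] = max(counts[c], 0.5) / top
    return w


def cai(gene, w):
    """Geometric mean of w over the informative codons of a gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("no informative codons")
    return math.exp(sum(logs) / len(logs))
```

Values close to 1 mean the gene relies on the codons favoured in the reference set; the choice of reference set strongly affects the result.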
Dataset of the first transcriptome assembly of the tree crop “yerba mate” (Ilex paraguariensis) and systematic characterization of protein coding genes

Directory of Open Access Journals (Sweden)

Patricia M. Aguilera

2018-04-01

Full Text Available This contribution contains data associated with the research article entitled “Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly” (Debat et al., 2014) [1]. By means of a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the Ilex L. genus. This dataset (Supplementary files 1 and 2) consolidates the transcriptome-wide assembled sequences of I. paraguariensis with further comprehensive annotation of the protein-coding genes of yerba mate via the integration of Arabidopsis thaliana databases. The generated data are pivotal for the characterization of agronomically relevant genes in the tree crop yerba mate, a non-model species, and related taxa in Ilex. The raw sequencing data dissected here are available at the DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016) [2] Sequence Read Archive (SRA) under the accession SRP043293, and the assembled sequences have been deposited at the Transcriptome Shotgun Assembly Sequence Database (TSA) under the accession GFHV00000000.
Bioinformatics analysis identify novel OB fold protein coding genes in C. elegans.

Directory of Open Access Journals (Sweden)

Daryanaz Dargahi

Full Text Available BACKGROUND: The C. elegans genome has been extensively annotated by the WormBase consortium, which uses state-of-the-art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the in silico identification of novel genes in this model organism is becoming more challenging and requires new approaches. The oligonucleotide/oligosaccharide-binding (OB) fold is a highly divergent protein family in which protein sequences, despite sharing the same fold, share very little sequence identity (5-25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n=46) compared with other evolutionarily related eukaryotes, such as the yeast S. cerevisiae (n=344) or the fruit fly D. melanogaster (n=84). Gene loss during evolution or differences in the level of annotation for this protein family may explain these discrepancies. METHODOLOGY/PRINCIPAL FINDINGS: This study examines the possibility that novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false-positive candidate sequences. We predicted 18 OB-fold-containing coding genes that, remarkably, have been only partially characterized in C. elegans. CONCLUSIONS/SIGNIFICANCE: This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large-scale analysis by the WormBase consortium when new versions of the genome sequence of C. elegans, or of other evolutionarily related species, are released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
Natural selection in avian protein-coding genes expressed in brain.

Science.gov (United States)

Axelsson, Erik; Hultin-Rosenberg, Lina; Brandström, Mikael; Zwahlén, Martin; Clayton, David F; Ellegren, Hans

2008-06-01

The evolution of birds from theropod dinosaurs took place approximately 150 million years ago, and was associated with a number of specific adaptations that are still evident among extant birds, including feathers, song and extravagant secondary sexual characteristics. Knowledge about the molecular evolutionary background to such adaptations is lacking. Here, we analyse the evolution of > 5000 protein-coding gene sequences expressed in zebra finch brain by comparison to orthologous sequences in chicken. Mean d(N)/d(S) is 0.085 and genes with their maximal expression in the eye and central nervous system have the lowest mean d(N)/d(S) value, while those expressed in digestive and reproductive tissues exhibit the highest. We find that fast-evolving genes (those which have higher than expected rate of nonsynonymous substitution, indicative of adaptive evolution) are enriched for biological functions such as fertilization, muscle contraction, defence response, response to stress, wounding and endogenous stimulus, and cell death. After alignment to mammalian orthologues, we identify a catalogue of 228 genes that show a significantly higher rate of protein evolution in the two bird lineages than in mammals. These accelerated bird genes, representing candidates for avian-specific adaptations, include genes implicated in vocal learning and other cognitive processes. Moreover, colouration genes evolve faster in birds than in mammals, which may have been driven by sexual selection for extravagant plumage characteristics.
Improved protein quality in transgenic soybean expressing a de novo synthetic protein, MB-16.

Science.gov (United States)

Zhang, Yunfang; Schernthaner, Johann; Labbé, Natalie; Hefford, Mary A; Zhao, Jiping; Simmonds, Daina H

2014-06-01

To improve soybean [Glycine max (L.) Merrill] seed nutritional quality, a synthetic gene, MB-16, was introduced into the soybean genome to boost seed methionine content. MB-16, an 11 kDa de novo protein enriched in the essential amino acids (EAAs) methionine, threonine, lysine and leucine, was originally developed for expression in rumen bacteria. For efficient seed expression, constructs were designed using the soybean codon bias, with and without the KDEL ER retention sequence, and with β-conglycinin or cruciferin seed-specific storage protein promoters. Homozygous lines with single-locus integrations were identified for several transgenic events. Transgene transmission and MB-16 protein expression were confirmed to the T5 and T7 generations, respectively. Quantitative RT-PCR analysis of developing seed showed that the transcript peaked in growing seed 5-6 mm long, remained at this peak level to the full-sized green seed, and was then significantly reduced in maturing yellow seed. Transformed events carrying constructs with the rumen bacteria codon preference showed the same transcription pattern as those with the soybean codon preference, but transcript levels were lower at each developmental stage. MB-16 protein levels, as determined by immunoblots, were highest in full-sized green seed, but the protein virtually disappeared in mature seed. However, amino acid analysis of mature seed from the best transgenic line showed significant increases of 16.2% and 65.9% in methionine and cysteine, respectively, compared with the parent. This indicates that MB-16 elevated the sulfur amino acids and improved the EAA seed profile, and confirms that a de novo synthetic gene can enhance the nutritional quality of soybean.
Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

Science.gov (United States)

Yu, Hong; Kong, Lingfeng; Li, Qi

2016-01-01

In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance-based, monophyly-based and character-based) were used for species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly-based and character-based methods provided higher resolution than the distance-based method in species delimitation. The character-based method showed some advantages, especially for closely related taxa. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that combining mitochondrial genes did not enhance the efficacy of species identification and that a single mitochondrial gene would be fully adequate.
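Distance-based barcoding assigns a query sequence to the species of its genetically closest reference. The sketch below shows one simple variant for illustration (uncorrected p-distance with nearest-neighbor assignment); it is not the authors' exact procedure, and any reference barcodes supplied to it are hypothetical.

```python
# Hypothetical sketch of distance-based barcode identification:
# uncorrected p-distance plus nearest-neighbor assignment.
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among positions where both aligned
    sequences have an unambiguous base (A/C/G/T)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def identify(query: str, references: list[tuple[str, str]]) -> tuple[str, float]:
    """Return (species, distance) of the closest reference barcode."""
    species, seq = min(references, key=lambda ref: p_distance(query, ref[1]))
    return species, p_distance(query, seq)
```

In practice an identification is usually accepted only below a distance threshold, which is one reason distance methods can lag behind character-based ones for closely related taxa.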
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

Science.gov (United States)

Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

2014-02-07

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com .
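De novo sequencing rests on the fact that consecutive fragment ions of the same series differ by exactly one residue mass. The sketch below shows only that principle and is not how PepNovo+ works (PepNovo+ uses probabilistic scoring of real, noisy spectra). It assumes an ideal, complete ladder of singly charged b-ions and unmodified residues; leucine and isoleucine are isobaric and therefore reported together.

```python
# Hypothetical sketch: read residues from mass differences in an ideal,
# complete, singly charged b-ion ladder (unmodified residues assumed).
PROTON = 1.007276  # Da
RESIDUE_MASS = {  # monoisotopic residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) m/z values for a peptide."""
    ions, total = [], PROTON
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions


def read_ladder(peaks: list[float], tol: float = 0.02) -> list[str]:
    """Residue call for each gap between sorted peaks; 'I/L'-style calls mark
    isobaric alternatives and '?' marks gaps with no single-residue match."""
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)
        calls.append("/".join(matches) if matches else "?")
    return calls
```

Real spectra contain missing peaks, y-ions, neutral losses and noise, which is why practical tools score many candidate paths rather than reading a single ladder.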
De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

DEFF Research Database (Denmark)

Ruzzo, Walter L; Gorodkin, Jan

2014-01-01

De novo discovery of "motifs" capturing the commonalities among related structured noncoding RNAs (ncRNAs) is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
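Structured ncRNA motifs are typically supported by compensatory covariation: base pairs that stay complementary across homologs even when the individual bases change. The sketch below is not CMfinder; it is a hypothetical illustration that parses a consensus structure in dot-bracket notation and reports, for each predicted pair, the fraction of aligned sequences able to form a Watson-Crick or G-U pair. The alignment passed to it is assumed to be gap-free and the same length as the structure.

```python
# Hypothetical sketch (not CMfinder): per-pair support for a consensus
# secondary structure across a gap-free alignment.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """0-based (i, j) pairs from dot-bracket notation using '(', ')' and '.'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


def pairing_support(alignment: list[str], structure: str) -> dict[tuple[int, int], float]:
    """Fraction of aligned sequences that can pair at each structural pair."""
    seqs = [s.upper().replace("T", "U") for s in alignment]
    return {
        (i, j): sum(s[i] + s[j] in PAIRS for s in seqs) / len(seqs)
        for i, j in base_pairs(structure)
    }
```

A pair that is G-C in one sequence and C-G in another still scores as supported; such compensatory changes are stronger evidence for a real structure than conserved bases alone.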
De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

Science.gov (United States)

Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

2015-01-01

Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
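Predicting protein fragments from assembled unigenes usually starts with open reading frame (ORF) detection. The abstract does not say which tool was used, so the sketch below is only a generic illustration: it scans all six frames for ATG-to-stop ORFs and keeps the longest. It assumes standard stop codons and ignores ORFs without a stop codon, even though such open-ended ORFs are common in fragmentary unigenes.

```python
# Hypothetical sketch: longest ATG-to-stop ORF over six frames of a unigene.
# ORFs that run off the end of the sequence are ignored (a simplification).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs_in_frame(seq: str, frame: int):
    """Yield (start, end) of ATG-to-stop ORFs; end is exclusive and includes the stop."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            yield start, i + 3
            start = None


def longest_orf(seq: str):
    """Return (length, strand, start, end) of the longest ORF, or None.
    Coordinates on '-' refer to the reverse-complement string."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    best = None
    for strand, s in (("+", seq), ("-", reverse)):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if best is None or end - start > best[0]:
                    best = (end - start, strand, start, end)
    return best
```

Scanning both strands matters here because de novo assembled unigenes have no guaranteed orientation.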
Programming peptidomimetic syntheses by translating genetic codes designed de novo.

Science.gov (United States)

Forster, Anthony C; Tan, Zhongping; Nalam, Madhavi N L; Lin, Hening; Qu, Hui; Cornish, Virginia W; Blacklow, Stephen C

2003-05-27

Although the universal genetic code exhibits only minor variations in nature, Francis Crick proposed in 1955 that "the adaptor hypothesis allows one to construct, in theory, codes of bewildering variety." The existing code has been expanded to enable incorporation of a variety of unnatural amino acids at one or two nonadjacent sites within a protein by using nonsense or frameshift suppressor aminoacyl-tRNAs (aa-tRNAs) as adaptors. However, the suppressor strategy is inherently limited by compatibility with only a small subset of codons, by the ways such codons can be combined, and by variation in the efficiency of incorporation. Here, by preventing competing reactions with aa-tRNA synthetases, aa-tRNAs, and release factors during translation and by using nonsuppressor aa-tRNA substrates, we realize a potentially generalizable approach for template-encoded polymer synthesis that unmasks the substantially broader versatility of the core translation apparatus as a catalyst. We show that several adjacent, arbitrarily chosen sense codons can be completely reassigned to various unnatural amino acids according to de novo genetic codes by translating mRNAs into specific peptide analog polymers (peptidomimetics). Unnatural aa-tRNA substrates do not uniformly function as well as natural substrates, revealing important recognition elements for the translation apparatus. Genetic programming of peptidomimetic synthesis should facilitate mechanistic studies of translation and may ultimately enable the directed evolution of small molecules with desirable catalytic or pharmacological properties.
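The idea of a genetic code "designed de novo" can be made concrete as a lookup table in which chosen sense codons are reassigned to other monomers. The sketch below models only this bookkeeping; the codon choices and monomer labels (`Xaa1`, `Xaa2`) are hypothetical, and none of the biochemistry described above (charging tRNAs, blocking competing synthetase and release-factor reactions) is represented.

```python
# Hypothetical sketch: reassign sense codons of the standard code to other
# monomers and read an mRNA with the resulting code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def make_code(reassignments: dict[str, str]) -> dict[str, str]:
    """Copy the standard code and reassign the given sense codons (RNA or DNA letters)."""
    code = dict(STANDARD_CODE)
    for codon, monomer in reassignments.items():
        codon = codon.upper().replace("U", "T")
        if code.get(codon, "*") == "*":
            raise ValueError(f"{codon} is not a sense codon")
        code[codon] = monomer
    return code


def read_mrna(mrna: str, code: dict[str, str]) -> list[str]:
    """Translate from the first base until a stop codon or the end of the mRNA."""
    dna = mrna.upper().replace("U", "T")
    monomers = []
    for i in range(0, len(dna) - 2, 3):
        monomer = code[dna[i:i + 3]]
        if monomer == "*":
            break
        monomers.append(monomer)
    return monomers
```

For example, reassigning UUU and CUG and reading `AUGUUUCUGUAA` yields Met followed by the two reassigned monomers, which is the adjacent-codon reassignment pattern the abstract describes.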
Transcriptome Sequencing, De Novo Assembly and Differential Gene Expression Analysis of the Early Development of Acipenser baeri.

Directory of Open Access Journals (Sweden)

Wei Song

Full Text Available The molecular mechanisms that drive the development of the endangered fossil fish species Acipenser baeri are difficult to study due to the lack of genomic data. Recent advances in sequencing technologies and the decreasing cost of sequencing offer unique opportunities for exploring important molecular mechanisms underlying specific biological processes. This manuscript describes the large-scale sequencing and analysis of mRNA from Acipenser baeri collected at five developmental time points using the Illumina HiSeq 2000 platform. The sequencing reads were assembled de novo and clustered into 278,167 unigenes, of which 57,346 (20.62%) had 45,837 known homologous proteins in the UniProt protein databases, while 11,509 proteins matched at least one assembled unigene sequence. The remaining 79.38% of unigenes could represent non-coding unigenes or unigenes specific to A. baeri. A total of 43,062 unigenes were annotated into functional categories via Gene Ontology (GO) annotation, whereas 29,526 unigenes were associated with 329 pathways by mapping to the KEGG database. Subsequently, 3,479 differentially expressed genes were identified across developmental stages and clustered into 50 gene expression profiles. Genes preferentially expressed at each stage were also identified. Through GO and KEGG pathway enrichment analysis, relevant physiological variations during the early development of A. baeri could be better understood. Accordingly, the present study gives insights into the transcriptome profile of the early development of A. baeri, and the information contained in this large-scale transcriptome will provide substantial references for A. baeri developmental biology and promote its aquaculture research.
De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

Directory of Open Access Journals (Sweden)

Feng-Yun Xie

Full Text Available Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
The spatial distribution of fixed mutations within genes coding for proteins

Science.gov (United States)

Holmquist, R.; Goodman, M.; Conroy, T.; Czelusniak, J.

1983-01-01

An examination has been conducted of the extensive amino acid sequence data now available for five protein families (the alpha crystallin A chain, myoglobin, alpha and beta hemoglobin, and the cytochromes c) with the goal of estimating the true spatial distribution of base substitutions within genes that code for proteins. In every case the commonly used Poisson density failed to even approximate the experimental pattern of base substitution. For the 87 species of beta hemoglobin examined, for example, the probability that the observed results were from a Poisson process was a minuscule 10^-44. Analogous results were obtained for the other functional families. All the data were reasonably, but not perfectly, described by the negative binomial density. In particular, most of the data were described by one of the very simple limiting forms of this density, the geometric density. The implications of this for evolutionary inference are discussed. It is evident that most estimates of total base substitutions between genes are badly in need of revision.
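The comparison above can be reproduced in miniature by fitting both densities to counts of substitutions per codon site and comparing log-likelihoods. The sketch below is an illustration under stated assumptions, not the authors' procedure: parameters are fitted by the method of moments, and the negative binomial is written as P(k) = Γ(k + r) / (k! Γ(r)) · p^r (1 − p)^k, which reduces to the geometric density when r = 1. Any counts passed to it are hypothetical.

```python
# Hypothetical sketch: Poisson vs negative binomial fit to substitution
# counts per site, with method-of-moments parameter estimates.
import math
from statistics import mean, pvariance


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def negbin_logpmf(k: int, r: float, p: float) -> float:
    """log P(K = k) with P(k) = Gamma(k + r) / (k! Gamma(r)) * p**r * (1 - p)**k."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def fit_negbin(counts: list[int]) -> tuple[float, float]:
    """Method-of-moments (r, p); requires variance > mean (overdispersion)."""
    m, v = float(mean(counts)), float(pvariance(counts))
    if v <= m:
        raise ValueError("counts are not overdispersed")
    return m * m / (v - m), m / v


def compare_fits(counts: list[int]) -> tuple[float, float]:
    """Return (Poisson log-likelihood, negative binomial log-likelihood)."""
    r, p = fit_negbin(counts)
    lam = float(mean(counts))
    return (sum(poisson_logpmf(k, lam) for k in counts),
            sum(negbin_logpmf(k, r, p) for k in counts))
```

Overdispersion (variance exceeding the mean) is exactly what a Poisson process cannot produce, so strongly uneven substitution counts across sites favor the negative binomial.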

New Genes and Functional Innovation in Mammals.

Science.gov (United States)

Luis Villanueva-Cañas, José; Ruiz-Orera, Jorge; Agea, M Isabel; Gallo, Maria; Andreu, David; Albà, M Mar

2017-07-01

The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
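A minimal sketch of the kind of composition measure behind the statement that mammalian-specific proteins are depleted in aromatic and negatively charged residues. It assumes F, W and Y as the aromatic set and D and E as the negatively charged set; the study's exact residue classes and statistical tests are not given here, and the example sequence is hypothetical.

```python
AROMATIC = frozenset("FWY")  # assumed aromatic set (histidine excluded)
NEGATIVE = frozenset("DE")   # aspartate and glutamate

def composition_fractions(protein):
    """Return (aromatic fraction, negatively charged fraction) of a protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    aromatic = sum(aa in AROMATIC for aa in seq) / len(seq)
    negative = sum(aa in NEGATIVE for aa in seq) / len(seq)
    return aromatic, negative

# Hypothetical short protein used only to exercise the function.
example = "MKRLLSPAKTRWGLQSKR"
arom, neg = composition_fractions(example)
print(f"aromatic: {arom:.3f}, negatively charged: {neg:.3f}")
```

Applied to every protein in a young gene set and an older reference set, the two fractions could then be compared between groups.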
ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data.

Science.gov (United States)

Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

2017-01-04

The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, about a 20-fold expansion over the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed a 'Regulator' module to predict hundreds of TFs and histone modifications that are involved in or affect transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
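Finding binding sites with motif matrices is commonly done by scoring every window of a sequence against a log-odds position weight matrix. The sketch below shows that generic technique under stated assumptions (uniform background, pseudocount of 0.5, forward strand only, a hypothetical 4-bp motif); it is not ChIPBase's actual pipeline.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, background=0.25, pseudocount=0.5):
    """Turn a position frequency matrix (one dict of base counts per position)
    into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(site))
        if score >= threshold:
            hits.append((start, site, round(score, 2)))
    return hits

# Hypothetical motif (consensus CACG) built from 10 aligned sites.
motif_counts = [{"C": 10}, {"A": 10}, {"C": 10}, {"G": 10}]
pwm = pwm_from_counts(motif_counts)
print(scan("TTCACGTTNCACG", pwm, threshold=5.0))
```

A full scan would also score the reverse complement and usually calibrate the threshold against background sequence rather than fixing it by hand.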
Structural and functional studies of a family of Dictyostelium discoideum developmentally regulated, prestalk genes coding for small proteins

Directory of Open Access Journals (Sweden)

Escalante Ricardo

2008-01-01

Background: The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Results: Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), whose expression at the slug stage of development depended on the transcription factor SrfA. Homology searches detected a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87–89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13-nucleotide open reading frame and a second exon comprising the remainder of the putative coding region. The expression of these genes is induced at 10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multicellular structures at the mound stage of development. Conclusion: A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
De Novo Coding Variants Are Strongly Associated with Tourette Disorder

DEFF Research Database (Denmark)

Willsey, A Jeremy; Fernandez, Thomas V; Yu, Dongmei

2017-01-01

Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 ...
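The core logic of de novo variant detection in a trio can be sketched as a genotype filter: a candidate site is heterozygous in the child, homozygous for the reference allele in both parents, and adequately covered in all three. The genotype encoding, the hypothetical `Call` type and the depth threshold of 10 are illustrative assumptions; production callers also model sequencing error and mosaicism, and this is not the study's calling method.

```python
from dataclasses import dataclass

@dataclass
class Call:
    """A genotype call at one site ('0/0', '0/1' or '1/1') plus read depth."""
    genotype: str
    depth: int

def is_candidate_de_novo(child, mother, father, min_depth=10):
    """Flag a site where the child is heterozygous and both parents are
    homozygous reference, with adequate coverage in all three samples."""
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False
    return child.genotype == "0/1" and mother.genotype == "0/0" and father.genotype == "0/0"

# Hypothetical sites for one trio: (child, mother, father).
sites = {
    "site_1": (Call("0/1", 35), Call("0/0", 40), Call("0/0", 38)),  # candidate
    "site_2": (Call("0/1", 30), Call("0/1", 33), Call("0/0", 29)),  # inherited
    "site_3": (Call("0/1", 30), Call("0/0", 4), Call("0/0", 31)),   # mother poorly covered
}
candidates = [name for name, trio in sites.items() if is_candidate_de_novo(*trio)]
print(candidates)
```

The depth filter matters because a parent with very few reads can look homozygous reference simply because the alternate allele was never sampled.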
De novo dominant mutation of SOX10 gene in a Chinese family with Waardenburg syndrome type II.

Science.gov (United States)

Chen, Kaitian; Zong, Ling; Liu, Min; Zhan, Yuan; Wu, Xuan; Zou, Wenting; Jiang, Hongyan

2014-06-01

Waardenburg syndrome is a rare genetic disorder, inherited as an autosomal dominant trait. The condition is characterized by sensorineural hearing loss and pigment disturbances of the hair, skin, and iris. De novo mutations in the SOX10 gene, which is responsible for Waardenburg syndrome type II, are rarely seen. The present study aimed to identify the genetic causes of Waardenburg syndrome type II in a Chinese family. Clinical and molecular evaluations were conducted in a Chinese family with Waardenburg syndrome type II. A novel heterozygous SOX10 mutation, c.259-260delCT, was identified. The mutation was not observed in the parents or sister of the proband, indicating that it arose de novo. The novel frameshift mutation, located in exon 3 of the SOX10 gene, disrupted normal amino acid coding from Leu87, leading to premature termination at nucleotide 396 (TGA). The high mobility group domain of SOX10 was inferred to be partially impaired. The novel heterozygous c.259-260delCT mutation in the SOX10 gene was considered to be the cause of Waardenburg syndrome in the proband. The clinical and genetic characterization of this family would help elucidate the genetic heterogeneity of SOX10 in Waardenburg syndrome type II. Moreover, this de novo case expands the known spectrum of SOX10 mutations. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
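Why a two-nucleotide deletion causes premature termination can be shown with a toy coding sequence: removing two bases shifts the reading frame, and the first in-frame stop codon in the new frame may appear far earlier than the native one. The sequence below is hypothetical and far shorter than SOX10; positions are 1-based and counted from the A of the start codon, following the usual coding-DNA convention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def delete_range(cds, start, end):
    """Delete coding positions start..end (1-based, inclusive)."""
    return cds[:start - 1] + cds[end:]

def first_stop_codon(cds):
    """Return the 1-based number of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None

# Hypothetical toy coding sequence: ATG CTT AGC AAA GGG TAA
wild_type = "ATGCTTAGCAAAGGGTAA"
mutant = delete_range(wild_type, 4, 5)  # remove the "CT" at positions 4-5
print(first_stop_codon(wild_type), first_stop_codon(mutant))
```

In the toy case the native stop is codon 6, while the frameshifted sequence reads ATG TAG and terminates at codon 2.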
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

Science.gov (United States)

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-10-03

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
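The abstract does not spell out how each of the 45 entropy elements is computed. One plausible reading, sketched below as an assumption rather than the authors' formula, is to take p as the fraction of a disease's genes on each substructure and use the per-substructure entropy term -p log2 p, so the elements sum to the Shannon entropy of the distribution. The three substructures and four gene locations are hypothetical.

```python
import math
from collections import Counter

def entropy_subvector(gene_locations, substructures):
    """One element per chromosome substructure: -p * log2(p), where p is the
    fraction of a disease's genes on that substructure (assumed formulation).
    The elements sum to the Shannon entropy of the distribution."""
    counts = Counter(loc for loc in gene_locations if loc in substructures)
    total = sum(counts.values())
    vector = []
    for s in substructures:
        p = counts[s] / total if total else 0.0
        vector.append(-p * math.log2(p) if p > 0 else 0.0)
    return vector

# Hypothetical: three substructures (the study uses 45) and four disease genes.
substructures = ["6p", "6q", "21p"]
disease_genes = ["6p", "6p", "6q", "6q"]
vec = entropy_subvector(disease_genes, substructures)
print(vec, sum(vec))
```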
Automated de novo phasing and model building of coiled-coil proteins.

Science.gov (United States)

Rämisch, Sebastian; Lizatović, Robert; André, Ingemar

2015-03-01

Models generated by de novo structure prediction can be very useful starting points for molecular replacement for systems where suitable structural homologues cannot be readily identified. Protein-protein complexes and de novo-designed proteins are examples of systems that can be challenging to phase. In this study, the potential of de novo models of protein complexes for use as starting points for molecular replacement is investigated. The approach is demonstrated using homomeric coiled-coil proteins, which are excellent model systems for oligomeric systems. Despite the stereotypical fold of coiled coils, initial phase estimation can be difficult and many structures have to be solved with experimental phasing. A method was developed for automatic structure determination of homomeric coiled coils from X-ray diffraction data. In a benchmark set of 24 coiled coils, ranging from dimers to pentamers with resolutions down to 2.5 Å, 22 systems were automatically solved, 11 of which had previously been solved by experimental phasing. The generated models contained 71-103% of the residues present in the deposited structures, had the correct sequence and had free R values that deviated on average by 0.01 from those of the respective reference structures. The electron-density maps were of sufficient quality that only minor manual editing was necessary to produce final structures. The method, named CCsolve, combines methods for de novo structure prediction, initial phase estimation and automated model building into one pipeline. CCsolve is robust against errors in the initial models and can readily be modified to make use of alternative crystallographic software. The results demonstrate the feasibility of de novo phasing of protein-protein complexes, an approach that could also be employed for other small systems beyond coiled coils.
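The free R value mentioned above is the standard crystallographic R factor evaluated on a held-out test set of reflections that is excluded from refinement. The sketch below computes R_work and R_free from structure-factor amplitudes that are assumed to be already on a common scale; the amplitudes and test-set flags are hypothetical, and scaling and resolution binning are omitted.

```python
def r_factor(f_obs, f_calc):
    """R = sum |Fobs - Fcalc| / sum |Fobs| for amplitudes on a common scale."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections into working and test (free) sets and compute R for each."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

# Hypothetical amplitudes for four reflections; the last one is in the test set.
f_obs = [100.0, 200.0, 300.0, 400.0]
f_calc = [90.0, 210.0, 300.0, 380.0]
is_free = [False, False, False, True]
print(r_work_and_free(f_obs, f_calc, is_free))
```

Comparing R_free between an automatically built model and a deposited reference structure, as in the benchmark above, is one way to judge whether the automated result is of comparable quality.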
Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

Science.gov (United States)

Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

2012-01-01

Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantification of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species were largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to
Dependency on de novo protein synthesis and proteomic changes during metamorphosis of the marine bryozoan Bugula neritina

KAUST Repository

Wong, Yue Him

2010-05-24

Background: Metamorphosis in the bryozoan Bugula neritina (Linne) includes an initial phase of rapid morphological rearrangement followed by a gradual phase of morphogenesis. We hypothesized that the first phase may be independent of de novo synthesis of proteins and, instead, involves post-translational modifications of existing proteins, providing a simple mechanism to quickly initiate metamorphosis. To test our hypothesis, we challenged B. neritina larvae with transcription and translation inhibitors. Furthermore, we employed 2D gel electrophoresis to characterize changes in the phosphoproteome and proteome during early metamorphosis. Differentially expressed proteins were identified by liquid chromatography tandem mass spectrometry, and their gene expression patterns were profiled using semi-quantitative real-time PCR. Results: When larvae were incubated with transcription and translation inhibitors, metamorphosis proceeded through the first phase but was not completed. We found a significant down-regulation of 60 protein spots, and the percentage of phosphoprotein spots decreased from 15% in the larval stage to 12% during early metamorphosis. Two proteins, the mitochondrial processing peptidase beta subunit (MPPbeta) and severin, were abundantly expressed and phosphorylated in the larval stage, but down-regulated during metamorphosis. MPPbeta and severin were also down-regulated at the gene expression level. Conclusions: The initial morphogenetic changes that led to attachment of B. neritina did not depend on de novo protein synthesis, but the subsequent gradual morphogenesis did. This is the first time that the mitochondrial processing peptidase beta subunit or severin have been shown to be down-regulated on both gene and protein expression levels during the metamorphosis of B. neritina. Future studies employing immunohistochemistry to reveal the expression locality of these two proteins during metamorphosis should provide further evidence of the involvement of these two
Ribosome Profiling Reveals Pervasive Translation Outside of Annotated Protein-Coding Genes

Directory of Open Access Journals (Sweden)

Nicholas T. Ingolia

2014-09-01

Ribosome profiling suggests that ribosomes occupy many regions of the transcriptome thought to be noncoding, including 5′ UTRs and long noncoding RNAs (lncRNAs). Apparent ribosome footprints outside of protein-coding regions raise the possibility of artifacts unrelated to translation, particularly when they occupy multiple, overlapping open reading frames (ORFs). Here, we show hallmarks of translation in these footprints: copurification with the large ribosomal subunit, response to drugs targeting elongation, trinucleotide periodicity, and initiation at early AUGs. We develop a metric for distinguishing between 80S footprints and nonribosomal sources using footprint size distributions, which validates the vast majority of footprints outside of coding regions. We present evidence for polypeptide production beyond annotated genes, including the induction of immune responses following human cytomegalovirus (HCMV) infection. Translation is pervasive on cytosolic transcripts outside of conserved reading frames, and direct detection of this expanded universe of translated products enables efforts at understanding how cells manage and exploit its consequences.
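Trinucleotide periodicity, one of the hallmarks listed above, is usually checked by assigning each footprint a P-site and tallying which of the three reading frames it falls in relative to a start codon; actively translated ORFs show a strong excess in one frame. The sketch assumes a fixed 12-nt P-site offset from the footprint 5′ end and a 28-30 nt length window, both common but illustrative choices, and uses hypothetical footprints.

```python
from collections import Counter

def frame_fractions(footprints, cds_start, psite_offset=12, lengths=range(28, 31)):
    """Fraction of footprint P-sites in each reading frame relative to a CDS start.

    footprints: (five_prime_position, read_length) pairs in transcript coordinates.
    The P-site offset and accepted length range are assumptions, not fixed values.
    """
    frames = Counter()
    for five_prime, length in footprints:
        if length in lengths:
            frames[(five_prime + psite_offset - cds_start) % 3] += 1
    total = sum(frames.values())
    if total == 0:
        raise ValueError("no footprints in the accepted length range")
    return [frames[f] / total for f in range(3)]

# Hypothetical footprints on a transcript whose start codon begins at position 100.
reads = [(88, 29), (91, 30), (94, 28), (89, 29), (97, 35)]
print(frame_fractions(reads, cds_start=100))
```

The 35-nt read is excluded by the length window, echoing the idea that footprint size helps separate genuine 80S footprints from other sources.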
De Novo Construction of Redox Active Proteins.

Science.gov (United States)

Moser, C C; Sheehan, M M; Ennist, N M; Kodali, G; Bialas, C; Englander, M T; Discher, B M; Dutton, P L

2016-01-01

Relatively simple principles can be used to plan and construct de novo proteins that bind redox cofactors and participate in a range of electron-transfer reactions analogous to those seen in natural oxidoreductase proteins. These designed redox proteins are called maquettes. Heptad repeats of amino acids with hydrophobic/hydrophilic binary patterning, linked together in a single chain, self-assemble into 4-alpha-helix bundles. These bundles form a robust and adaptable frame for uncovering the default properties of protein-embedded cofactors independent of the complexities introduced by generations of natural selection, and allow us to better understand what factors can be exploited by man or nature to manipulate the physicochemical properties of these cofactors. Anchoring of redox cofactors such as hemes, light-active tetrapyrroles, FeS clusters, and flavins by His and Cys residues allows cofactors to be placed at positions in which electron-tunneling rates between cofactors within or between proteins can be predicted in advance. The modularity of heptad repeat designs facilitates the construction of electron-transfer chains, novel combinations of redox cofactors, and new redox cofactor-assisted functions. Developing de novo designs that can support cofactor incorporation upon expression in a cell is needed to support a synthetic biology advance that integrates with natural bioenergetic pathways. © 2016 Elsevier Inc. All rights reserved.
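Binary patterning of heptad repeats can be made concrete with a short sketch: label the seven heptad positions a to g, mark the buried a and d positions as hydrophobic (H) and the rest as polar (P), and repeat. The a/d choice follows the classic coiled-coil convention; mapping H to leucine and P to glutamate is a hypothetical placeholder, not a maquette sequence, since real designs vary the polar residues.

```python
HEPTAD_POSITIONS = "abcdefg"
CORE_POSITIONS = {"a", "d"}  # buried positions in a coiled-coil heptad

def binary_pattern(n_heptads):
    """H/P pattern for n heptad repeats: hydrophobic at a and d, polar elsewhere."""
    heptad = "".join("H" if p in CORE_POSITIONS else "P" for p in HEPTAD_POSITIONS)
    return heptad * n_heptads

def pattern_to_sequence(pattern, hydrophobic="L", polar="E"):
    """Map an H/P pattern onto placeholder residues (hypothetical choices)."""
    return "".join(hydrophobic if c == "H" else polar for c in pattern)

pattern = binary_pattern(3)
print(pattern)
print(pattern_to_sequence(pattern))
```

Repeating the same pattern for each helix and joining the helices with loops would give the single-chain layout the abstract describes.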
De novo amplification within a silent human cholinesterase gene in a family subjected to prolonged exposure to organophosphorus insecticides

International Nuclear Information System (INIS)

Prody, C.A.; Dreyfus, P.; Soreq, H.; Zamir, R.; Zakut, H.

1989-01-01

A 100-fold DNA amplification in the CHE gene, coding for serum butyrylcholinesterase (BtChoEase), was found in a farmer expressing the silent CHE phenotype. Individuals homozygous for this gene display a defective serum BtChoEase and are particularly vulnerable to poisoning by agricultural organophosphorus insecticides, to which all members of this family had long been exposed. DNA blot hybridization with regional BtChoEase cDNA probes suggested that the amplification was most intense in regions encoding central sequences within BtChoEase cDNA, whereas distal sequences were amplified to a much lower extent. This is in agreement with the onion skin model, based on amplification of genes in cultured cells and primary tumors. The amplification was absent in the grandparents but present to the same extent in one of their sons and in a grandson, with similar DNA blot hybridization patterns. In situ hybridization experiments localized the amplified sequences to the long arm of chromosome 3, close to the site where the authors previously mapped the CHE gene. Altogether, these observations suggest that the initial amplification event occurred early in embryogenesis, spermatogenesis, or oogenesis, where the CHE gene is intensely active and where cholinergic functioning was indicated to be physiologically necessary. These findings demonstrate a de novo amplification in apparently healthy individuals within an autosomal gene producing a target protein to an inhibitor
Nonsynonymous substitution rate (Ka) is a relatively consistent parameter for defining fast-evolving and slow-evolving protein-coding genes

Directory of Open Access Journals (Sweden)

Wang Lei

2011-02-01

Background: Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. We now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are, through comparative genomics. Results: We analyzed orthologous protein-coding genes from human and eleven other high-coverage mammalian genomes, with an avian genome as an outgroup, using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly used methods of Ka and Ks calculation, we observed that these methods yielded a nearly uniform result when estimating Ka, but not Ks (or Ka/Ks). When sorting genes based on Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (cluster of differentiation), mostly surface proteins, whereas slow-evolving genes coded for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes that had functions related to the central nervous system, neurodegenerative disease-related pathways were enriched significantly in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents. Conclusion: Our study suggests that Ka calculation, which is less biased compared to Ks and Ka
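Most Ka and Ks methods end with the same step: convert the proportion of differences per nonsynonymous or synonymous site into a distance corrected for multiple hits. The sketch shows that final step with the Jukes-Cantor correction, as used in Nei-Gojobori-style counting, taking the site and difference counts as given. It is only one of the approaches the study compared, counting the sites themselves is not shown, and the numbers are hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_sites, nonsyn_diffs, syn_sites, syn_diffs):
    """Ka, Ks and Ka/Ks from precomputed site and difference counts.
    Ks must be non-zero for the ratio to be defined."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, ka / ks

# Hypothetical counts for one ortholog pair.
ka, ks, ratio = ka_ks(nonsyn_sites=300.0, nonsyn_diffs=15, syn_sites=100.0, syn_diffs=20)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

Because Ks is estimated from fewer sites and saturates sooner, small changes in how synonymous sites are counted move Ks and Ka/Ks more than Ka, which is consistent with the abstract's observation that methods agree on Ka but not on Ks.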
Both noncoding and protein-coding RNAs contribute to gene expression evolution in the primate brain.

Science.gov (United States)

Babbitt, Courtney C; Fedrigo, Olivier; Pfefferle, Adam D; Boyle, Alan P; Horvath, Julie E; Furey, Terrence S; Wray, Gregory A

2010-01-18

Despite striking differences in cognition and behavior between humans and our closest primate relatives, several studies have found little evidence for adaptive change in protein-coding regions of genes expressed primarily in the brain. Instead, changes in gene expression may underlie many cognitive and behavioral differences. Here, we used digital gene expression: tag profiling (here called Tag-Seq, also called DGE:tag profiling) to assess changes in global transcript abundance in the frontal cortex of the brains of 3 humans, 3 chimpanzees, and 3 rhesus macaques. A substantial fraction of transcripts we identified as differentially transcribed among species were not assayed in previous studies based on microarrays. Differentially expressed tags within coding regions are enriched for gene functions involved in synaptic transmission, transport, oxidative phosphorylation, and lipid metabolism. Importantly, because Tag-Seq technology provides strand-specific information about all polyadenylated transcripts, we were able to assay expression in noncoding intragenic regions, including both sense and antisense noncoding transcripts (relative to nearby genes). We find that many noncoding transcripts are conserved in both location and expression level between species, suggesting a possible functional role. Lastly, we examined the overlap between differential gene expression and signatures of positive selection within putative promoter regions, since such overlap would suggest that these differences represent adaptations during human evolution. Comparative approaches may provide important insights into genes responsible for differences in cognitive functions between humans and nonhuman primates, as well as highlight new candidate genes for studies investigating neurological disorders.
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

International Nuclear Information System (INIS)

Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

2015-01-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)
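The Z curve method represents a DNA sequence by three cumulative base-composition contrasts; for coding-sequence recognition a common variant computes them separately at each of the three codon positions, giving nine parameters that a classifier then uses. The sketch computes those nine values under that assumed formulation; the paper's exact parameter set, the TN curve component and the classifier are not shown.

```python
def z_curve_9(cds):
    """Nine Z-curve parameters: (x, y, z) for each codon position, where
    x = (A+G)-(C+T), y = (A+C)-(G+T), z = (A+T)-(G+C) on base frequencies."""
    seq = cds.upper()
    if len(seq) < 3:
        raise ValueError("sequence must be at least one codon long")
    params = []
    for pos in range(3):
        bases = seq[pos::3]  # all bases at this codon position
        n = len(bases)
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        params.extend([(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)])
    return params

print(z_curve_9("ATGGCTGCTAAA"))
```

Coding and non-coding sequences tend to differ in these position-specific contrasts, which is what lets a trained classifier flag annotated "hypothetical" genes that look non-coding.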
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

Science.gov (United States)

Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

2015-12-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the entire gene repertoire) of microbial species is crucial for understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen, a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools and develops an integrated annotation structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, in an analysis of an example set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars, Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
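At its simplest, pan- and core-genome profiling reduces to set operations over the gene families found in each genome: the pan-genome is their union, the core genome their intersection, and genome-specific genes are those found in no other isolate. The sketch assumes gene families have already been clustered into shared identifiers (the clustering step is not shown); the isolate and family names are hypothetical, and this is not PanCoreGen's implementation.

```python
def pan_core_profile(genomes):
    """genomes: mapping of genome name -> set of gene-family identifiers."""
    families = [set(f) for f in genomes.values()]
    if not families:
        raise ValueError("need at least one genome")
    pan = set().union(*families)
    core = set.intersection(*families)
    unique = {
        name: set(fams) - set().union(*(set(other) for other_name, other in genomes.items()
                                        if other_name != name))
        for name, fams in genomes.items()
    }
    return {"pan": pan, "core": core, "accessory": pan - core, "unique": unique}

# Hypothetical isolates with pre-clustered gene families.
genomes = {
    "isolate_1": {"famA", "famB", "famC"},
    "isolate_2": {"famA", "famB", "famD"},
    "isolate_3": {"famA", "famB", "famD", "famE"},
}
profile = pan_core_profile(genomes)
print(sorted(profile["core"]), sorted(profile["accessory"]))
```

User-defined group-specific datasets, as described in the abstract, could be built the same way by restricting the mapping to the isolates of interest before taking unions and intersections.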
Identification of a novel Plasmopara halstedii elicitor protein combining de novo peptide sequencing algorithms and RACE-PCR

Directory of Open Access Journals (Sweden)

Madlung Johannes

2010-05-01

Background: High-quality MS/MS spectra of tryptic peptides often do not match any database entry because genomes are only partially sequenced; in such cases, protein identification requires de novo peptide sequencing. To achieve protein identification for the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to protein digests of standard proteins using a quadrupole TOF (QStar Pulsar i). Results: The performance order of the algorithms was PEAKS online > PepNovo > CompNovo. In summary, PEAKS online correctly predicted 45% of measured peptides for a protein test data set. All three de novo peptide sequencing algorithms were used to identify MS/MS spectra of tryptic peptides of an unknown 57 kDa protein of P. halstedii. We found ten de novo sequenced peptides that showed homology to a protein of Phytophthora infestans, an organism closely related to P. halstedii. Employing a second complementary approach, verification of peptide prediction and protein identification was performed by creation of degenerate primers for RACE-PCR and led to an ORF of 1,589 bp for a hypothetical phosphoenolpyruvate carboxykinase. Conclusions: Our study demonstrated that identification of proteins within minute amounts of sample material improved significantly by combining sensitive LC-MS methods with different de novo peptide sequencing algorithms. In addition, this is the first study that verified protein prediction from MS data by also employing a second complementary approach, in which RACE-PCR led to identification of a novel elicitor protein in P. halstedii.
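The idea underlying de novo peptide sequencing can be shown on an idealized b-ion ladder: successive fragment masses differ by one residue mass, so residues can be read off the mass differences. The sketch uses standard monoisotopic residue masses and a hypothetical 0.02 Da tolerance; tools such as PEAKS or PepNovo score noisy spectra with far more elaborate models, so this is only a toy illustration, not any of those algorithms.

```python
PROTON = 1.007276  # proton mass (Da)

# Monoisotopic residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}

def b_ladder(peptide):
    """Singly charged b-ion m/z values b1..bn for a peptide (toy forward model)."""
    ladder, mass = [], PROTON
    for residue in peptide:
        mass += RESIDUE_MASSES[residue]
        ladder.append(mass)
    return ladder

def read_ladder(b_ions, tol=0.02):
    """Infer residues from successive mass differences in a b-ion ladder.
    Gaps that match no single residue are reported as [+mass]."""
    sequence, previous = [], PROTON
    for mz in sorted(b_ions):
        delta = mz - previous
        residue, mass = min(RESIDUE_MASSES.items(), key=lambda item: abs(item[1] - delta))
        sequence.append(residue if abs(mass - delta) <= tol else f"[+{delta:.2f}]")
        previous = mz
    return "".join(sequence)

ladder = b_ladder("SAMPLER")
print(read_ladder(ladder))                   # complete ladder
print(read_ladder(ladder[:2] + ladder[3:]))  # b3 peak missing
```

A missing peak leaves a gap equal to the sum of two residues, which is one reason partial de novo tags are often combined with homology searches, as was done here with the Phytophthora infestans protein.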
A rice gene of de novo origin negatively regulates pathogen-induced defense response.

Directory of Open Access Journals (Sweden)

Wenfei Xiao

How defense genes originated with the evolution of their specific pathogen-responsive traits remains an important problem. It is generally known that a form of duplication can generate new genes, suggesting that a new gene usually evolves from an ancestral gene. However, we show that a new defense gene in plants may evolve by de novo origination, resulting in sophisticated disease-resistant functions in rice. Analyses of gene evolution showed that this new gene, OsDR10, had homologs only in its closest relative, the genus Leersia, but not in other subfamilies of the grass family; therefore, it is a rice tribe-specific gene that may have originated de novo in the tribe. We further show that this gene may have evolved a highly conserved rice-specific function that contributes to the regulatory differences between rice and other plant species in response to pathogen infections. Biological analyses, including gene silencing, pathological analysis, and mutant characterization by transformation, showed that OsDR10-suppressed plants had enhanced resistance to a broad spectrum of Xanthomonas oryzae pv. oryzae strains, which cause bacterial blight disease. This enhanced disease resistance was accompanied by increased accumulation of endogenous salicylic acid (SA) and suppressed accumulation of endogenous jasmonic acid (JA), as well as modified expression of a subset of defense-responsive genes functioning both upstream and downstream of SA and JA. These data and analyses provide fresh insights into the biological and evolutionary processes by which a de novo gene can be rapidly recruited.
De novo transcriptome assembly facilitates characterisation of fast-evolving gene families, MHC class I in the bank vole (Myodes glareolus).

Science.gov (United States)

Migalska, M; Sebastian, A; Konczal, M; Kotlík, P; Radwan, J

2017-04-01

The major histocompatibility complex (MHC) plays a central role in the adaptive immune response and is the most polymorphic gene family in vertebrates. Although high-throughput sequencing has increasingly been used for genotyping families of co-amplifying MHC genes, its potential to facilitate early steps in the characterisation of MHC variation in nonmodel organisms has not been fully explored. In this study we evaluated the usefulness of de novo transcriptome assembly in the characterisation of MHC sequence diversity. We found that although de novo transcriptome assembly of MHC I genes does not reconstruct sequences of individual alleles, it does allow the identification of conserved regions for PCR primer design. Using the newly designed primers, we characterised MHC I sequences in the bank vole. Phylogenetic analysis of the partial MHC I coding sequence (exons 2-4) of the bank vole revealed a lack of orthology to MHC I of other Cricetidae, consistent with the high gene turnover of this region. The diversity of expressed alleles was characterised using ultra-deep sequencing of the third exon, which codes for the peptide-binding region of the MHC molecule. High allelic diversity was demonstrated, with 72 alleles found in 29 individuals. Interindividual variation in the number of expressed loci was found, with the number of alleles per individual ranging from 5 to 14. Strong signatures of positive selection were found for 8 amino acid sites, most of which are inferred to bind antigens in human MHC, indicating conservation of structure despite rapid sequence evolution.
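Identifying conserved regions for primer design from assembled transcripts can be approached by aligning the sequences and scanning for gap-free windows in which every sequence carries the same base. The sketch assumes the sequences are already aligned to equal length and uses a hypothetical window width of 5; real primer design would use longer windows and also check melting temperature, GC content and self-complementarity, which are not shown.

```python
def conserved_windows(aligned, width):
    """Start positions (and sequence) of gap-free windows of `width` columns
    in which every aligned sequence carries the same base."""
    if not aligned or any(len(s) != len(aligned[0]) for s in aligned):
        raise ValueError("need one or more sequences of equal aligned length")
    conserved = [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]
    return [
        (start, aligned[0][start:start + width])
        for start in range(len(conserved) - width + 1)
        if all(conserved[start:start + width])
    ]

# Hypothetical aligned fragments from three assembled transcripts.
alignment = [
    "ACGTACGTTT",
    "ACGTACGAAT",
    "ACGTACGCCT",
]
print(conserved_windows(alignment, width=5))
```

This reflects the abstract's point that even when assembly merges or mis-reconstructs individual alleles, shared conserved stretches can still anchor primers.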
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

Science.gov (United States)

Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

2015-12-01

Agrobacterium tumefaciens strain C58 is a pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of the annotation of its protein-coding genes has repeatedly been questioned, because the annotation varies greatly among databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were identified as non-coding sequences. Testing the re-prediction program 10 times on data sets composed of genes of known function gave a mean accuracy of 99.99% and a mean Matthews correlation coefficient of 0.9999. Further sequence analysis and COG analysis showed that the re-annotation results were highly reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
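The accuracy and Matthews correlation coefficient (MCC) quoted above are standard summaries of a binary confusion matrix. The following is a minimal Python sketch of those two formulas, assuming "positive" means "predicted protein-coding"; it is not the authors' TN curve/Z curve program, and the counts in the usage line are illustrative, not the study's data.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) for a binary classifier.

    tp/tn/fp/fn are true/false positive/negative counts; "positive" is assumed
    here to mean "predicted protein-coding".
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Illustrative counts only: 185 of 200 calls correct.
acc, mcc = accuracy_and_mcc(tp=95, tn=90, fp=10, fn=5)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")  # prints accuracy=0.925, MCC=0.851
```

Because MCC uses all four cells of the confusion matrix, it is less flattering than raw accuracy when coding and non-coding classes are unbalanced, which is why the two are often reported together.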

Conserved syntenic clusters of protein coding genes are missing in birds.

Science.gov (United States)

Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V

2014-01-01

Birds are among the most successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regard to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
BayesMotif: de novo protein sorting motif discovery from impure datasets.

Science.gov (United States)

Hu, Jianjun; Zhang, Fan

2010-01-18

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N- or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted-motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short, highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features, such as the hydrophobicity or charge of the motifs, which may help to overcome the limitations of
PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

Science.gov (United States)

Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

2015-01-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of the genetic repertoire) of a microbial species is crucial to understanding the dynamics of molecular evolution, in which virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of existing pan-genomic analysis tools and provides an integrated annotation structure for species-specific pan-genomic profiles. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Analyzing an example set of Salmonella genomes, we detected potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars, Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics studies. PMID:26456591
De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

Science.gov (United States)

2013-01-01

Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent, because blastema formation and regeneration do not occur after limb denervation. To obtain genomic information on blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search against known proteins, with an E-value cut-off of 10^-5, identified 39,200 different genes expressed during limb regeneration. We annotated the assembled sequences using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes fell into the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We identified numerous candidate gene sequences and confirmed them using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
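N50, used above to summarize the assembly, is the length L such that sequences of length at least L together contain at least half of all assembled bases. A minimal Python sketch of that definition follows; the lengths in the example are made up and are not the axolotl data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and return the length at which the
    running total first reaches half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding half of an odd total
            return length
    raise RuntimeError("unreachable: the full sum always reaches half the total")


print(n50([2, 3, 4, 5, 6, 7, 8]))  # prints 6
```

Assemblers and summary scripts differ slightly on whether the threshold is "at least half" or "more than half" of the total; this sketch uses "at least half".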
Building a better fragment library for de novo protein structure prediction.

Directory of Open Access Journals (Sweden)

Saulo H P de Oliveira

Full Text Available Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. We demonstrate the importance of correct testing procedures in assessing the quality of fragment libraries, in particular the exclusion of homologs of the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which, surprisingly, is not always done. We demonstrate that fragments with different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and higher coverage than two state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at http://www.stats.ox.ac.uk/research/proteins/resources.
Building a Better Fragment Library for De Novo Protein Structure Prediction

Science.gov (United States)

de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

2015-01-01

Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. We demonstrate the importance of correct testing procedures in assessing the quality of fragment libraries, in particular the exclusion of homologs of the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which, surprisingly, is not always done. We demonstrate that fragments with different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and higher coverage than two state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595
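The abstract compares libraries by precision and coverage without defining them. A common convention in fragment-library benchmarks, assumed for this sketch, labels a fragment "good" if its RMSD to the native structure falls below a cutoff; precision is then the fraction of all fragments that are good, and coverage is the fraction of target positions with at least one good fragment. The 1.5 Å default cutoff and the per-position data layout are assumptions for illustration, not details taken from Flib.

```python
def precision_and_coverage(library: dict[int, list[float]],
                           threshold: float = 1.5) -> tuple[float, float]:
    """Score a fragment library against the native structure.

    `library` maps each target position to the RMSDs (in Angstrom) of the
    fragments proposed there; a fragment counts as good if RMSD < threshold.
    """
    total = sum(len(rmsds) for rmsds in library.values())
    good = sum(1 for rmsds in library.values() for r in rmsds if r < threshold)
    covered = sum(1 for rmsds in library.values() if any(r < threshold for r in rmsds))
    precision = good / total if total else 0.0
    coverage = covered / len(library) if library else 0.0
    return precision, coverage


# Hypothetical three-position library.
p, c = precision_and_coverage({1: [0.8, 2.0], 2: [3.1], 3: [1.2, 1.4]})
print(f"precision={p:.2f}, coverage={c:.2f}")  # prints precision=0.60, coverage=0.67
```

The two measures tend to pull against each other: adding more fragments per position can raise coverage while lowering precision, which is why a method that improves both is notable.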
Histone modification profiles are predictive for tissue/cell-type specific expression of both protein-coding and microRNA genes

Directory of Open Access Journals (Sweden)

Zhang Michael Q

2011-05-01

Full Text Available Abstract Background Gene expression is regulated both at the DNA sequence level and through modification of chromatin. However, the effect of chromatin on tissue/cell-type specific gene regulation (TCSR) is largely unknown. In this paper, we present a method to elucidate the relationship between histone modification/variation (HMV) and TCSR. Results We built a classifier that uses HMV data to differentiate CD4+ T cell-specific genes from housekeeping genes. We found HMV in both promoter and gene body regions to be predictive of genes that are targets of TCSR. For example, the histone modification types H3K4me3 and H3K27ac were identified as the most predictive for CpG-related promoters, whereas H3K4me3 and H3K79me3 were the most predictive for non-CpG-related promoters. However, genes targeted by TCSR can be predicted using other types of HMV as well. Such redundancy implies that multiple types of underlying regulatory elements, such as enhancers or intragenic alternative promoters, which can regulate gene expression in a tissue/cell-type specific fashion, may be marked by the HMVs. Finally, we show that the predictive power of HMV for TCSR is not limited to protein-coding genes in CD4+ T cells: the same classifier, trained on HMV data of protein-coding genes in CD4+ T cells, successfully predicted TCSR-targeted genes in muscle cells, as well as microRNA genes with expression specific to CD4+ T cells. Conclusion We have begun to understand the HMV patterns that guide both tissue/cell-type specific and ubiquitous gene expression.
Expression of protein-coding genes embedded in ribosomal DNA

DEFF Research Database (Denmark)

Johansen, Steinar D; Haugen, Peik; Nielsen, Henrik

2007-01-01

Ribosomal DNA (rDNA) is a specialised chromosomal location that is dedicated to high-level transcription of ribosomal RNA genes. Interestingly, rDNAs are frequently interrupted by parasitic elements, some of which carry protein genes. These are non-LTR retrotransposons and group II introns that e...... in the nucleolus....
Partitioning of genetic variation between regulatory and coding gene segments: the predominance of software variation in genes encoding introvert proteins.

Science.gov (United States)

Mitchison, A

1997-01-01

In considering genetic variation in eukaryotes, a fundamental distinction can be made between variation in regulatory (software) and coding (hardware) gene segments. For quantitative traits the bulk of variation, particularly that near the population mean, appears to reside in regulatory segments. The main exceptions to this rule concern proteins which handle extrinsic substances, here termed extrovert proteins. The immune system includes an unusually large proportion of this exceptional category, but even so its chief source of variation may well be polymorphism in regulatory gene segments. The main evidence for this view emerges from genome scanning for quantitative trait loci (QTL), which in the case of the immune system points to a major contribution of pro-inflammatory cytokine genes. Further support comes from sequencing of major histocompatibility complex (Mhc) class II promoters, where a high level of polymorphism has been detected. These Mhc promoters appear to act, in part at least, by gating the back-signal from T cells into antigen-presenting cells. Both these forms of polymorphism are likely to be sustained by the need for flexibility in the immune response. Future work on promoter polymorphism is likely to benefit from input from genome informatics.
Selfish DNA in protein-coding genes of Rickettsia.

Science.gov (United States)

Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M

2000-10-13

Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.
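Because 150 bp is an exact multiple of three (50 codons), such a repeat can sit inside an open reading frame without shifting the downstream reading frame, provided it introduces no in-frame stop codon. A minimal Python sketch of that check follows; the toy ORF and inserts are hypothetical, and the check assumes the ORF ends with its own stop codon. The repeat's palindromic (hairpin-forming) structure is not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def insertion_stays_in_frame(orf: str, insert: str, pos: int) -> bool:
    """True if inserting `insert` at index `pos` keeps `orf` open.

    Requires the insert length to be a multiple of 3 (no frameshift) and the
    edited sequence to contain no stop codon before its final codon.
    """
    if len(insert) % 3 != 0:
        return False
    edited = orf[:pos] + insert + orf[pos:]
    return not any(codon in STOP_CODONS for codon in split_codons(edited)[:-1])


orf = "ATGAAACCCTAA"  # toy ORF: Met-Lys-Pro-stop
print(insertion_stays_in_frame(orf, "GGC" * 50, 6))        # prints True (150-bp insert)
print(insertion_stays_in_frame(orf, "GGC" * 50 + "A", 6))  # prints False (151 bp, frameshift)
```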
De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies

DEFF Research Database (Denmark)

2014-01-01

...... analyzed exome-sequencing data of 356 trios with the "classical" epileptic encephalopathies, infantile spasms and Lennox Gastaut syndrome, including 264 trios previously analyzed by the Epi4K/EPGP consortium. In this expanded cohort, we find 429 de novo mutations, including de novo mutations in DNM1 in five individuals and de novo mutations in GABBR2, FASN, and RYR3 in two individuals each. Unlike previous studies, this cohort is sufficiently large to show a significant excess of de novo mutations in epileptic encephalopathy probands compared to the general population using a likelihood analysis (p = 8.2 × 10^-4), supporting a prominent role for de novo mutations in epileptic encephalopathies. We bring statistical evidence that mutations in DNM1 cause epileptic encephalopathy, find suggestive evidence for a role of three additional genes, and show that at least 12% of analyzed individuals have......
De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits.

Directory of Open Access Journals (Sweden)

Shuang Chen

Full Text Available Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious phase, making them one of the most destructive agricultural pests. This phenotypic polyphenism arises from a single genome and diverse transcriptomes under different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We assembled 21.5 Gb of Illumina reads, generated 72,977 transcripts with an N50 of 2,275 bp, and identified 11,490 locust protein-coding genes. Comparative genomic analysis with eight other sequenced insects was carried out to identify, for the first time, the genomic divergence between hemimetabolous and holometabolous insects, and 18 genes relevant to development were found. We further used the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first characterized how divergence in gene expression between the two phases progresses as locusts develop, and identified 242 transcripts as candidate phase marker genes. Together with a detailed analysis of deep sequencing data from the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways, while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with the regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetases, transporters, and GPCR signaling pathways, are strongly involved. As the largest de novo transcriptome to date, with optimized sequencing and assembly strategies, our study can further facilitate the application of de novo transcriptomics. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. Most importantly, we identified genes and pathways that might be involved in locust development
Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms.

Science.gov (United States)

Cerveau, Nicolas; Jackson, Daniel J

2016-12-09

Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose-built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be
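The abstract does not specify the pipeline's clustering steps. As a much simpler stand-in, assumed here for illustration only, one can pool the transcripts from several assemblers and discard any sequence that occurs exactly inside a longer one already kept. Real pipelines typically use similarity-based clustering that also tolerates mismatches and reverse-complement hits, which this sketch does not; the assembly variable names are hypothetical.

```python
def merge_assemblies(*assemblies: list[str]) -> list[str]:
    """Pool transcripts from several assemblies and drop exact-containment redundancy.

    Sequences are processed longest first, so a transcript is discarded only if it
    is an exact substring of a longer transcript that has already been kept.
    """
    pooled = sorted({seq for assembly in assemblies for seq in assembly},
                    key=len, reverse=True)
    kept: list[str] = []
    for seq in pooled:
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Hypothetical toy outputs from three assemblers.
clc = ["ATGCCGTA", "GGG"]
trinity = ["TGCCG", "ATGCCGTA"]
idba = ["TTTT"]
print(sorted(merge_assemblies(clc, trinity, idba)))  # prints ['ATGCCGTA', 'GGG', 'TTTT']
```

The quadratic containment scan is only practical for small examples; it is meant to show the idea of collapsing redundant transcripts across assemblers, not to scale to a full transcriptome.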
De novo mutation in the dopamine transporter gene associates dopamine dysfunction with autism spectrum disorder

DEFF Research Database (Denmark)

Hamilton, P J; Campbell, N G; Sharma, S

2013-01-01

De novo genetic variation is an important class of risk factors for autism spectrum disorder (ASD). Recently, whole-exome sequencing of ASD families has identified a novel de novo missense mutation in the human dopamine (DA) transporter (hDAT) gene, which results in a Thr to Met substitution...
The small RNA content of human sperm reveals pseudogene-derived piRNAs complementary to protein-coding genes

DEFF Research Database (Denmark)

Pantano, Lorena; Jodar, Meritxell; Bak, Mads

2015-01-01

-specific genes. The most abundant class of small noncoding RNAs in sperm are PIWI-interacting RNAs (piRNAs). Surprisingly, we found that human sperm cells contain piRNAs processed from pseudogenes. Clusters of piRNAs from human testes contain pseudogenes transcribed in the antisense strand and processed...... into small RNAs. Several human protein-coding genes contain antisense predicted targets of pseudogene-derived piRNAs in the male germline and these piRNAs are still found in mature sperm. Our study provides the most extensive data set and annotation of human sperm small RNAs to date and is a resource...... for further functional studies on the roles of sperm small RNAs. In addition, we propose that some of the pseudogene-derived human piRNAs may regulate expression of their parent gene in the male germline....
The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

Science.gov (United States)

Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

2012-11-01

Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence, representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
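The coverage figure implies a rough genome-size estimate. Assuming "81% genome coverage" means assembled non-redundant contig length divided by genome size G, the arithmetic is:

```latex
G \approx \frac{302\ \text{Mb}}{0.81} \approx 373\ \text{Mb}
```

This is only a back-of-the-envelope reading of the reported numbers, not an independent size estimate from the study.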
De novo mutation in the dopamine transporter gene associates dopamine dysfunction with autism spectrum disorder.

Science.gov (United States)

Hamilton, P J; Campbell, N G; Sharma, S; Erreger, K; Herborg Hansen, F; Saunders, C; Belovich, A N; Sahai, M A; Cook, E H; Gether, U; McHaourab, H S; Matthies, H J G; Sutcliffe, J S; Galli, A

2013-12-01

De novo genetic variation is an important class of risk factors for autism spectrum disorder (ASD). Recently, whole-exome sequencing of ASD families has identified a novel de novo missense mutation in the human dopamine (DA) transporter (hDAT) gene, which results in a Thr to Met substitution at position 356 (hDAT T356M). The dopamine transporter (DAT) is a presynaptic membrane protein that regulates dopaminergic tone in the central nervous system by mediating the high-affinity reuptake of synaptically released DA, making it a crucial regulator of DA homeostasis. Here, we report the first functional, structural and behavioral characterization of an ASD-associated de novo mutation in the hDAT. We demonstrate that the hDAT T356M displays anomalous function, characterized as a persistent reverse transport of DA (substrate efflux). Importantly, in the bacterial homolog leucine transporter, substitution of A289 (the site homologous to T356) with a Met promotes an outward-facing conformation upon substrate binding. In the substrate-bound state, an outward-facing transporter conformation is required for substrate efflux. In Drosophila melanogaster, expression of hDAT T356M in the DA neurons of flies lacking Drosophila DAT leads to hyperlocomotion, a trait associated with DA dysfunction and ASD. Taken together, our findings demonstrate that alterations in DA homeostasis, mediated by aberrant DAT function, may confer risk for ASD and related neuropsychiatric conditions.
De novo transcriptome assembly of shrimp Palaemon serratus

Directory of Open Access Journals (Sweden)

Alejandra Perina

2017-03-01

Full Text Available The shrimp Palaemon serratus is a coastal decapod crustacean of high commercial value that is harvested for human consumption. In this study, we used Illumina sequencing technology (HiSeq 2000) to sequence, assemble and annotate the transcriptome of P. serratus. RNA was isolated from the muscle of adult individuals and from a pool of larvae. A total of four cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit v2. The raw data from this study were deposited in the NCBI SRA database under study accession number SRP090769. The data were subjected to de novo transcriptome assembly using Trinity software, and coding regions were predicted with TransDecoder. We used Blastp and Sma3s to annotate the identified proteins. The transcriptome data could provide some insight into the genes involved in larval development and metamorphosis.
Heterozygous de novo and inherited mutations in the smooth muscle actin (ACTG2) gene underlie megacystis-microcolon-intestinal hypoperistalsis syndrome.

Directory of Open Access Journals (Sweden)

Michael F Wangler

2014-03-01

Full Text Available Megacystis-microcolon-intestinal hypoperistalsis syndrome (MMIHS) is a rare disorder of enteric smooth muscle function affecting the intestine and bladder. Patients with this severe phenotype are dependent on total parenteral nutrition and urinary catheterization. The cause of this syndrome has remained a mystery since Berdon's initial description in 1976. No genes have been clearly linked to MMIHS. We used whole-exome sequencing for gene discovery followed by targeted Sanger sequencing in a cohort of patients with MMIHS and intestinal pseudo-obstruction. We identified heterozygous ACTG2 missense variants in 15 unrelated subjects, ten being apparent de novo mutations. Ten unique variants were detected, of which six affected CpG dinucleotides and resulted in missense mutations at arginine residues, perhaps related to biased usage of CpG-containing codons within actin genes. We also found some of the same heterozygous mutations that we had observed as apparent de novo mutations in MMIHS segregating in families with intestinal pseudo-obstruction, suggesting that ACTG2 is responsible for a spectrum of smooth muscle disease. ACTG2 encodes γ2 enteric actin and is the first gene to be clearly associated with MMIHS, suggesting an important role for contractile proteins in enteric smooth muscle disease.
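The CpG observation can be made concrete: CpG dinucleotides are well-known mutational hotspots, and all four CGN arginine codons (CGT, CGC, CGA, CGG) contain one. A minimal Python sketch that lists which codons of a coding sequence overlap a CpG, whether inside a codon or across a codon boundary, follows; the toy sequences are hypothetical and are not ACTG2.

```python
def cpg_codon_indices(cds: str) -> list[int]:
    """Return 0-based indices of codons that overlap a CpG dinucleotide in `cds`.

    A CpG spanning a codon boundary (C as the last base of one codon, G as the
    first base of the next) marks both codons.
    """
    cds = cds.upper()
    hits = set()
    for i in range(len(cds) - 1):
        if cds[i] == "C" and cds[i + 1] == "G":
            hits.add(i // 3)
            hits.add((i + 1) // 3)
    return sorted(hits)


print(cpg_codon_indices("ATGCGTAAACGA"))  # prints [1, 3]  (CGT and CGA, both arginine)
print(cpg_codon_indices("AACGTT"))        # prints [0, 1]  (CpG spans the AAC|GTT boundary)
```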
Kinetic models of gene expression including non-coding RNAs

Energy Technology Data Exchange (ETDEWEB)

Zhdanov, Vladimir P., E-mail: zhdanov@catalysis.r

2011-03-15

In cells, genes are transcribed into mRNAs, and the latter are translated into proteins. Due to the feedbacks between these processes, the kinetics of gene expression may be complex even in the simplest genetic networks. The corresponding models have already been reviewed in the literature. A new avenue in this field is related to the recognition that the conventional scenario of gene expression is fully applicable only to prokaryotes whose genomes consist of tightly packed protein-coding sequences. In eukaryotic cells, in contrast, such sequences are relatively rare, and the rest of the genome includes numerous transcript units representing non-coding RNAs (ncRNAs). During the past decade, it has become clear that such RNAs play a crucial role in gene expression and accordingly influence a multitude of cellular processes both in the normal state and during diseases. The numerous biological functions of ncRNAs are based primarily on their abilities to silence genes via pairing with a target mRNA and subsequently preventing its translation or facilitating degradation of the mRNA-ncRNA complex. Many other abilities of ncRNAs have been discovered as well. Our review is focused on the available kinetic models describing the mRNA, ncRNA and protein interplay. In particular, we systematically present the simplest models without kinetic feedbacks, models containing feedbacks and predicting bistability and oscillations in simple genetic networks, and models describing the effect of ncRNAs on complex genetic networks. Mathematically, the presentation is based primarily on temporal mean-field kinetic equations. The stochastic and spatio-temporal effects are also briefly discussed.
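As an illustration of the kind of temporal mean-field equations such a review covers, consider a generic textbook-style model (assumed here, not taken from the review) in which an mRNA m and an ncRNA s are produced at constant rates, decay independently, and are co-degraded when they pair with rate constant k, while protein p is translated from m: dm/dt = α_m - γ_m·m - k·m·s, ds/dt = α_s - γ_s·s - k·m·s, dp/dt = β·m - γ_p·p. A minimal forward-Euler integration in Python follows; all parameter names and values are hypothetical.

```python
def simulate(alpha_m: float, alpha_s: float, k: float,
             gamma_m: float = 1.0, gamma_s: float = 1.0,
             beta: float = 1.0, gamma_p: float = 1.0,
             dt: float = 1e-3, t_end: float = 50.0) -> tuple[float, float, float]:
    """Integrate the mRNA/ncRNA/protein mean-field model with forward Euler.

    Starts from zero concentrations and returns (m, s, p) at time t_end.
    dt must be small relative to the fastest rate for the scheme to stay stable.
    """
    m = s = p = 0.0
    for _ in range(int(round(t_end / dt))):
        pairing = k * m * s  # rate of mRNA-ncRNA co-degradation
        dm = alpha_m - gamma_m * m - pairing
        ds = alpha_s - gamma_s * s - pairing
        dp = beta * m - gamma_p * p
        m, s, p = m + dt * dm, s + dt * ds, p + dt * dp
    return m, s, p


# Without ncRNA, mRNA relaxes to alpha_m / gamma_m.
print(simulate(alpha_m=2.0, alpha_s=0.0, k=0.5))
# With ncRNA produced faster than mRNA and strong pairing, mRNA (and protein) stay low.
print(simulate(alpha_m=1.0, alpha_s=2.0, k=100.0))
```

When α_s exceeds α_m and k is large, almost every mRNA is consumed by pairing, giving the threshold-like response often discussed for small-RNA regulation. For very large k the equations become stiff, and an implicit solver would be a better choice than forward Euler.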

Human native lipoprotein-induced de novo DNA methylation is associated with repression of inflammatory genes in THP-1 macrophages.

Science.gov (United States)

Rangel-Salazar, Rubén; Wickström-Lindholm, Marie; Aguilar-Salinas, Carlos A; Alvarado-Caudillo, Yolanda; Døssing, Kristina B V; Esteller, Manel; Labourier, Emmanuel; Lund, Gertrud; Nielsen, Finn C; Rodríguez-Ríos, Dalia; Solís-Martínez, Martha O; Wrobel, Katarzyna; Wrobel, Kazimierz; Zaina, Silvio

2011-11-25

We previously showed that a VLDL- and LDL-rich mix of human native lipoproteins induces a set of repressive epigenetic marks, i.e. de novo DNA methylation, histone 4 hypoacetylation and histone 4 lysine 20 (H4K20) hypermethylation in THP-1 macrophages. Here, we: 1) ask what gene expression changes accompany these epigenetic responses; 2) test the involvement of candidate factors mediating the latter. We exploited genome expression arrays to identify target genes for lipoprotein-induced silencing, in addition to RNAi and expression studies to test the involvement of candidate mediating factors. The study was conducted in human THP-1 macrophages. Native lipoprotein-induced de novo DNA methylation was associated with a general repression of various critical genes for macrophage function, including pro-inflammatory genes. Lipoproteins showed differential effects on epigenetic marks, as de novo DNA methylation was induced by VLDL and to a lesser extent by LDL, but not by HDL, and VLDL induced H4K20 hypermethylation, while HDL caused H4 deacetylation. The analysis of candidate factors mediating VLDL-induced DNA hypermethylation revealed that this response was: 1) surprisingly, mediated exclusively by the canonical maintenance DNA methyltransferase DNMT1, and 2) independent of the Dicer/micro-RNA pathway. Our work provides novel insights into epigenetic gene regulation by native lipoproteins. Furthermore, we provide an example of DNMT1 acting as a de novo DNA methyltransferase independently of canonical de novo enzymes, and show proof of principle that de novo DNA methylation can occur independently of a functional Dicer/micro-RNA pathway in mammals.
Pathway Detection from Protein Interaction Networks and Gene Expression Data Using Color-Coding Methods and A* Search Algorithms

Directory of Open Access Journals (Sweden)

Cheng-Yu Yeh

2012-01-01

Full Text Available With protein interaction networks and microarray data now widely available, identifying linear paths of biological significance as candidate pathways is a challenging problem. We proposed a color-coding method based on the topological characteristics of biological networks and applied heuristic search to speed it up. We tested our methods on two datasets: yeast and human prostate cancer networks, each with gene expression data. Compared with existing methods on known yeast MAPK pathways in terms of precision and recall, our method finds the maximum number of proteins and performs comparably well. Moreover, our method is more efficient than previous ones, detecting paths of length 10 within 40 seconds on an Intel 1.73 GHz CPU with 1 GB of main memory running Windows.
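The core of color-coding can be sketched as follows: color the vertices randomly with k colors, then use dynamic programming over (end vertex, set of colors used) to find a "colorful" path, whose vertices are necessarily distinct. This is a minimal sketch of the plain technique on an unweighted graph; the paper's scoring with expression data and its A* heuristic are not reproduced, and the function name and toy network are hypothetical.

```python
import random


def colorful_path(adj, k, trials=200, seed=0):
    """Search for a simple path on k vertices using random color-coding.

    adj maps each vertex to a list of neighbours. Each trial colors the
    vertices with k colors and looks for a path whose colors are all
    distinct; the set of colors used is stored as a bitmask.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in nodes}
        # One stored path per (end vertex, color set) suffices, because
        # how a path can be extended depends only on that pair.
        layer = {(v, 1 << color[v]): [v] for v in nodes}
        for _ in range(k - 1):
            next_layer = {}
            for (v, used), path in layer.items():
                for u in adj[v]:
                    bit = 1 << color[u]
                    if not used & bit:  # u's color not yet on the path
                        next_layer.setdefault((u, used | bit), path + [u])
            layer = next_layer
        if layer:
            return next(iter(layer.values()))
    return None  # no colorful path found in any trial


# Hypothetical toy interaction network (undirected, as adjacency lists).
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(colorful_path(network, 4))
```

A single trial succeeds with probability at least k!/k^k when a k-vertex path exists, which is why the search is repeated over several random colorings.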
Human native lipoprotein-induced de novo DNA methylation is associated with repression of inflammatory genes in THP-1 macrophages

Directory of Open Access Journals (Sweden)

Rangel-Salazar Rubén

2011-11-01

Full Text Available Abstract Background We previously showed that a VLDL- and LDL-rich mix of human native lipoproteins induces a set of repressive epigenetic marks, i.e. de novo DNA methylation, histone 4 hypoacetylation and histone 4 lysine 20 (H4K20) hypermethylation in THP-1 macrophages. Here, we: 1) ask what gene expression changes accompany these epigenetic responses; 2) test the involvement of candidate factors mediating the latter. We exploited genome expression arrays to identify target genes for lipoprotein-induced silencing, in addition to RNAi and expression studies to test the involvement of candidate mediating factors. The study was conducted in human THP-1 macrophages. Results Native lipoprotein-induced de novo DNA methylation was associated with a general repression of various critical genes for macrophage function, including pro-inflammatory genes. Lipoproteins showed differential effects on epigenetic marks, as de novo DNA methylation was induced by VLDL and to a lesser extent by LDL, but not by HDL, and VLDL induced H4K20 hypermethylation, while HDL caused H4 deacetylation. The analysis of candidate factors mediating VLDL-induced DNA hypermethylation revealed that this response was: 1) surprisingly, mediated exclusively by the canonical maintenance DNA methyltransferase DNMT1, and 2) independent of the Dicer/micro-RNA pathway. Conclusions Our work provides novel insights into epigenetic gene regulation by native lipoproteins. Furthermore, we provide an example of DNMT1 acting as a de novo DNA methyltransferase independently of canonical de novo enzymes, and show proof of principle that de novo DNA methylation can occur independently of a functional Dicer/micro-RNA pathway in mammals.
ELFN1-AS1: A Novel Primate Gene with Possible MicroRNA Function Expressed Predominantly in Human Tumors

Directory of Open Access Journals (Sweden)

Dmitrii E. Polev

2014-01-01

Full Text Available Human gene LOC100505644 (uncharacterized LOC100505644 [Homo sapiens], Entrez Gene ID 100505644) is abundantly expressed in tumors but only weakly expressed in a few normal tissues. Until now, the function of this gene has remained unknown. Here we identified the chromosomal borders of the transcribed region and the major splice form of the LOC100505644-specific transcript. We characterised the major regulatory motifs of the gene and its splice sites. Analysis of the secondary structure of the major transcript variant revealed a hairpin-like structure characteristic of precursor microRNAs. Comparative genomic analysis of the locus showed that it originated de novo in primates. Taken together, our data indicate that human gene LOC100505644 encodes a non-protein-coding RNA, likely a microRNA. It was assigned the gene symbol ELFN1-AS1 (ELFN1 antisense RNA 1 (non-protein coding)). This gene combines features of evolutionary novelty and predominant expression in tumors.
Transfection of Chinese hamster ovary DHFR⁻ cells with the gene coding for heat shock protein 70 from Drosophila melanogaster

International Nuclear Information System (INIS)

Duffy, J.J.; Carper, S.W.; Gerner, E.W.

1987-01-01

Chinese hamster ovary DHFR⁻ cells (CHO-DHFR⁻) were transfected with the plasmid pSV2-dhfr, expressing the mouse gene coding for dhfr, or with the same plasmid containing the gene coding for the Drosophila melanogaster heat shock protein 70 (hsp70), pSVd-hsp70. Three subcloned cell lines selected for expression of the dhfr gene were shown to contain either the vector sequence (G cells) or varying copies of pSVd-hsp70 (H cells). One line of H cells was shown to contain > 30 copies of the D. melanogaster hsp70 gene and to express the hsp70 RNA at significant levels. No difference between G and H cells was observed in the rate of growth, in the development of thermotolerance, or in the sensitivity of actin microfilament bundles to heat shock. However, H cells containing the transfected hsp70 gene had an altered, more fibroblastic morphology compared with the G cells and the parental CHO-DHFR⁻ cells. The adhesion of the H cells was also decreased compared with the G cells. These results show that insertion of the D. melanogaster gene into CHO cells does not affect growth rates or heat shock responses but may alter cell morphology and adhesion.
De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana.

Science.gov (United States)

Gross, Stephen M; Martin, Jeffrey A; Simpson, June; Abraham-Juarez, María Jazmín; Wang, Zhong; Visel, Axel

2013-08-19

Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.
Identification of a cis-regulatory region of a gene in Arabidopsis thaliana whose induction by dehydration is mediated by abscisic acid and requires protein synthesis.

Science.gov (United States)

Iwasaki, T; Yamaguchi-Shinozaki, K; Shinozaki, K

1995-05-20

In Arabidopsis thaliana, the induction of a dehydration-responsive gene, rd22, is mediated by abscisic acid (ABA), but the gene does not include any sequence corresponding to the consensus ABA-responsive element (ABRE), RYACGTGGYR, in its promoter region. The cis-regulatory region of the rd22 promoter was identified by monitoring beta-glucuronidase (GUS) activity in leaves of transgenic tobacco plants transformed with chimeric gene fusions between 5'-deleted rd22 promoters and the coding region of the GUS reporter gene. A 67-bp nucleotide fragment corresponding to positions -207 to -141 of the rd22 promoter conferred responsiveness to dehydration and ABA on a non-responsive promoter. The 67-bp fragment contains recognition sites for several transcription factors, such as MYC, MYB, and GT-1. The fact that accumulation of rd22 mRNA requires protein synthesis raises the possibility that rd22 expression is regulated by one of these trans-acting protein factors, whose de novo synthesis is induced by dehydration or ABA. Although the RD22 protein is structurally very similar to USP, a non-storage seed protein of Vicia faba, GUS expression driven by the rd22 promoter in non-stressed transgenic Arabidopsis plants was found mainly in flowers and bolted stems rather than in seeds.
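Checking a promoter for the ABRE consensus quoted above amounts to scanning both strands with the IUPAC pattern RYACGTGGYR (R = A/G, Y = C/T). The sketch below does this with a regular expression; the function names and the toy sequence are hypothetical, and it is not the procedure used in the study.

```python
import re

# IUPAC nucleotide codes needed for the ABRE consensus (a subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, consensus="RYACGTGGYR"):
    """Return sorted (start, strand) hits of an IUPAC consensus on both strands.

    Start positions are 0-based coordinates on the forward strand.
    """
    # A lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
    k, n = len(consensus), len(seq)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(n - m.start() - k, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


toy_promoter = "TTTT" + "GCACGTGGCA" + "TTTT"  # hypothetical sequence
print(find_motif(toy_promoter))
```

An empty result for a real promoter would correspond to the situation described for rd22, where ABA responsiveness has to come from other cis elements.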
A novel bidirectional expression system for simultaneous expression of both the protein-coding genes and short hairpin RNAs in mammalian cells

International Nuclear Information System (INIS)

Hung, C.-F.; Cheng, T.-L.; Wu, R.-H.; Teng, C.-F.; Chang, W.-T.

2006-01-01

RNA interference (RNAi) is an extremely powerful and widely used gene silencing approach for reverse functional genomics and molecular therapeutics. In mammals, the conserved poly(ADP-ribose) polymerase 2 (PARP-2)/RNase P bidirectional control promoter simultaneously expresses both the PARP-2 protein and RNase P RNA by RNA polymerase II- and III-dependent mechanisms, respectively. To exploit this unique bidirectional control system in RNAi-mediated gene silencing, we have constructed two novel bidirectional expression vectors, pbiHsH1 and pbiMmH1, which contain the PARP-2/RNase P bidirectional control promoters from human and mouse, for simultaneous expression of both protein-coding genes and short hairpin RNAs. Analyses of the dual transcriptional activities indicated that these two bidirectional expression vectors could not only express enhanced green fluorescent protein as a functional reporter but also simultaneously transcribe shLuc to inhibit firefly luciferase expression. In addition, to extend their utility for establishing stable clones, we have also rebuilt this bidirectional expression system with the blasticidin S deaminase gene, an effective dominant drug-resistance selectable marker, and examined both the selection and inhibition efficiencies in drug resistance and gene expression. Moreover, we have demonstrated that this bidirectional expression system can efficiently co-regulate functionally important genes, for example overexpressing the tumor suppressor protein p53 while inhibiting the anti-apoptotic protein Bcl-2. In summary, the bidirectional expression vectors pbiHsH1 and pbiMmH1 should provide a simple, convenient, and efficient novel tool for manipulating gene function in mammalian cells.
Functional characterization of a rice de novo DNA methyltransferase, OsDRM2, expressed in Escherichia coli and yeast

Energy Technology Data Exchange (ETDEWEB)

Pang, Jinsong, E-mail: pangjs542@nenu.edu.cn [Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, Jilin 130024 (China); Dong, Mingyue; Li, Ning; Zhao, Yanli [Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, Jilin 130024 (China); Liu, Bao, E-mail: baoliu@nenu.edu.cn [Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, Jilin 130024 (China)

2013-03-01

Highlights: ► A rice de novo DNA methyltransferase OsDRM2 was cloned. ► In vitro methylation activity of OsDRM2 was characterized with Escherichia coli. ► Assays of OsDRM2 in vivo methylation were done with Saccharomyces cerevisiae. ► OsDRM2 methylation activity shows no preference for any type of cytosine context. ► The activity of OsDRM2 is independent of the RdDM pathway. - Abstract: DNA methylation of cytosine nucleotides is an important epigenetic modification that occurs in most eukaryotic organisms and is established and maintained by various DNA methyltransferases together with their co-factors. There are two major categories of DNA methyltransferases: de novo and maintenance. Here, we report the isolation and functional characterization of a de novo methyltransferase, named OsDRM2, from rice (Oryza sativa L.). The full-length coding region of OsDRM2 was cloned and transformed into Escherichia coli and Saccharomyces cerevisiae. Both organisms expressed the OsDRM2 protein, which exhibited stochastic de novo methylation activity in vitro in the CG, CHG, and CHH di- and tri-nucleotide contexts. Two lines of evidence demonstrated the de novo activity of OsDRM2: (1) a 5′-CCGG-3′-containing DNA fragment pre-treated with OsDRM2 protein expressed in E. coli was protected from digestion by the CG-methylation-sensitive isoschizomer HpaII; (2) methylation-sensitive amplified polymorphism (MSAP) analysis of genomic DNA from S. cerevisiae transformants carrying OsDRM2 revealed CG and CHG methylation levels of 3.92–9.12% and 2.88–6.93%, respectively, whereas the mock-control S. cerevisiae DNA showed no cytosine methylation. These results were further supported by bisulfite sequencing of the 18S rRNA and EAF5 genes of the transformed S. cerevisiae, which exhibited different DNA methylation patterns, as was observed in the genomic DNA. Our findings establish that OsDRM2 is an active de novo DNA
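The CG, CHG and CHH labels used above classify each cytosine by its 3′ neighbours (H = A, C or T). A minimal sketch of that classification, assuming an uppercase A/C/G/T sequence and scanning one strand only, is shown below; the function name and example sequence are hypothetical.

```python
from collections import Counter


def cytosine_contexts(seq):
    """Classify each C on the given strand as CG, CHG or CHH (H = A, C or T).

    Assumes only A/C/G/T. Cytosines too close to the 3' end to classify are
    skipped. The complementary strand needs a separate call on its reverse
    complement.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        if i + 1 < len(seq) and seq[i + 1] == "G":
            contexts.append((i, "CG"))
        elif i + 2 < len(seq):
            contexts.append((i, "CHG" if seq[i + 2] == "G" else "CHH"))
    return contexts


# The HpaII site CCGG contains one CHG and one CG cytosine on each strand.
print(cytosine_contexts("CCGG"))
print(Counter(ctx for _, ctx in cytosine_contexts("ACCGGTCAGCTTCCA")))
```

Per-context methylation levels such as those reported from MSAP or bisulfite data are then the fraction of methylated cytosines within each of these classes.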
Functional characterization of a rice de novo DNA methyltransferase, OsDRM2, expressed in Escherichia coli and yeast

International Nuclear Information System (INIS)

Pang, Jinsong; Dong, Mingyue; Li, Ning; Zhao, Yanli; Liu, Bao

2013-01-01

Highlights: ► A rice de novo DNA methyltransferase OsDRM2 was cloned. ► In vitro methylation activity of OsDRM2 was characterized with Escherichia coli. ► Assays of OsDRM2 in vivo methylation were done with Saccharomyces cerevisiae. ► OsDRM2 methylation activity shows no preference for any type of cytosine context. ► The activity of OsDRM2 is independent of the RdDM pathway. - Abstract: DNA methylation of cytosine nucleotides is an important epigenetic modification that occurs in most eukaryotic organisms and is established and maintained by various DNA methyltransferases together with their co-factors. There are two major categories of DNA methyltransferases: de novo and maintenance. Here, we report the isolation and functional characterization of a de novo methyltransferase, named OsDRM2, from rice (Oryza sativa L.). The full-length coding region of OsDRM2 was cloned and transformed into Escherichia coli and Saccharomyces cerevisiae. Both organisms expressed the OsDRM2 protein, which exhibited stochastic de novo methylation activity in vitro in the CG, CHG, and CHH di- and tri-nucleotide contexts. Two lines of evidence demonstrated the de novo activity of OsDRM2: (1) a 5′-CCGG-3′-containing DNA fragment pre-treated with OsDRM2 protein expressed in E. coli was protected from digestion by the CG-methylation-sensitive isoschizomer HpaII; (2) methylation-sensitive amplified polymorphism (MSAP) analysis of genomic DNA from S. cerevisiae transformants carrying OsDRM2 revealed CG and CHG methylation levels of 3.92–9.12% and 2.88–6.93%, respectively, whereas the mock-control S. cerevisiae DNA showed no cytosine methylation. These results were further supported by bisulfite sequencing of the 18S rRNA and EAF5 genes of the transformed S. cerevisiae, which exhibited different DNA methylation patterns, as was observed in the genomic DNA. Our findings establish that OsDRM2 is an active de novo DNA
MRUniNovo: an efficient tool for de novo peptide sequencing utilizing the hadoop distributed computing framework.

Science.gov (United States)

Li, Chuang; Chen, Tao; He, Qiang; Zhu, Yunping; Li, Kenli

2017-03-15

Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and can therefore process very large datasets that UniNovo cannot. MRUniNovo is an open-source software tool implemented in Java. The source code and the parameter settings are available at http://bioinfo.hupo.org.cn/MRUniNovo/index.php. s131020002@hnu.edu.cn ; taochen1019@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
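Parallelizing de novo sequencing works because each spectrum can be interpreted independently. The sketch below is a deliberately simplified stand-in for the per-spectrum step: it reads a peptide from a complete, noise-free ladder of singly charged b-ions by matching mass gaps to standard monoisotopic residue masses. It does not reproduce UniNovo's scoring, and the function name and example spectra are hypothetical; a plain `map` stands in for the step a framework such as Hadoop would distribute across workers.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
# Leucine and isoleucine are isobaric, so "L" stands for either.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def read_b_ladder(peaks, tol=0.02):
    """Read a peptide from a complete ladder of singly charged b-ion m/z values."""
    residues = []
    previous = PROTON  # a b-ion is a sum of residue masses plus one proton
    for mz in sorted(peaks):
        gap = mz - previous
        aa, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - gap))
        residues.append(aa if abs(mass - gap) <= tol else "X")  # X: no match
        previous = mz
    return "".join(residues)


# Spectra are independent, so the "map" step parallelizes trivially.
spectra = [[58.02874, 129.06585, 216.09788], [72.04439, 185.12845]]
print(list(map(read_b_ladder, spectra)))
```

Real spectra are noisy, incomplete and contain several ion types, which is why practical tools score many candidate paths instead of reading a single ladder.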
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

Directory of Open Access Journals (Sweden)

Christian J. Michel

2017-12-01

Full Text Available A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

Science.gov (United States)

Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

2017-12-03

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first
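The frame analysis above can be sketched by scanning each of the three frames for runs of X trinucleotides. The set below is the code X as commonly given in the circular-code literature (the abstract itself does not list it); the code checks its self-complementarity but not the full circular-code property. The motif definition used here, a maximal in-frame run of X trinucleotides containing at least 4 distinct ones, is a simplifying assumption rather than the paper's exact definition, and the gene fragment is a toy sequence.

```python
# The 20 trinucleotides of the circular code X.
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Self-complementarity: the reverse complement of every X trinucleotide is in X.
assert all(t.translate(COMPLEMENT)[::-1] in X for t in X)


def count_x_motifs(seq, frame, min_cardinality=4):
    """Count maximal runs of X trinucleotides in `frame` (0, 1 or 2) whose
    number of distinct trinucleotides is at least `min_cardinality`."""
    motifs = 0
    run = set()
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in X:
            run.add(codon)
        else:
            if len(run) >= min_cardinality:
                motifs += 1
            run = set()
    if len(run) >= min_cardinality:  # a run reaching the end of the sequence
        motifs += 1
    return motifs


gene = "GAAGACGAGGATTTT"  # hypothetical gene fragment
print([count_x_motifs(gene, f) for f in range(3)])
```

Summing these per-frame counts over all genes gives the kind of reading-frame versus shifted-frame comparison the abstract describes.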
Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset

KAUST Repository

Shokry, Ahmed M.

2014-02-01

The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental horticulture. It is also an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms such as bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, no information is available on the possible occurrence of Usp in C. procera. In this study, we uncovered and characterized a class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were BLASTed against the recovered de novo assembled contigs. Homology modelling of the deduced amino acid sequence (NCBI accession No. AGT02387) was then carried out using Swiss-Model, accessible via ExPASy. The C. procera USPA-like full-sequence model was superimposed on the Thermus thermophilus USP protein (UniProt accession No. Q5SJV7) using the RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acid sequence were identified from the NCBI Conserved Domain Database (CDD), which provides insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). © 2014 Académie des sciences.
Porcine lung surfactant protein B gene (SFTPB)

DEFF Research Database (Denmark)

Cirera Salicio, Susanna; Fredholm, Merete

2008-01-01

The porcine surfactant protein B (SFTPB) is a single-copy gene on chromosome 3. Three different cDNAs for SFTPB have been isolated and sequenced. Nucleotide sequence comparison revealed six nonsynonymous single nucleotide polymorphisms (SNPs), four synonymous SNPs and an in-frame deletion of 69...... bp in the region coding for the active protein. Northern analysis showed lung-specific expression of three different isoforms of the SFTPB transcript. The expression level of the SFTPB gene is low in 50-day-old fetuses and increases during lung development. Quantitative real-time polymerase chain...
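The synonymous/nonsynonymous distinction and the "in-frame" label used above can be made concrete with a small sketch. It assumes an in-frame coding sequence starting at a codon boundary and the standard genetic code; the function names and the toy CDS are hypothetical, not SFTPB data.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds, pos, alt):
    """Classify a single-nucleotide variant at 0-based `pos` of an in-frame CDS."""
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


def is_in_frame(indel_length):
    """An indel preserves the reading frame when its length is a multiple of 3."""
    return indel_length % 3 == 0


toy_cds = "ATGTTA"  # Met-Leu
print(classify_snv(toy_cds, 5, "G"))  # TTA -> TTG, both Leu
print(classify_snv(toy_cds, 4, "C"))  # TTA -> TCA, Leu -> Ser
print(is_in_frame(69))                # a 69-bp deletion removes 23 codons
```
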
Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction.

Science.gov (United States)

de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M

2018-04-01

Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resources. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce the computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.
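The "correct model" threshold above refers to the TM-score. As a sketch, the score can be computed from the distances between aligned residue pairs once model and native structure are superimposed, using the usual length-dependent scale d0 = 1.24·(L − 15)^(1/3) − 1.8. The real TM-score program also searches for the superposition that maximises the score, which this sketch omits, and short chains (for which d0 becomes too small) are simply rejected here.

```python
def d0(length):
    """Distance scale (angstroms) used by TM-score for a target of `length` residues."""
    if length <= 21:
        raise ValueError("this sketch only handles targets longer than 21 residues")
    return 1.24 * (length - 15) ** (1 / 3) - 1.8


def tm_score(distances, length):
    """TM-score from distances (angstroms) of aligned residue pairs after superposition.

    `length` is the length of the native (target) structure; unaligned
    residues simply contribute nothing to the sum.
    """
    scale = d0(length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / length


# A perfect 100-residue model scores 1; every residue off by d0 scores 0.5.
print(tm_score([0.0] * 100, 100), tm_score([d0(100)] * 100, 100))
```

Because each residue contributes at most 1/L and distant residues contribute little, a score above 0.5 requires a large fraction of the chain to be placed within roughly d0 of the native positions.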
Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

Directory of Open Access Journals (Sweden)

Hongliang Liu

Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. To expand the available transcriptome data and accelerate genetic improvement of this breed, we performed de novo transcriptome sequencing with next-generation sequencing technology to generate the first expressed sequence tag dataset for the Liaoning cashmere goat. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. A total of 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. In total, 97,236 unigenes (82.51%) were mapped to the 30 goat chromosomes, and 35,551 unigenes (30.17%) were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
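The reported counts are internally consistent, which a few lines of arithmetic can confirm: the isotigs and singletons add up to the unigene total, and both percentages are fractions of that total. The sketch below simply recomputes them from the numbers in the abstract.

```python
ISOTIGS, SINGLETONS = 13_194, 104_660
UNIGENES = ISOTIGS + SINGLETONS  # non-redundant unigene set


def percent(count, total=UNIGENES):
    """Share of the unigene set, in percent."""
    return 100.0 * count / total


for label, count in [("mapped to goat chromosomes", 97_236),
                     ("matched to goat protein-coding genes", 35_551)]:
    print(f"{count:,} unigenes {label}: {percent(count):.2f}%")
```
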
Do prion protein gene polymorphisms induce apoptosis in non ...

Indian Academy of Sciences (India)

2016-08-26

Genetic variations such as single nucleotide polymorphisms (SNPs) in the prion protein coding gene, Prnp, greatly affect susceptibility to prion diseases in mammals. Here, the coding region of Prnp was screened for polymorphisms in the red-eared turtle, Trachemys scripta. Four polymorphisms, L203V, N205I, ...
The coevolution of genes and genetic codes: Crick's frozen accident revisited.

Science.gov (United States)

Sella, Guy; Ardell, David H

2006-09-01

The standard genetic code is the nearly universal system for the translation of genes into proteins. The code exhibits two salient structural characteristics: it possesses a distinct organization that makes it extremely robust to errors in replication and translation, and it is highly redundant. The origin of these properties has intrigued researchers since the code was first discovered. One suggestion, which is the subject of this review, is that the code's organization is the outcome of the coevolution of genes and genetic codes. In 1968, Francis Crick explored the possible implications of coevolution at different stages of code evolution. Although he argued that coevolution was likely to have influenced the evolution of the code, he concluded that it falls short of explaining the organization of the code we see today. The recent application of mathematical modeling to study the effects of errors on the course of coevolution suggests a different conclusion: it shows that coevolution readily generates genetic codes that are highly redundant and similar in their error-correcting organization to the standard code. We review this recent work and suggest that further affirmation of the role of coevolution can be attained by investigating the extent to which the outcome of coevolution is robust to other influences that were present during the evolution of the code.
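One crude way to see the redundancy and error robustness described above is to count how many single-nucleotide substitutions from sense codons leave the encoded amino acid unchanged. The sketch below does this for the standard code; it is only an illustrative measure, not the error-cost functions used in the coevolution models the review discusses.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def point_mutation_counts(code):
    """Count (synonymous, total) single-nucleotide substitutions from sense codons."""
    synonymous = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # stop codons are not used as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if code[mutant] == aa:
                    synonymous += 1
    return synonymous, total


syn, total = point_mutation_counts(CODON_TABLE)
sense = sum(aa != "*" for aa in CODON_TABLE.values())
n_amino = len(set(CODON_TABLE.values()) - {"*"})
print(f"{sense} sense codons encode {n_amino} amino acids")
print(f"synonymous point mutations: {syn}/{total} = {syn / total:.3f}")
```

Running the same count on codes with randomly reassigned codon blocks is one simple way to probe how unusual the standard code's organization is.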
Comparative de novo transcriptome analysis of male and female Sea buckthorn.

Science.gov (United States)

Bansal, Ankush; Salaria, Mehul; Sharma, Tashil; Stobdan, Tsering; Kant, Anil

2018-02-01

Sea buckthorn is a dioecious medicinal plant found at high altitude. The plant has male and female reproductive organs in separate individuals. In this article, whole-transcriptome de novo assemblies of male and female flower bud samples were carried out using the Illumina NextSeq 500 platform to determine the role of the genes involved in sex determination. Moreover, genes with differential expression in male and female transcriptomes were identified to understand the underlying sex determination mechanism. The current study showed 63,904 and 62,272 coding sequences (CDS) in the female and male transcriptome data sets, respectively. From both transcriptomes, 16,831 common CDS were screened out, of which 625 were upregulated and 491 were downregulated. To understand the potential regulatory roles of differentially expressed genes in metabolic networks and biosynthetic pathways, KEGG mapping, gene ontology, and co-expression network analyses were performed. Comparison with the Flowering Interactive Database (FLOR-ID) identified eight differentially expressed genes, viz. CHD3-type chromatin-remodeling factor PICKLE (PKL), phytochrome-associated serine/threonine-protein phosphatase (FYPP), protein TOPLESS (TPL), sensitive to freezing 6 (SFR6), lysine-specific histone demethylase 1 homolog 1 (LDL1), pre-mRNA-processing-splicing factor 8A (PRP8A), sucrose synthase 4 (SUS4), and ubiquitin carboxyl-terminal hydrolase 12 (UBP12), known to be broadly involved in flowering, photoperiodism, embryo development, and cold response pathways. Male and female flower bud transcriptome data of Sea buckthorn may provide comprehensive information at the genomic level for identifying the genetic regulation involved in sex determination.

Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

DEFF Research Database (Denmark)

Lin, Michael F; Kheradpour, Pouya; Washietl, Stefan

2011-01-01

conservation compared to typical protein-coding genes, especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ~2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian...
Role of horizontal gene transfer as a control on the coevolution of ribosomal proteins and the genetic code

Energy Technology Data Exchange (ETDEWEB)

Woese, Carl R.; Goldenfeld, Nigel; Luthey-Schulten, Zaida

2011-03-31

Our main goal is to develop the conceptual and computational tools necessary to understand the evolution of the universal processes of translation and replication and to identify events of horizontal gene transfer that occurred within the components. We will attempt to uncover the major evolutionary transitions that accompanied the development of protein synthesis by the ribosome and associated components of the translation apparatus. Our project goes beyond standard genomic approaches to explore homologs that are represented at both the structure and sequence level. Accordingly, use of structural phylogenetic analysis allows us to probe further back into deep evolutionary time than competing approaches, permitting greater resolution of primitive folds and structures. Specifically, our work focuses on the elements of translation, ranging from the emergence of the canonical genetic code to the evolution of specific protein folds, mediated by the predominance of horizontal gene transfer in early life. A unique element of this study is the explicit accounting for the impact of phenotype selection on translation, through a coevolutionary control mechanism. Our work contributes to DOE mission objectives through: (1) sophisticated computer simulation of protein dynamics and evolution, and the further refinement of techniques for structural phylogeny, which complement sequence information, leading to improved annotation of genomic databases; (2) development of evolutionary approaches to exploring cellular function and machinery in an integrated way; and (3) documentation of the phenotype interaction with translation over evolutionary time, reflecting the system response to changing selection pressures through horizontal gene transfer.
Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress.

Science.gov (United States)

Chowdhary, Surabhi; Kainth, Amoldeep S; Gross, David S

2017-12-15

Three-dimensional (3D) chromatin organization is important for proper gene regulation, yet how the genome is remodeled in response to stress is largely unknown. Here, we use a highly sensitive version of chromosome conformation capture in combination with fluorescence microscopy to investigate Heat Shock Protein (HSP) gene conformation and 3D nuclear organization in budding yeast. In response to acute thermal stress, HSP genes undergo intense intragenic folding interactions that go well beyond 5'-3' gene looping previously described for RNA polymerase II genes. These interactions include looping between upstream activation sequence (UAS) and promoter elements, promoter and terminator regions, and regulatory and coding regions (gene "crumpling"). They are also dynamic, being prominent within 60 s, peaking within 2.5 min, and attenuating within 30 min, and correlate with HSP gene transcriptional activity. With similarly striking kinetics, activated HSP genes, both chromosomally linked and unlinked, coalesce into discrete intranuclear foci. Constitutively transcribed genes also loop and crumple yet fail to coalesce. Notably, a missense mutation in transcription factor TFIIB suppresses gene looping, yet neither crumpling nor HSP gene coalescence is affected. An inactivating promoter mutation, in contrast, obviates all three. Our results provide evidence for widespread, transcription-associated gene crumpling and demonstrate the de novo assembly and disassembly of HSP gene foci. Copyright © 2017 American Society for Microbiology.
Transcriptional regulator-mediated activation of adaptation genes triggers CRISPR de novo spacer acquisition

DEFF Research Database (Denmark)

Liu, Tao; Li, Yingjun; Wang, Xiaodi

2015-01-01

Acquisition of de novo spacer sequences provides CRISPR-Cas with a memory to defend against invading genetic elements. However, the mechanism of regulation of CRISPR spacer acquisition remains unknown. Here we examine the transcriptional regulation of the conserved spacer acquisition genes in Type I ... it was demonstrated that the transcription level of csa1, cas1, cas2 and cas4 was significantly enhanced in a csa3a-overexpression strain and, moreover, the Csa1 and Cas1 protein levels were increased in this strain. Furthermore, we demonstrated the hyperactive uptake of unique spacers within both CRISPR loci ... in the presence of the csa3a overexpression vector. The spacer acquisition process is dependent on the CCN PAM sequence, and protospacer selection is random and non-directional. These results suggested a regulation mechanism of CRISPR spacer acquisition in which a single transcriptional regulator senses the presence ...
A de novo 1q22q23.1 Interstitial Microdeletion in a Girl with Intellectual Disability and Multiple Congenital Anomalies Including Congenital Heart Defect.

Science.gov (United States)

Aleksiūnienė, Beata; Preiksaitiene, Egle; Morkūnienė, Aušra; Ambrozaitytė, Laima; Utkus, Algirdas

2018-01-01

Many studies have shown that molecular karyotyping is an effective diagnostic tool in individuals with developmental delay/intellectual disability. We report on a de novo interstitial 1q22q23.1 microdeletion, 1.6 Mb in size, detected in a patient with short stature, microcephaly, hypoplastic corpus callosum, cleft palate, minor facial anomalies, congenital heart defect, camptodactyly of the 4th and 5th fingers, and intellectual disability. Chromosomal microarray analysis revealed a 1.6-Mb deletion in the 1q22q23.1 region, arr[GRCh37] 1q22q23.1(155630752_157193893)×1. Real-time PCR analysis confirmed its de novo origin. The deleted region encompasses 50 protein-coding genes, including the morbid genes APOA1BP, ARHGEF2, LAMTOR2, LMNA, NTRK1, PRCC, RIT1, SEMA4A, and YY1AP1. Although the unique phenotype observed in our patient can arise from the haploinsufficiency of the dosage-sensitive LMNA gene, the dosage imbalance of other genes implicated in the rearrangement could also contribute to the phenotype. Further studies are required for the delineation of the phenotype associated with this rare chromosomal alteration and elucidation of the critical genes for manifestation of the specific clinical features. © 2018 S. Karger AG, Basel.
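The reported 1.6-Mb size follows from the microarray coordinates. A minimal sketch of that arithmetic, assuming the two numbers in `(start_end)` are inclusive GRCh37 base positions; `cnv_size` is a hypothetical helper, not part of any ISCN tool:

```python
import re


def cnv_size(iscn):
    """Parse '...(start_end)xN' and return (start, end, size_bp, copy_number)."""
    m = re.search(r"\((\d+)_(\d+)\)[x×](\d+)", iscn)
    if m is None:
        raise ValueError("no (start_end)xN interval found: %r" % iscn)
    start, end, copies = (int(g) for g in m.groups())
    # Inclusive coordinates (an assumption): both boundary bases are counted.
    return start, end, end - start + 1, copies


start, end, size, copies = cnv_size("arr[GRCh37] 1q22q23.1(155630752_157193893)×1")
print(size, round(size / 1e6, 1), copies)  # 1563142 1.6 1
```

A copy number of 1 on an autosome corresponds to a heterozygous deletion; counting the interval exclusively instead would change the size by a single base and still round to 1.6 Mb.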
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.

Science.gov (United States)

Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

2015-01-01

Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for understanding the evolution of molecular systems and can provide new knowledge for many applications, such as designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains an open problem. One approach to this analysis is to project the amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the exons encoding functional sites in vertebrate genes. We have shown that the DNA fragments coding the functional sites were located in the same exon or in nearby exons. The observed tendency of exons that code functional sites to cluster suggests that such exon clusters could be considered units of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. Exons encoding functional-site amino acid residues showed a reduced frequency of intercodon gaps (phase 0), which may be evidence of evolutionary limitations on exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and
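Intron phase, which the abstract refers to when it mentions intercodon gaps (phase 0), is the position of an intron relative to the reading frame. A minimal sketch, assuming the input lists the coding lengths of exons in transcript order and that coding starts at the first base of the first codon; `intron_phases` is an illustrative name:

```python
def intron_phases(cds_exon_lengths):
    """Phase of each intron from the coding lengths of exons in transcript order.

    Phase 0: the intron falls between codons (an intercodon gap).
    Phase 1 / 2: the intron falls after the 1st / 2nd base of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in cds_exon_lengths[:-1]:  # the last exon is not followed by an intron
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


print(intron_phases([120, 101, 80, 30]))  # [0, 2, 1]
```

For exons with coding lengths of 120, 101, 80 and 30 nt, the cumulative coding lengths before the three introns are 120, 221 and 301, giving phases 0, 2 and 1.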
Bioinformatic Analysis of Deleterious Non-Synonymous Single Nucleotide Polymorphisms (nsSNPs) in the Coding Regions of Human Prion Protein Gene (PRNP)

Directory of Open Access Journals (Sweden)

Kourosh Bamdad

2016-12-01

Full Text Available Background & Objective: Single nucleotide polymorphisms are a major source of genetic variation in living organisms. Non-synonymous single nucleotide polymorphisms alter residues in the protein sequence. In this investigation, the relationship between prion protein gene polymorphisms and pathogenicity was studied. Material & Method: The amino acid sequence of the main isoform of the human prion protein gene (PRNP) was extracted from the UniProt database and evaluated by the FoldAmyloid and AmylPred servers. All non-synonymous single nucleotide polymorphisms (nsSNPs) from the SNP database (dbSNP) were further analyzed by bioinformatics servers including SIFT, PolyPhen-2, I-Mutant-3.0, PANTHER, SNPs & GO, PHD-SNP, Meta-SNP, and MutPred to determine the most damaging nsSNPs. Results: The initial structure analyses by the FoldAmyloid and AmylPred servers implied that regions 5-15, 174-178, 180-184, 211-217, and 240-252 were the parts of the protein sequence most sensitive to amyloidosis. Screening all nsSNPs of the main protein isoform using bioinformatic servers revealed that substitution of aspartic acid with valine at position 178 (ID code: rs11538766) was the most deleterious nsSNP in the protein structure. Conclusion: Substitution of aspartic acid with valine at position 178 (D178V) was the most pathogenic mutation in the human prion protein gene. Analyses from the MutPred server also showed that an increase in beta-sheet content in the secondary structure was the main reason behind the molecular mechanism of prion protein aggregation.
Massively parallel de novo protein design for targeted therapeutics

KAUST Repository

Chevalier, Aaron

2017-09-26

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

KAUST Repository

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2017-01-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

Science.gov (United States)

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2018-01-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867
Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age

KAUST Repository

Alam, Tanvir

2012-07-01

Distinguishing transcription regulatory patterns of different gene groups is a common problem in various bioinformatics studies. In this work we developed a methodology, based on machine learning techniques, to deal with such a problem. We applied our method to two biologically important problems related to detecting differences in transcription regulation between (a) protein-coding RNAs and long non-coding RNAs (lncRNAs) in human, and (b) primate-specific and non-primate-specific long non-coding RNAs. Our method is capable of classifying RNAs using various regulatory features of the genes that transcribe into these RNAs, such as nucleotide frequencies, transcription factor binding sites, de novo sequence motifs, CpG islands, repetitive elements, histone modification marks, and others. Ten-fold cross-validation tests suggest that our model can distinguish protein-coding and non-coding RNAs with accuracy above 80%. Twenty-fold cross-validation tests suggest that our model can distinguish primate-specific from non-primate-specific promoters of lncRNAs with accuracy above 80%. Consequently, we hypothesize that transcription of the groups of genes mentioned above is regulated by different mechanisms. Feature selection techniques allowed us to reduce the number of features significantly while keeping the accuracy around 80%. Consequently, we conclude that the selected features play a significant role in transcription regulation of coding and non-coding genes, as well as of primate-specific and non-primate-specific lncRNA genes.
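The abstract evaluates its classifiers by ten- and twenty-fold cross-validation. The sketch below shows the general k-fold procedure in plain Python. The toy threshold classifier and all function names are assumptions for illustration and are unrelated to the regulatory features or model used in the study.

```python
import random


def kfold_indices(n, k, seed=0):
    """Return k (train, test) index lists; the test folds partition range(n)."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    return [(sorted(set(idx) - set(test)), test) for test in folds]


def cross_val_accuracy(X, y, k, fit, predict, seed=0):
    """Mean test-fold accuracy; fit(X_train, y_train) returns a model."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy classifier (illustrative only): threshold halfway between the class means,
# assuming class 1 has the larger mean and both classes occur in every training set.
def fit_threshold(xs, ys):
    mean0 = sum(x for x, t in zip(xs, ys) if t == 0) / ys.count(0)
    mean1 = sum(x for x, t in zip(xs, ys) if t == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


X = [0.1, 0.2, 0.3, 0.4, 0.5] * 2 + [1.1, 1.2, 1.3, 1.4, 1.5] * 2
y = [0] * 10 + [1] * 10
print(cross_val_accuracy(X, y, 10, fit_threshold, predict_threshold))  # 1.0
```

Every sample is held out exactly once, so the reported accuracy is always measured on data the model did not see during fitting; the perfectly separable toy data simply make the expected result easy to check.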
Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes.

Directory of Open Access Journals (Sweden)

Casey R Richardson

2010-05-01

Full Text Available MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved, yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM, when trained on targets, could predict the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for "new" rapidly-evolving MIRNA genes. Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other
De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences.

Directory of Open Access Journals (Sweden)

Dan-Dan Wei

Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of its biological functions at the molecular level. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with the gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes, representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the response of L. entomophila to thermal stress, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected, and 231 pairs of SSR primers were designed for investigating genetic diversity in the future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying
Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Directory of Open Access Journals (Sweden)

Li Weizhong

2008-04-01

Full Text Available Abstract Background The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all compute that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
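The abstract's key point is that incremental clustering avoids an all-against-all comparison: each new sequence is compared only with existing cluster representatives. A minimal greedy sketch of that idea follows. The k-mer Jaccard similarity and the 0.5 threshold are stand-ins chosen for brevity; they are not the homology-detection method of the original approach, and `IncrementalClusterer` is a hypothetical name.

```python
def kmer_set(seq, k=3):
    """All overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (a crude stand-in for alignment identity)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


class IncrementalClusterer:
    """Greedy clustering: each sequence is compared only with cluster representatives."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.representatives = []  # one representative sequence per cluster
        self.members = []          # members[cid] lists the sequences in cluster cid

    def add(self, seq):
        """Put seq in the most similar cluster at or above threshold, else open a new one."""
        best_cid, best_sim = None, self.threshold
        for cid, rep in enumerate(self.representatives):
            sim = similarity(seq, rep)
            if sim >= best_sim:
                best_cid, best_sim = cid, sim
        if best_cid is None:
            self.representatives.append(seq)
            self.members.append([seq])
            return len(self.representatives) - 1
        self.members[best_cid].append(seq)
        return best_cid

    def add_batch(self, seqs):
        """Cluster a batch longest first; returns cluster ids in that processing order."""
        return [self.add(s) for s in sorted(seqs, key=len, reverse=True)]


c = IncrementalClusterer(threshold=0.5)
print(c.add_batch(["MKTAYIAKQRQISFVKSHFSR", "GGGWWWPPPLLLHHH", "MKTAYIAKQRQISFVKSHFSRQ"]))  # [0, 0, 1]
print(c.add("GGGWWWPPPLLLHHHA"))  # 1
```

Within a single `add_batch` call, sorting longest first means each newly opened cluster's representative is at least as long as the members later assigned to it in that call; subsequent calls to `add` place new sequences into existing clusters without reclustering what is already there.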
CHIR99021 promotes self-renewal of mouse embryonic stem cells by modulation of protein-encoding gene and long intergenic non-coding RNA expression

Energy Technology Data Exchange (ETDEWEB)

Wu, Yongyan [College of Veterinary Medicine, Northwest A and F University, Yangling 712100, Shaanxi (China); Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China); Ai, Zhiying [Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China); College of Life Sciences, Northwest A and F University, Yangling 712100, Shaanxi (China); Yao, Kezhen [College of Veterinary Medicine, Northwest A and F University, Yangling 712100, Shaanxi (China); Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China); Cao, Lixia; Du, Juan; Shi, Xiaoyan [Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China); College of Life Sciences, Northwest A and F University, Yangling 712100, Shaanxi (China); Guo, Zekun, E-mail: gzk@nwsuaf.edu.cn [College of Veterinary Medicine, Northwest A and F University, Yangling 712100, Shaanxi (China); Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China); Zhang, Yong, E-mail: zhylab@hotmail.com [College of Veterinary Medicine, Northwest A and F University, Yangling 712100, Shaanxi (China); Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi (China)

2013-10-15

Embryonic stem cells (ESCs) can proliferate indefinitely in vitro and differentiate into cells of all three germ layers. These unique properties make them exceptionally valuable for drug discovery and regenerative medicine. However, the practical application of ESCs is limited because it is difficult to derive and culture ESCs. It has been demonstrated that CHIR99021 (CHIR) promotes self-renewal and enhances the derivation efficiency of mouse (m)ESCs. However, the downstream targets of CHIR are not fully understood. In this study, we identified CHIR-regulated genes in mESCs using microarray analysis. Our microarray data demonstrated that CHIR not only influenced the Wnt/β-catenin pathway by stabilizing β-catenin, but also modulated several other pluripotency-related signaling pathways such as TGF-β, Notch and MAPK signaling pathways. More detailed analysis demonstrated that CHIR inhibited Nodal signaling, while activating bone morphogenetic protein signaling in mESCs. In addition, we found that pluripotency-maintaining transcription factors were up-regulated by CHIR, while several development-related genes were down-regulated. Furthermore, we found that CHIR altered the expression of epigenetic regulatory genes and long intergenic non-coding RNAs. Quantitative real-time PCR results were consistent with microarray data, suggesting that CHIR alters the expression pattern of protein-encoding genes (especially transcription factors), epigenetic regulatory genes and non-coding RNAs to establish a relatively stable pluripotency-maintaining network. - Highlights: • Combined use of CHIR with LIF promotes self-renewal of J1 mESCs. • CHIR-regulated genes are involved in multiple pathways. • CHIR inhibits Nodal signaling and promotes Bmp4 expression to activate BMP signaling. • Expression of epigenetic regulatory genes and lincRNAs is altered by CHIR.
CHIR99021 promotes self-renewal of mouse embryonic stem cells by modulation of protein-encoding gene and long intergenic non-coding RNA expression

International Nuclear Information System (INIS)

Wu, Yongyan; Ai, Zhiying; Yao, Kezhen; Cao, Lixia; Du, Juan; Shi, Xiaoyan; Guo, Zekun; Zhang, Yong

2013-01-01

Embryonic stem cells (ESCs) can proliferate indefinitely in vitro and differentiate into cells of all three germ layers. These unique properties make them exceptionally valuable for drug discovery and regenerative medicine. However, the practical application of ESCs is limited because it is difficult to derive and culture ESCs. It has been demonstrated that CHIR99021 (CHIR) promotes self-renewal and enhances the derivation efficiency of mouse (m)ESCs. However, the downstream targets of CHIR are not fully understood. In this study, we identified CHIR-regulated genes in mESCs using microarray analysis. Our microarray data demonstrated that CHIR not only influenced the Wnt/β-catenin pathway by stabilizing β-catenin, but also modulated several other pluripotency-related signaling pathways such as TGF-β, Notch and MAPK signaling pathways. More detailed analysis demonstrated that CHIR inhibited Nodal signaling, while activating bone morphogenetic protein signaling in mESCs. In addition, we found that pluripotency-maintaining transcription factors were up-regulated by CHIR, while several development-related genes were down-regulated. Furthermore, we found that CHIR altered the expression of epigenetic regulatory genes and long intergenic non-coding RNAs. Quantitative real-time PCR results were consistent with microarray data, suggesting that CHIR alters the expression pattern of protein-encoding genes (especially transcription factors), epigenetic regulatory genes and non-coding RNAs to establish a relatively stable pluripotency-maintaining network. - Highlights: • Combined use of CHIR with LIF promotes self-renewal of J1 mESCs. • CHIR-regulated genes are involved in multiple pathways. • CHIR inhibits Nodal signaling and promotes Bmp4 expression to activate BMP signaling. • Expression of epigenetic regulatory genes and lincRNAs is altered by CHIR.
Purifying selection acts on coding and non-coding sequences of paralogous genes in Arabidopsis thaliana.

Science.gov (United States)

Hoffmann, Robert D; Palmgren, Michael

2016-06-13

Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. However, how these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown. Here, we analyzed the coding and non-coding sequences of paralogous genes in Arabidopsis thaliana and compared these sequences with those of orthologous genes in Arabidopsis lyrata. Paralogs with lower expression than their duplicate had more nonsynonymous substitutions, were more likely to fractionate, and had expression patterns less similar to those of their orthologs in the other species. Lower-expressed genes also had greater tissue specificity. Orthologous conserved non-coding sequences in the promoters, introns, and 3' untranslated regions were less abundant at lower-expressed genes than at their higher-expressed paralogs. A gene ontology (GO) term enrichment analysis showed that paralogs with similar expression levels were enriched in GO terms related to ribosomes, whereas paralogs with different expression levels were enriched in terms associated with stress responses. Loss of conserved non-coding sequences in one gene of a paralogous gene pair correlates with reduced expression levels that are more tissue specific. Together with increased mutation rates in the coding sequences, this suggests that similar forces of purifying selection act on coding and non-coding sequences. We propose that coding and non-coding sequences evolve concurrently following gene duplication.
Natural selection on protein-coding genes in the human genome

DEFF Research Database (Denmark)

Bustamante, Carlos D.; Fledel-Alon, Adi; Williamson, Scott

2005-01-01

…, showing an excess of deleterious variation within local populations (9, 10). Here we contrast patterns of coding sequence polymorphism, identified by direct sequencing of 39 humans for over 11,000 genes, with divergence between humans and chimpanzees, and find strong evidence that natural selection has shaped… Comparisons of DNA polymorphism within species to divergence between species enable the discovery of molecular adaptation in evolutionarily constrained genes as well as the differentiation of weak from strong purifying selection (1–4). The extent to which weak negative and positive Darwinian selection have driven the molecular evolution of different species varies greatly (5–16), with some species, such as Drosophila melanogaster, showing strong evidence of pervasive positive selection (6–9), and others, such as the selfing weed Arabidopsis thaliana…
De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies

Directory of Open Access Journals (Sweden)

Michelle Thunders

Full Text Available Abstract Background Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida is commonly used in ecotoxicological studies; therefore, the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome-wide gene expression. Results This paper reports the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used for functional annotation of the Trinity-generated transcriptome file and the TransDecoder-generated peptide sequence file, together with BLASTX, BLASTP and HMMER searches, and the results were loaded into an SQLite3 database. To identify differentially expressed transcripts, each of the original sequence files was aligned to the de novo assembled transcriptome using Bowtie, and RSEM was then used to estimate expression values from the alignments. edgeR was used to calculate differential expression between the two conditions; with an FDR-corrected P value cutoff of 0.001, this returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence-associated proteins. At a cutoff of P = 0.01, a further 26 genes were differentially expressed. Conclusion These data have been made publicly available and, to our knowledge, represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
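The abstract reports an FDR-corrected P value cutoff but does not name the correction. As a hedged illustration only, the Python sketch below implements the Benjamini-Hochberg procedure, a widely used FDR adjustment. edgeR's `topTags` applies it by default, but the study does not state which setting it used. The function names `benjamini_hochberg` and `significant` and the P values are hypothetical and are not data from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (q values) in input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest rank down so each q value is the minimum of
    # p * m / rank over its own rank and every larger rank (monotone step-up)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], cutoff: float) -> List[int]:
    """Indices whose BH-adjusted P value is below the cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < cutoff]


# Hypothetical raw P values for five transcripts (not from the study)
pvals = [0.0001, 0.0002, 0.02, 0.3, 0.9]
print(significant(pvals, 0.001))  # [0, 1]
```

The step-up walk from the largest P value down keeps the adjusted values monotone. This is why a gene with a slightly larger raw P value can never receive a smaller q value than one ranked ahead of it.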
Nucleotide sequence of the gene coding for human factor VII, a vitamin K-dependent protein participating in blood coagulation

International Nuclear Information System (INIS)

O'Hara, P.J.; Grant, F.J.; Haldeman, B.A.; Gray, C.L.; Insley, M.Y.; Hagen, F.S.; Murray, M.J.

1987-01-01

Activated factor VII (factor VIIa) is a vitamin K-dependent plasma serine protease that participates in a cascade of reactions leading to the coagulation of blood. Two overlapping genomic clones containing sequences encoding human factor VII were isolated and characterized. The complete sequence of the gene was determined and found to span about 12.8 kilobases. The mRNA for factor VII, as demonstrated by cDNA cloning, is polyadenylylated at multiple sites but contains only one AAUAAA poly(A) signal sequence. The mRNA can undergo alternative splicing, forming one transcript containing eight segments as exons and another with an additional exon that encodes a larger prepro leader sequence. The latter transcript has no known counterpart in the other vitamin K-dependent proteins. The positions of the introns with respect to the amino acid sequence encoded by the eight essential exons of factor VII are the same as those present in factor IX, factor X, protein C, and the first three exons of prothrombin. These exons code for domains generally conserved among members of this gene family. The comparable introns in these genes, however, are dissimilar with respect to size and sequence, with the exception of intron C in factor VII and protein C. The gene for factor VII also contains five regions made up of tandem repeats of oligonucleotide monomer elements. More than a quarter of the intron sequences and more than a third of the 3' untranslated portion of the mRNA transcript consist of these minisatellite tandem repeats.

De Novo Assembly of the Pea (Pisum sativum L.) Nodule Transcriptome

Directory of Open Access Journals (Sweden)

Vladimir A. Zhukov

2015-01-01

Full Text Available The large size and complexity of the garden pea (Pisum sativum L.) genome hamper its sequencing and the discovery of pea gene resources. Although transcriptome sequencing provides extensive information about expressed genes, some tissue-specific transcripts can only be identified from particular organs under appropriate conditions. In this study, we performed RNA sequencing of polyadenylated transcripts from young pea nodules and root tips on an Illumina GAIIx system, followed by de novo transcriptome assembly using the Trinity program. We obtained more than 58,000 and 37,000 contigs from the “Nodules” and “Root Tips” assemblies, respectively. The quality of the assemblies was assessed by comparison with pea expressed sequence tags and transcriptome sequencing project data available from the NCBI website. The “Nodules” assembly was compared with the “Root Tips” assembly and with pea transcriptome sequencing data from projects indicating tissue specificity. As a result, approximately 13,000 nodule-specific contigs were found and annotated by alignment to known plant protein-coding sequences and by Gene Ontology searching. Of these, 581 sequences were found to possess full CDSs and could thus be considered novel nodule-specific transcripts of pea. The information about pea nodule-specific gene sequences can be applied to creating gene-based markers, to polymorphism studies, and to real-time PCR.
Mass Spectrometry Analysis Coupled with de novo Sequencing Reveals Amino Acid Substitutions in Nucleocapsid Protein from Influenza A Virus

Directory of Open Access Journals (Sweden)

Zijian Li

2014-02-01

Full Text Available Amino acid substitutions in influenza A virus, which result from non-synonymous mutations in the viral genome, are the main reasons for both antigenic shift and virulence change. Nucleocapsid protein (NP), one of the major structural proteins of influenza virus, is responsible for regulating viral RNA synthesis and replication. In this report, we used LC-MS/MS to analyze the tryptic digest of the nucleocapsid protein of influenza virus (A/Puerto Rico/8/1934 H1N1), which was isolated and purified by SDS polyacrylamide gel electrophoresis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three substituted amino acid residues, R452K, T423A and N430T, in two tryptic peptides. The results provide experimental evidence that amino acid substitutions resulting from non-synonymous gene mutations can be directly characterized by mass spectrometry in proteins of RNA viruses such as influenza A virus.
Annotation of the protein coding regions of the equine genome

DEFF Research Database (Denmark)

Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

2015-01-01

Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...
The Folding of de Novo Designed Protein DS119 via Molecular Dynamics Simulations

Directory of Open Access Journals (Sweden)

Moye Wang

2016-04-01

Full Text Available Because they are not subjected to natural selection, de novo designed proteins usually fold differently from natural proteins. Recently, the de novo designed mini-protein DS119, which has a βαβ motif and 36 amino acids, was found to fold unusually slowly in experiments, and transient dimers were detected during folding. Here, using all-atom replica exchange molecular dynamics (REMD) simulations, we observed several comparably stable intermediate states on the folding free-energy landscape of DS119. Conventional molecular dynamics (CMD) simulations showed that when two unfolded DS119 proteins bound together, most binding sites of the dimeric aggregates were located in the N-terminal segment, especially residues 5–10, which were supposed to form a β-sheet with the protein's own C-terminal segment. Furthermore, a large percentage of individual proteins in the dimeric aggregates adopted conformations similar to those of the intermediate states observed in the REMD simulations. These results indicate that, during folding, DS119 can easily become trapped in intermediate states. Then, through diffusion, a transient dimer could form and be stabilized, with the binding interface located at the N-termini. As a result, DS119 cannot quickly fold to its native structure. The complicated folding behavior of DS119 implies an important influence of natural selection on protein-folding kinetics and indicates that further improvements are needed in rational protein design.
Transduplication resulted in the incorporation of two protein-coding sequences into the Turmoil-1 transposable element of C. elegans

Directory of Open Access Journals (Sweden)

Pupko Tal

2008-10-01

Full Text Available Abstract Transposable elements may acquire unrelated gene fragments into their sequences in a process called transduplication. Transduplication of protein-coding genes is common in plants but is unknown in animals. Here, we report that the Turmoil-1 transposable element in C. elegans has incorporated two protein-coding sequences into its inverted terminal repeat (ITR) sequences. The ITRs of Turmoil-1 contain a conserved RNA recognition motif (RRM) that originated from the rsp-2 gene and a fragment from the protein-coding region of the cpg-3 gene. We further report that an open reading frame specific to C. elegans may have been created as a result of a Turmoil-1 insertion. Mutations at the 5' splice site of this open reading frame may have reactivated the transduplicated RRM motif. Reviewers This article was reviewed by Dan Graur and William Martin. For the full reviews, please go to the Reviewers' Reports section.
Revised Mimivirus major capsid protein sequence reveals intron-containing gene structure and extra domain

Directory of Open Access Journals (Sweden)

Suzan-Monti Marie

2009-05-01

Full Text Available Abstract Background Acanthamoeba polyphaga Mimivirus (APM) is the largest known dsDNA virus. The viral particle has a nearly icosahedral structure with an internal capsid shell surrounded by a dense layer of fibrils. A capsid protein sequence, D13L, was deduced from the APM L425 coding gene and was shown to be the most abundant protein found within the viral particle. However, this protein has remained poorly characterised until now. A revised protein sequence deposited in a database suggested an additional N-terminal stretch of 142 amino acids missing from the original deduced sequence. This result led us to investigate the L425 gene structure and the biochemical properties of the complete APM major capsid protein. Results This study describes the full-length 3430 bp capsid coding gene and characterises the corresponding 593-amino-acid Capsid protein 1. The recombinant full-length protein allowed the production of a specific monoclonal antibody able to detect Capsid protein 1 within the viral particle. This protein appeared to be post-translationally modified by glycosylation and phosphorylation. We propose a secondary structure prediction of APM Capsid protein 1 and compare it with the capsid protein structure of Paramecium bursaria Chlorella virus 1, another member of the Nucleo-Cytoplasmic Large DNA virus family. Conclusion The characterisation of the full-length L425 capsid coding gene of Acanthamoeba polyphaga Mimivirus provides new insights into the structure of the main capsid protein. The production of a full-length recombinant protein will be useful for further structural studies.
De novo transcriptome assembly of a sour cherry cultivar, Schattenmorelle

Directory of Open Access Journals (Sweden)

Yeonhwa Jo

2015-12-01

Full Text Available Sour cherry (Prunus cerasus), in the genus Prunus of the family Rosaceae, is one of the most popular stone fruit trees worldwide. Among known sour cherry cultivars, Schattenmorelle is a famous old cultivar with high fruit production. Schattenmorelle was selected before 1650 and described in the 1800s. The cultivar was named after the gardens of the Chateau de Moreille, where it was initially found. To identify new genes and develop genetic markers for sour cherry, we performed a transcriptome analysis of Schattenmorelle, which is among the commercially important cultivars in Europe and North America. We obtained 2.05 GB of raw data from Schattenmorelle (NCBI accession number: SRX1187170). De novo transcriptome assembly using Trinity identified 61,053 transcripts, with an N50 of 611 bp. Next, we identified 25,585 protein-coding sequences using TransDecoder. The identified proteins were searched with BLAST against NCBI's non-redundant database for annotation. Based on the BLAST results, we taxonomically classified the obtained sequences. As a result, we provide the transcriptome of the sour cherry cultivar Schattenmorelle, obtained by next-generation sequencing.
Single nucleotide polymorphisms (SNPs) in coding regions of canine dopamine- and serotonin-related genes

Directory of Open Access Journals (Sweden)

Lingaas Frode

2008-01-01

Full Text Available Abstract Background Polymorphisms in genes for the regulating enzymes, transporters and receptors of central nervous system neurotransmitters have been associated with altered behaviour, and single nucleotide polymorphisms (SNPs) represent the most frequent type of genetic variation. The serotonin and dopamine signalling systems have a central influence on different behavioural phenotypes in both invertebrates and vertebrates, and this study was undertaken to explore genetic variation that may be associated with variation in behaviour. Results Single nucleotide polymorphisms in canine genes related to behaviour were identified by individually sequencing eight dogs (Canis familiaris) of different breeds. Eighteen genes from the dopamine and serotonin systems were screened, revealing 34 SNPs distributed across 14 of the 18 selected genes. A total of 24,895 bp of coding sequence was sequenced, yielding an average frequency of one SNP per 732 bp (1/732). A total of 11 non-synonymous SNPs (nsSNPs), which may alter protein function, were detected. Of these 11 nsSNPs, six resulted in an amino acid substitution with a concomitant change in structural parameters. Conclusion We have identified a number of coding SNPs in behaviour-related genes, several of which change the amino acids of the proteins. Some of the canine SNPs occur in codons that are evolutionarily conserved among the five species compared, and predictions indicate that they may have a functional effect on the protein. The coding SNP frequency of the studied genes falls within the range of SNP frequencies reported earlier in the dog and other mammalian species. Novel SNPs are presented, and the results show significant genetic variation in expressed sequences in this group of genes. The results can contribute to an improved understanding of the genetics of behaviour.
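The "one SNP per 732 bp" figure is simply the sequenced coding length divided by the number of SNPs found. The short Python sketch below reproduces that arithmetic from the two totals given in the abstract. The helper name `bp_per_snp` is hypothetical and is used only for illustration.

```python
def bp_per_snp(sequenced_bp: int, snp_count: int) -> float:
    """Average number of sequenced base pairs per observed SNP."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return sequenced_bp / snp_count


# Totals reported in the abstract: 24,895 bp of coding sequence, 34 SNPs
interval = bp_per_snp(24_895, 34)
print(f"about one SNP per {round(interval)} bp")  # about one SNP per 732 bp
```

The exact quotient is about 732.2, which the abstract reports rounded to 1/732.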
Analysis of mutations in the entire coding sequence of the factor VIII gene

Energy Technology Data Exchange (ETDEWEB)

Bidichandani, S.I.; Lanyon, W.G.; Connor, J.M. [Glasgow Univ. (United Kingdom)] [and others]

1994-09-01

Hemophilia A is a common X-linked recessive bleeding disorder caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing to enable a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. To evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement in sequence motifs catalogued in the PROSITE database of protein sites and patterns.
De novo protein structure generation from incomplete chemical shift assignments

Energy Technology Data Exchange (ETDEWEB)

Shen Yang [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States); Vernon, Robert; Baker, David [University of Washington, Department of Biochemistry and Howard Hughes Medical Institute (United States); Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)], E-mail: bax@nih.gov

2009-02-15

NMR chemical shifts provide important local structural information for proteins. Consistent structure generation from NMR chemical shift data has recently become feasible for proteins with sizes of up to 130 residues, and such structures are of a quality comparable to those obtained with the standard NMR protocol. This study investigates the influence of the completeness of chemical shift assignments on structures generated from chemical shifts. The Chemical-Shift-Rosetta (CS-Rosetta) protocol was used for de novo protein structure generation with various degrees of completeness of the chemical shift assignment, simulated by omission of entries in the experimental chemical shift data previously used for the initial demonstration of the CS-Rosetta approach. In addition, a new CS-Rosetta protocol is described that improves robustness of the method for proteins with missing or erroneous NMR chemical shift input data. This strategy, which uses traditional Rosetta for pre-filtering of the fragment selection process, is demonstrated for two paramagnetic proteins and also for two proteins with solid-state NMR chemical shift assignments.
De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

Science.gov (United States)

Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

2015-09-21

Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and for unveiling the structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq, a method for multiplexed sequence identification of pooled gene variant libraries that combines a codon-based molecular barcoding strategy with de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. Using extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offers the opportunity to explore evolutionary trajectories of protein variants.
Transfer of the caturra factor to the Mundo Novo cultivar of Coffea arabica (Transfer of the Ct gene to the Mundo Novo cultivar)

Directory of Open Access Journals (Sweden)

A. Carvalho

1972-01-01

Full Text Available This paper reports studies aimed at introducing the Ct (caturra) gene, which contributes to reducing plant height, into the Mundo Novo cultivar of Coffea arabica. The F1, F2, F3 and F4 populations were studied in productivity trials. In these populations, and mainly among the progeny of the coffee plants H 2077-2-5 and H 2077-2-12, plants were selected that were homozygous for the Ct alleles and also for the alleles responsible for fruit color, xc or Xc. These combinations were named 'Catuaí Amarelo' and 'Catuaí Vermelho', respectively, and their characteristics are presented. The new cultivars are proving to be of economic interest to coffee-growing regions not only for their small stature but also for their productivity, vegetative vigor and earliness. The successful transfer of the Ct gene for short internodes to the tall Coffea arabica cultivar 'Mundo Novo' is reported. Individual selections were carried out in the F1, F2, F3 and F4 generations. Early selection in the F2 generation was found to be quite effective. A remarkably good correlation was found between the productivity of F2 plants and the yield of the F3 and F4 generations. Plants of the F4 generation have shown reasonable uniformity and high yield in several trials. The new selections proved to be early producers. Two new cultivars were released, namely 'Catuaí Amarelo' and 'Catuaí Vermelho'. The former has yellow fruits, whereas the latter has red fruits. The plants are much shorter than those of Mundo Novo. The new cultivars have very strong secondary and tertiary branching. Because of these characteristics, Catuaí Amarelo and Catuaí Vermelho are being planted on a large scale, replacing the tall cultivars.
Expression of the Long Intergenic Non-Protein Coding RNA 665 (LINC00665) Gene and the Cell Cycle in Hepatocellular Carcinoma Using The Cancer Genome Atlas, the Gene Expression Omnibus, and Quantitative Real-Time Polymerase Chain Reaction.

Science.gov (United States)

Wen, Dong-Yue; Lin, Peng; Pang, Yu-Yan; Chen, Gang; He, Yun; Dang, Yi-Wu; Yang, Hong

2018-05-05

BACKGROUND Long non-coding RNAs (lncRNAs) have a role in physiological and pathological processes, including cancer. The aim of this study was to investigate the expression of the long intergenic non-protein coding RNA 665 (LINC00665) gene and its relationship with the cell cycle in hepatocellular carcinoma (HCC) using database analysis, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), and quantitative real-time polymerase chain reaction (qPCR). MATERIAL AND METHODS Expression levels of LINC00665 were compared between human tissue samples of HCC and adjacent normal liver, clinicopathological correlations were made using TCGA and the GEO, and qPCR was performed to validate the findings. Other public databases were searched for genes associated with LINC00665 expression, including The Atlas of Noncoding RNAs in Cancer (TANRIC), the Multi Experiment Matrix (MEM), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interaction (PPI) networks. RESULTS Overexpression of LINC00665 in patients with HCC was significantly associated with gender, tumor grade, stage, and tumor cell type. Overexpression of LINC00665 in patients with HCC was also significantly associated with overall survival (OS) (HR=1.477; 95% CI: 1.046-2.086). Bioinformatics analysis identified 469 related genes, and further analysis supported the hypothesis that LINC00665 regulates cell cycle pathways to facilitate the development and progression of HCC through ten identified core genes: CDK1, BUB1B, BUB1, PLK1, CCNB2, CCNB1, CDC20, ESPL1, MAD2L1, and CCNA2. CONCLUSIONS Overexpression of the lncRNA LINC00665 may be involved in the regulation of cell cycle pathways in HCC through ten identified hub genes.
De Novo Nodal Diffuse Large B-Cell Lymphoma: Identification of Biologic Prognostic Factors

International Nuclear Information System (INIS)

Abd El-Hameed, A.

2005-01-01

Diffuse large B-cell lymphoma (DLBCL) represents the most frequent type of non-Hodgkin lymphoma (NHL). Although combination chemotherapy has improved the outcome, long-term cure is possible for only approximately 50% of all patients, making the search for parameters that identify high-risk patients particularly needed. The presence of bcl-2 gene rearrangement in de novo DLBCL suggests a possible follicle center cell origin and perhaps a distinct clinical behavior. This study investigated the frequency and prognostic significance of the t(14;18) translocation and bcl-2 protein overexpression in a cohort of patients with de novo nodal DLBCL who were uniformly evaluated and treated. Material and Methods: A total of 40 patients with de novo nodal DLBCL treated at the National Cancer Institute (NCI), Cairo University, were investigated. Formalin-fixed, paraffin-embedded sections were analyzed for 1) bcl-2 gene rearrangement, including the major breakpoint region (mbr) and minor cluster region (mcr), by polymerase chain reaction (PCR), and 2) bcl-2 protein expression by immunohistochemistry using the Dako 124 clone. Results were correlated with the clinical features and subsequent clinical course. Bcl-2 gene rearrangement was detected in 8 cases (20%): 2 cases at the mbr and 6 cases at the mcr. Bcl-2 protein (>10%) was expressed in 24 cases (60%), irrespective of the presence of the t(14;18) translocation. The t(14;18) translocation and bcl-2 protein overexpression were more frequently associated with failure to achieve a complete response to therapy (p=0.008 and 0.04, respectively). DLBCL patients with t(14;18) and bcl-2 protein expression had a significantly reduced 5-year disease-free survival (p=0.04 and 0.01, respectively). The t(14;18) translocation and bcl-2 protein expression define a group of DLBCL patients with a poor prognosis and could be used to tailor treatment and to identify candidates for therapeutic approaches. Geographic differences in t(14;18) may be related to the
Fast rate of evolution in alternatively spliced coding regions of mammalian genes

Directory of Open Access Journals (Sweden)

Nurtdinov Ramil N

2006-04-01

Full Text Available Abstract Background At least half of mammalian genes are alternatively spliced. Alternative isoforms are often genome-specific, and it has been suggested that alternative splicing is one of the major mechanisms for generating protein diversity in the course of evolution. Another way of looking at alternative splicing is to consider the sequence evolution of constitutive and alternative regions of protein-coding genes. Indeed, it turns out that constitutive and alternative regions evolve in different ways. Results A set of 3029 orthologous pairs of human and mouse alternatively spliced genes was considered. The rate of nonsynonymous substitutions (dN), the rate of synonymous substitutions (dS), and their ratio (ω = dN/dS) appear to be significantly higher in alternatively spliced coding regions than in constitutive regions. When N-terminal, internal and C-terminal alternatives are analysed separately, C-terminal alternatives appear to make the main contribution to the observed difference. The effects become even more pronounced in a subset of fast-evolving genes. Conclusion These results provide evidence of weaker purifying selection and/or stronger positive selection in alternative regions and thus one more confirmation of accelerated evolution in alternative regions. This study corroborates the theory that alternative splicing serves as a testing ground for molecular evolution.
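The ratio ω has a standard reading, summarized below as a general rule of thumb rather than a result of this study. The synonymous rate dS is commonly treated as an approximately neutral baseline, so ω measures how fast amino-acid-changing substitutions accumulate relative to that baseline. A higher ω in alternative regions is therefore consistent with the weaker purifying and/or stronger positive selection the authors conclude.

$$\omega = \frac{d_N}{d_S}, \qquad \begin{cases} \omega < 1 & \text{purifying (negative) selection} \\ \omega \approx 1 & \text{neutral evolution} \\ \omega > 1 & \text{positive selection} \end{cases}$$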
The water-borne protein signals (pheromones) of the Antarctic ciliated protozoan Euplotes nobilii: structure of the gene coding for the En-6 pheromone.

Science.gov (United States)

La Terza, Antonietta; Dobri, Nicoleta; Alimenti, Claudio; Vallesi, Adriana; Luporini, Pierangelo

2009-01-01

The marine Antarctic ciliate Euplotes nobilii secretes a family of water-borne signal proteins, denoted as pheromones, which control vegetative proliferation and mating in the cell. Based on the amino acid sequences of a set of these pheromones isolated from the culture supernatant of wild-type strains, we designed probes to identify their encoding genes in the cell's somatic nucleus (macronucleus). The full-length gene of the pheromone En-6 was determined and found to contain an open reading frame specific for the synthesis of the En-6 cytoplasmic precursor (pre-pro-En-6), which requires two proteolytic cleavages to remove the signal peptide (pre) and the prosegment before secretion of the mature protein. In contrast to the sequence variability that distinguishes the secreted pheromones, the pre- and pro-sequences appear to be tightly conserved and useful for constructing probes to clone every other E. nobilii pheromone gene. Potential intron sequences in the coding region of the En-6 gene imply that additional En-6 isoforms are synthesized.
The artificial zinc finger coding gene 'Jazz' binds the utrophin promoter and activates transcription.

Science.gov (United States)

Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C

2000-06-01

Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules relating zinc finger primary structure to potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes a three-zinc-finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine-base-pair DNA sequence 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin genes. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts most frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
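A dissociation constant relates to target occupancy through the standard single-site binding isotherm. The sketch below assumes simple 1:1 binding at equilibrium. It also assumes that free protein roughly equals total protein, which holds when the target DNA is far below Kd. The abstract states neither assumption, and the function name is hypothetical. With Kd = 32 nM, half of the target is bound at 32 nM Jazz.

```python
def fraction_bound(protein_nM: float, kd_nM: float = 32.0) -> float:
    """Equilibrium fraction of DNA target occupied, theta = [P] / (Kd + [P]).

    Assumes 1:1 binding and free protein ~= total protein (target << Kd)."""
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be non-negative and Kd positive")
    return protein_nM / (kd_nM + protein_nM)


# Tenfold below, at, and tenfold above the reported Kd of about 32 nM.
for conc in (3.2, 32.0, 320.0):
    print(f"{conc:6.1f} nM Jazz -> {fraction_bound(conc):.2f} of target bound")
```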
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.

Science.gov (United States)

O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F

2011-07-01

Objective: To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. Animals: 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. Procedures: RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences from DCM-affected and control dogs were compared with each other and with the published canine genome. Results: None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Conclusions: Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant detected in some Doberman Pinschers may contribute to the development of DCM, given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in the development of DCM in Doberman Pinschers are warranted.
A de novo designed monomeric, compact three helix bundle protein on a carbohydrate template

DEFF Research Database (Denmark)

Malik, Leila; Nygård, Jesper; Christensen, Niels Johan

2015-01-01

De novo design and chemical synthesis of proteins and of other artificial structures, which mimic them, is a central strategy for understanding protein folding and for accessing proteins with novel functions. We have previously described carbohydrates as templates for the assembly of artificial...... the template could facilitate protein folding. Here we report the design and synthesis of 3-helix bundle carboproteins on deoxy-hexopyranosides. The carboproteins were analyzed by CD, AUC, SAXS, and NMR, which revealed the formation of the first compact, and folded monomeric carboprotein distinctly different...
Business Obligations in the New Civil Code / Corporate Law and the New Brazilian Civil Code

Directory of Open Access Journals (Sweden)

Ligia Paula Pires Pinto Sica

2008-06-01

Full Text Available In view of the enactment of the New Brazilian Civil Code, which repeals and unifies the old civil code of 1916 and most chapters of the commercial code of 1850, it is important to stress that the distinction between civil and commercial law remains, in accordance with their particular logics. Because the new code introduced several rules of a general character, this paper discusses the role of the judge and of case law in applying these rules on a case-by-case basis. Such application treats the rules differently according to the facts presented in court. The aim is to maintain the autonomy of the areas of law mentioned above and to guarantee economic agents the degree of certainty and predictability they need to operate in the market.

The Arabidopsis TOR Kinase Specifically Regulates the Expression of Nuclear Genes Coding for Plastidic Ribosomal Proteins and the Phosphorylation of the Cytosolic Ribosomal Protein S6.

Science.gov (United States)

Dobrenel, Thomas; Mancera-Martínez, Eder; Forzani, Céline; Azzopardi, Marianne; Davanture, Marlène; Moreau, Manon; Schepetilnikov, Mikhail; Chicher, Johana; Langella, Olivier; Zivy, Michel; Robaglia, Christophe; Ryabova, Lyubov A; Hanson, Johannes; Meyer, Christian

2016-01-01

Protein translation is an energy-consuming process that has to be fine-tuned at both the cell and organism levels to match the availability of resources. The target of rapamycin (TOR) kinase is a key regulator of a wide range of biological processes in response to environmental cues. In this study, we investigated the effects of TOR inactivation on the expression and regulation of Arabidopsis ribosomal proteins at different levels of analysis, from the transcriptome to the phosphoproteome. TOR inactivation resulted in a coordinated down-regulation of the transcription and translation of nuclear-encoded mRNAs coding for plastidic ribosomal proteins, which could explain the chlorotic phenotype of TOR-silenced plants. In the 5' untranslated regions (UTRs) of this set of genes, we identified a conserved sequence related to the 5' terminal oligopyrimidine (TOP) motif, which is known to confer translational regulation by the TOR kinase in other eukaryotes. Furthermore, phosphoproteomic analysis of the ribosomal fraction following TOR inactivation revealed lower phosphorylation of the conserved Ser240 residue in the C-terminal region of the 40S ribosomal protein S6 (RPS6). These results were confirmed by Western blot analysis using an antibody that specifically recognizes phosphorylated Ser240 in RPS6. Finally, this antibody was used to follow TOR activity in plants. Our results thus uncover a multi-level regulation of plant ribosomal genes and proteins by the TOR kinase.
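The abstract describes the Arabidopsis sequence only as "related to" the 5' TOP motif. For orientation, the sketch below screens a 5' UTR against the canonical definition used in animal studies: a C at the transcription start, followed by an uninterrupted run of 4 to 14 pyrimidines. Treating that definition as applicable here is an assumption. The function name and default thresholds are hypothetical.

```python
import re

# Leading C followed by an uninterrupted pyrimidine tract (U is converted to T first).
_LEADING_TRACT = re.compile(r"C([CT]*)")


def has_top_like_motif(utr5: str, min_run: int = 4, max_run: int = 14) -> bool:
    """True if the UTR starts with C plus min_run..max_run further pyrimidines.

    Thresholds follow the canonical animal 5'TOP definition (an assumption)."""
    seq = utr5.upper().replace("U", "T")
    m = _LEADING_TRACT.match(seq)  # match() anchors at the first nucleotide
    if m is None:
        return False
    run = len(m.group(1))  # pyrimidines immediately after the +1 C
    return min_run <= run <= max_run


print(has_top_like_motif("CUUUCCAUGGC"))
print(has_top_like_motif("GCUUUUCCAUG"))
```

A real analysis would also need an accurate transcription start site, since the motif is defined relative to the +1 position.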
De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

Science.gov (United States)

Gui, Duan; Jia, Kuntong; Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

2013-01-01

The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in its population size over the past decades, which raises concern about its extinction. So far, this species is poorly characterized at the molecular level owing to the limited sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generating abundant sequences for functional genomic analyses in species with unsequenced genomes. We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. A total of 108,751 high-quality sequences were generated from 47,840,388 paired-end reads, and 48,868 and 46,587 unigenes were functionally annotated by BLAST searches against the NCBI non-redundant and Swiss-Prot protein databases, respectively (E-value … Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for the discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, and identification of genetic markers.
The induction of the oxidative burst in Elodea densa by sulfhydryl reagent does not depend on de novo protein synthesis

Energy Technology Data Exchange (ETDEWEB)

Amicucci, Enrica [Milan, Univ. (Italy). Dipt. di Fisiologia e Biochimica delle Piante]

1997-12-31

In Elodea densa Planchon leaves, N-ethylmaleimide (NEM) and other sulfhydryl-binding reagents induce a marked, transient increase in respiration. This increase is insensitive to cyanide, hydroxamate and propyl gallate but is completely inhibited by diphenylene iodonium (DPI) and quinacrine. In this paper the author investigates whether the mechanism causing the oxidative burst depends on the activation of preexisting oxidative systems or on de novo protein synthesis. The inhibitors used were cycloheximide (CHI), which inhibits protein synthesis in plant cells by depressing the incorporation of amino acids into proteins, and cordycepin, an effective inhibitor of mRNA synthesis. The data support the idea that the mechanism depends on the activation of a long-lived protein(s) and not on de novo protein synthesis.
Cloning of human genes encoding novel G protein-coupled receptors

Energy Technology Data Exchange (ETDEWEB)

Marchese, A.; Docherty, J.M.; Heiber, M. [Univ. of Toronto, (Canada)] [and others]

1994-10-01

We report the isolation and characterization of several novel human genes encoding G protein-coupled receptors. Each of the receptors had the characteristic seven-transmembrane topology and most closely resembled peptide-binding receptors. Gene GPR1 encoded a receptor protein that is intronless in the coding region and that shared identity (43% in the transmembrane regions) with the opioid receptors. Northern blot analysis revealed that GPR1 transcripts were expressed in the human hippocampus, and the gene was localized to chromosome 15q21.6. Gene GPR2 encoded a protein that most closely resembled an interleukin-8 receptor (51% identity in the transmembrane regions). This gene was not expressed in the six brain regions examined and was localized to chromosome 17q2.1-q21.3. A third gene, GPR3, showed identity (56% in the transmembrane regions) with a previously characterized cDNA clone from rat and was localized to chromosome 1p35-p36.1. 31 refs., 5 figs., 1 tab.
De novo 14q24.2q24.3 microdeletion including IFT43 is associated with intellectual disability, skeletal anomalies, cardiac anomalies, and myopia.

Science.gov (United States)

Stokman, Marijn F; Oud, Machteld M; van Binsbergen, Ellen; Slaats, Gisela G; Nicolaou, Nayia; Renkema, Kirsten Y; Nijman, Isaac J; Roepman, Ronald; Giles, Rachel H; Arts, Heleen H; Knoers, Nine V A M; van Haelst, Mieke M

2016-06-01

We report an 11-year-old girl with mild intellectual disability, skeletal anomalies, congenital heart defect, myopia, and facial dysmorphisms including an extra incisor, cup-shaped ears, and a preauricular skin tag. Array comparative genomic hybridization analysis identified a de novo 4.5-Mb microdeletion on chromosome 14q24.2q24.3. The deleted region and phenotype partially overlap with previously reported patients. Here, we provide an overview of the literature on 14q24 microdeletions and further delineate the associated phenotype. We performed exome sequencing to examine other causes for the phenotype and queried genes present in the 14q24.2q24.3 microdeletion that are associated with recessive disease for variants in the non-deleted allele. The deleted region contains 65 protein-coding genes, including the ciliary gene IFT43. Although Sanger and exome sequencing did not identify variants in the second IFT43 allele or in other IFT complex A-protein-encoding genes, immunocytochemistry showed increased accumulation of IFT-B proteins at the ciliary tip in patient-derived fibroblasts compared to control cells, demonstrating defective retrograde ciliary transport. This could suggest a ciliary defect in the pathogenesis of this disorder. © 2016 Wiley Periodicals, Inc.
Cloning and identification of the gene coding for the 140-kd subunit of Drosophila RNA polymerase II

OpenAIRE

Faust, Daniela M.; Renkawitz-Pohl, Renate; Falkenburg, Dieter; Gasch, Alexander; Bialojan, Siegfried; Young, Richard A.; Bautz, Ekkehard K. F.

1986-01-01

Genomic clones of Drosophila melanogaster were isolated from a λ library by cross-hybridization with the yeast gene coding for the 150-kd subunit of RNA polymerase II. Clones containing a region of ∼2.0 kb with strong homology to the yeast gene were shown to code for a 3.9-kb poly(A)+-RNA. Part of the coding region was cloned into an expression vector. A fusion protein was obtained which reacted with an antibody directed against RNA polymerase II of Drosophila. Peptide mapping of the fusion p...
De novo transcriptome sequencing of Isaria cateniannulata and comparative analysis of gene expression in response to heat and cold stresses.

Directory of Open Access Journals (Sweden)

Dingfeng Wang

Full Text Available Isaria cateniannulata is a very important and virulent entomopathogenic fungus that infects many insect pest species. Although I. cateniannulata is commonly exposed to extreme environmental temperatures, little is known about its molecular response to temperature stress. Here, we sequenced and de novo assembled the transcriptome of I. cateniannulata in response to high- and low-temperature stresses using Illumina RNA-Seq technology. Our assembly encompassed 17,514 unigenes (mean length = 1,197 bp), of which 11,445 (65.34%) showed significant similarity to known sequences in the NCBI non-redundant protein sequence (Nr) database. Using digital gene expression analysis, 4,483 differentially expressed genes (DEGs) were identified after heat treatment, including 2,905 up-regulated and 1,578 down-regulated genes. Under cold stress, 1,927 DEGs were identified, including 1,245 up-regulated and 682 down-regulated genes. The expression patterns of 18 randomly selected candidate DEGs measured by quantitative real-time PCR (qRT-PCR) were consistent with the transcriptome analysis results. Although the DEGs were involved in many pathways, we focused on genes involved in endocytosis. Under heat stress, the clathrin-dependent endocytosis (CDE) pathway was active, whereas under cold stress, the clathrin-independent endocytosis (CIE) pathway was active. In addition, four categories of DEGs acting as temperature sensors were observed: cell-wall-major-components-metabolism-related (CWMCMR) genes, heat shock protein (Hsp) genes, intracellular-compatible-solutes-metabolism-related (ICSMR) genes and glutathione S-transferase (GST) genes. These results enhance our understanding of the molecular mechanisms underlying the response of I. cateniannulata to temperature stress and provide a valuable resource for future investigations.
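The abstract does not describe how DEGs were called. As a rough illustration only, the sketch below sorts genes into up- and down-regulated sets by library-size-normalized fold change. The counts, the pseudocount and the |log2 fold change| ≥ 1 cutoff are hypothetical. A real digital gene expression analysis would also apply a statistical test with multiple-testing correction.

```python
import math


def classify_degs(control, treated, min_abs_log2fc=1.0, pseudocount=1.0):
    """Split genes into up- and down-regulated lists by normalized fold change.

    `control` and `treated` map gene IDs to raw read counts. Counts are scaled
    to counts per million (CPM) so libraries of different depth are comparable.
    No significance test is applied in this sketch."""
    total_c = sum(control.values())
    total_t = sum(treated.values())
    up, down = [], []
    for gene in control.keys() & treated.keys():
        cpm_c = control[gene] / total_c * 1e6
        cpm_t = treated[gene] / total_t * 1e6
        log2fc = math.log2((cpm_t + pseudocount) / (cpm_c + pseudocount))
        if log2fc >= min_abs_log2fc:
            up.append(gene)
        elif log2fc <= -min_abs_log2fc:
            down.append(gene)
    return sorted(up), sorted(down)


# Hypothetical counts for four genes before and after heat treatment.
control = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
heat = {"g1": 400, "g2": 25, "g3": 100, "g4": 100}
up, down = classify_degs(control, heat)
print(f"{len(up) + len(down)} DEGs: {len(up)} up-regulated, {len(down)} down-regulated")
```

Gene g3 has the same raw count in both samples, yet its normalized level falls because the heat library is deeper. The drop (log2 fold change of about -0.64) does not pass the cutoff. This is why counts are normalized before comparison.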
De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum) Seedlings and Identification of Salt Tolerance Genes

Directory of Open Access Journals (Sweden)

Jiangtao Liu

2016-03-01

Full Text Available Achnatherum splendens is an important forage herb in Northwestern China. It is highly tolerant of salinity and is thus considered one of the most important constructive plants in saline and alkaline areas of Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS) technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq) reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDSs). We further identified 1663 differentially expressed genes (DEGs) between the salt stress treatment and the control. Functional annotation of the DEGs by Gene Ontology (GO), using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens provides an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress.
Computational Tools and Algorithms for Designing Customized Synthetic Genes

Energy Technology Data Exchange (ETDEWEB)

Gould, Nathan [Department of Computer Science, The College of New Jersey, Ewing, NJ (United States); Hendy, Oliver [Department of Biology, The College of New Jersey, Ewing, NJ (United States); Papamichail, Dimitris, E-mail: papamicd@tcnj.edu [Department of Computer Science, The College of New Jersey, Ewing, NJ (United States)

2014-10-06

Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations.
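To make the single-objective approach concrete, the sketch below performs a greedy back-translation that combines two such objectives. It picks each amino acid's most preferred codon unless that codon would create a forbidden restriction site. The codon preference lists are hypothetical, not a real organism's usage table. EcoRI and BamHI are chosen only for illustration, and this heuristic is not the algorithm of any particular tool reviewed. Because it never backtracks, it can fail where a search-based design would succeed.

```python
# Hypothetical codon preferences (most preferred first) for a few amino acids;
# a real tool would use a measured codon-usage table for the host organism.
CODON_PREFS = {
    "M": ["ATG"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
    "K": ["AAA", "AAG"],
    "N": ["AAC", "AAT"],
    "S": ["AGC", "TCT", "TCC"],
}
# EcoRI and BamHI sites, chosen for illustration; both are palindromic,
# so checking one strand also covers the other.
FORBIDDEN_SITES = ["GAATTC", "GGATCC"]


def back_translate(protein, prefs=CODON_PREFS, forbidden=FORBIDDEN_SITES):
    """Greedy back-translation: most preferred codon first, avoiding forbidden sites."""
    # The sequence built so far is already site-free, so a new site must overlap
    # the codon just added and therefore lies within the last (longest site + 2) nt.
    window = max((len(site) for site in forbidden), default=0) + 2
    seq = ""
    for i, aa in enumerate(protein):
        for codon in prefs[aa]:  # KeyError for amino acids missing from the table
            tail = (seq + codon)[-window:]
            if not any(site in tail for site in forbidden):
                seq += codon
                break
        else:
            raise ValueError(f"no site-free codon for {aa!r} at residue {i}")
    return seq


print(back_translate("MEF", forbidden=[]))  # codon usage as the only objective
print(back_translate("MEF"))                # codon usage plus site avoidance
```

With codon usage as the only objective, "MEF" becomes ATGGAATTC, which contains an EcoRI site. The second call falls back to the next preferred phenylalanine codon.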
Computational Tools and Algorithms for Designing Customized Synthetic Genes

International Nuclear Information System (INIS)

Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris

2014-01-01

Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations.
Novel methods for the molecular discrimination of Fasciola spp. on the basis of nuclear protein-coding genes.

Science.gov (United States)

Shoriki, Takuya; Ichikawa-Seki, Madoka; Suganuma, Keisuke; Naito, Ikunori; Hayashi, Kei; Nakao, Minoru; Aita, Junya; Mohanta, Uday Kumar; Inoue, Noboru; Murakami, Kenji; Itagaki, Tadashi

2016-06-01

Fasciolosis is an economically important disease of livestock caused by Fasciola hepatica, Fasciola gigantica, and aspermic Fasciola flukes. The aspermic Fasciola flukes have been discriminated morphologically from the two other species by the absence of sperm in their seminal vesicles. To date, the molecular discrimination of F. hepatica and F. gigantica has relied on the nucleotide sequences of the internal transcribed spacer 1 (ITS1) region. However, ITS1 genotypes of aspermic Fasciola flukes cannot be clearly differentiated from those of F. hepatica and F. gigantica. Therefore, more precise and robust methods are required to discriminate Fasciola spp. In this study, we developed PCR restriction fragment length polymorphism (PCR-RFLP) and multiplex PCR methods to discriminate F. hepatica, F. gigantica, and aspermic Fasciola flukes. These methods are based on two nuclear protein-coding genes, phosphoenolpyruvate carboxykinase and DNA polymerase delta, which are single-locus genes in most eukaryotes. All aspermic Fasciola flukes used in this study showed a mixed fragment pattern of F. hepatica and F. gigantica for both genes, suggesting that these flukes arose through hybridization between the two species. These molecular methods will facilitate the identification of F. hepatica, F. gigantica, and aspermic Fasciola flukes, and will also prove useful in etiological studies of fasciolosis. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
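The "mixed fragment pattern" reasoning can be illustrated with an in silico PCR-RFLP digest. A hybrid carrying one allele from each parental species shows the union of both parents' bands. The abstract does not name the restriction enzymes or give the amplicon sequences. The short sequences below are hypothetical, and EcoRI (G^AATTC) is used only as an example enzyme. Only top-strand cut positions are modelled.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting every occurrence of `site` in `seq`.

    `cut_offset` is the cut position within the site on the top strand
    (1 for EcoRI, G^AATTC). Sticky-end overhangs are ignored."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def hybrid_pattern(allele_a, allele_b, site, cut_offset):
    """Band sizes expected when both parental alleles are present."""
    bands = set(digest(allele_a, site, cut_offset)) | set(digest(allele_b, site, cut_offset))
    return sorted(bands)


# Hypothetical amplicons: allele A carries the site, allele B does not.
allele_a = "AAAGAATTCAAA"
allele_b = "AAAGAAATCAAA"
print(digest(allele_a, "GAATTC", 1))
print(digest(allele_b, "GAATTC", 1))
print(hybrid_pattern(allele_a, allele_b, "GAATTC", 1))
```

A set is used for the hybrid pattern because fragments of equal size from the two alleles co-migrate as a single band on a gel.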
Cavitation during the protein misfolding cyclic amplification (PMCA) method – The trigger for de novo prion generation?

International Nuclear Information System (INIS)

Haigh, Cathryn L.; Drew, Simon C.

2015-01-01

The protein misfolding cyclic amplification (PMCA) technique has become a widely-adopted method for amplifying minute amounts of the infectious conformer of the prion protein (PrP). PMCA involves repeated cycles of 20 kHz sonication and incubation, during which the infectious conformer seeds the conversion of normally folded protein by a templating interaction. Recently, it has proved possible to create an infectious PrP conformer without the need for an infectious seed, by including RNA and the phospholipid POPG as essential cofactors during PMCA. The mechanism underpinning this de novo prion formation remains unknown. In this study, we first establish by spin trapping methods that cavitation bubbles formed during PMCA provide a radical-rich environment. Using a substrate preparation comparable to that employed in studies of de novo prion formation, we demonstrate by immuno-spin trapping that PrP- and RNA-centered radicals are generated during sonication, in addition to PrP-RNA cross-links. We further show that serial PMCA produces protease-resistant PrP that is oxidatively modified. We suggest a unique confluence of structural (membrane-mimetic hydrophobic/hydrophilic bubble interface) and chemical (ROS) effects underlie the phenomenon of de novo prion formation by PMCA, and that these effects have meaningful biological counterparts of possible relevance to spontaneous prion formation in vivo. - Highlights: • Sonication during PMCA generates free radicals at the surface of cavitation bubbles. • PrP-centered and RNA-centered radicals are formed in addition to PrP-RNA adducts. • De novo prions may result from ROS and structural constraints during cavitation
Cavitation during the protein misfolding cyclic amplification (PMCA) method – The trigger for de novo prion generation?

Energy Technology Data Exchange (ETDEWEB)

Haigh, Cathryn L., E-mail: chaigh@unimelb.edu.au [Department of Pathology, The University of Melbourne, Victoria 3010 (Australia); Drew, Simon C., E-mail: sdrew@unimelb.edu.au [Florey Department of Neuroscience and Mental Health, The University of Melbourne, Victoria 3010 (Australia)

2015-06-05

The protein misfolding cyclic amplification (PMCA) technique has become a widely-adopted method for amplifying minute amounts of the infectious conformer of the prion protein (PrP). PMCA involves repeated cycles of 20 kHz sonication and incubation, during which the infectious conformer seeds the conversion of normally folded protein by a templating interaction. Recently, it has proved possible to create an infectious PrP conformer without the need for an infectious seed, by including RNA and the phospholipid POPG as essential cofactors during PMCA. The mechanism underpinning this de novo prion formation remains unknown. In this study, we first establish by spin trapping methods that cavitation bubbles formed during PMCA provide a radical-rich environment. Using a substrate preparation comparable to that employed in studies of de novo prion formation, we demonstrate by immuno-spin trapping that PrP- and RNA-centered radicals are generated during sonication, in addition to PrP-RNA cross-links. We further show that serial PMCA produces protease-resistant PrP that is oxidatively modified. We suggest a unique confluence of structural (membrane-mimetic hydrophobic/hydrophilic bubble interface) and chemical (ROS) effects underlie the phenomenon of de novo prion formation by PMCA, and that these effects have meaningful biological counterparts of possible relevance to spontaneous prion formation in vivo. - Highlights: • Sonication during PMCA generates free radicals at the surface of cavitation bubbles. • PrP-centered and RNA-centered radicals are formed in addition to PrP-RNA adducts. • De novo prions may result from ROS and structural constraints during cavitation.
De Novo Assembly and Characterization of the Transcriptome of Grasshopper Shirakiacris shirakii

Directory of Open Access Journals (Sweden)

Zhongying Qiu

2016-07-01

Full Text Available Background: The grasshopper Shirakiacris shirakii is an important agricultural pest that feeds mainly on gramineous plants, causing economic damage to a wide range of crops. However, genomic information on this species is extremely limited, and transcriptome data relevant to insecticide resistance and pest control are not available. Methods: The transcriptome of S. shirakii was sequenced using the Illumina HiSeq platform and assembled de novo. Results: Sequencing produced a total of 105,408,878 clean reads, and the de novo assembly yielded 74,657 unigenes with an average length of 680 bp and an N50 of 1057 bp. A total of 28,173 unigenes were annotated against the NCBI non-redundant protein sequences (Nr), NCBI non-redundant nucleotide sequences (Nt), a manually annotated and reviewed protein sequence database (Swiss-Prot), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Based on the Nr annotation results, we manually identified 79 unigenes encoding cytochrome P450 monooxygenases (P450s), 36 unigenes encoding carboxylesterases (CarEs) and 36 unigenes encoding glutathione S-transferases (GSTs) in S. shirakii. Core RNAi components of the microRNA, siRNA and piRNA pathways were expressed in S. shirakii, including Pasha, Loquacious, Argonaute-1, Argonaute-2, Argonaute-3, Zucchini, Aubergine, enhanced RNAi-1 and Piwi. We also identified five unigenes homologous to the Sid-1 gene. In addition, differential gene expression analysis revealed that a total of 19,764 unigenes were up-regulated and 4185 unigenes were down-regulated in larvae. In total, we predicted 7504 simple sequence repeats (SSRs) from the 74,657 unigenes. Conclusions: These comprehensive de novo transcriptomic data for S. shirakii will offer a series of valuable molecular resources for studying insecticide resistance and RNAi and for discovering molecular markers.
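The N50 reported above is a standard assembly statistic. To compute it, sort the unigene lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled length. The minimal sketch below uses made-up example lengths; only the definition is standard.

```python
def n50(lengths):
    """Smallest length L such that contigs of length >= L cover at least half
    of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


# Made-up unigene lengths (bp), not data from the study.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example), sum(example) / len(example))
```

N50 weights long contigs more heavily than the mean does. This is why an assembly with many short unigenes can have an N50 well above its average length.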
Synergistic interactions between Drosophila orthologues of genes spanned by de novo human CNVs support multiple-hit models of autism.

Science.gov (United States)

Grice, Stuart J; Liu, Ji-Long; Webber, Caleb

2015-03-01

Autism spectrum disorders (ASDs) are highly heritable and characterised by deficits in social interaction and communication, as well as restricted and repetitive behaviours. Although a number of highly penetrant ASD gene variants have been identified, there is growing evidence to support a causal role for combinatorial effects arising from the contributions of multiple loci. By examining synaptic and circadian neurological phenotypes resulting from the dosage variants of unique human:fly orthologues in Drosophila, we observe numerous synergistic interactions between pairs of informatically-identified candidate genes whose orthologues are jointly affected by large de novo copy number variants (CNVs). These CNVs were found in the genomes of individuals with autism, including a patient carrying a 22q11.2 deletion. We first demonstrate that dosage alterations of the unique Drosophila orthologues of candidate genes from de novo CNVs that harbour only a single candidate gene display neurological defects similar to those previously reported in Drosophila models of ASD-associated variants. We then considered pairwise dosage changes within the set of orthologues of candidate genes that were affected by the same single human de novo CNV. For three of four CNVs with complete orthologous relationships, we observed significant synergistic effects following the simultaneous dosage change of gene pairs drawn from a single CNV. The phenotypic variation observed at the Drosophila synapse that results from these interacting genetic variants supports a concordant phenotypic outcome across all interacting gene pairs following the direction of human gene copy number change. We observe both specificity and transitivity between interactors, both within and between CNV candidate gene sets, supporting shared and distinct genetic aetiologies. We then show that different interactions affect divergent synaptic processes, demonstrating distinct molecular aetiologies. Our study illustrates
Td4IN2: A drought-responsive durum wheat (Triticum durum Desf.) gene coding for a resistance like protein with serine/threonine protein kinase, nucleotide binding site and leucine rich domains.

Science.gov (United States)

Rampino, Patrizia; De Pascali, Mariarosaria; De Caroli, Monica; Luvisi, Andrea; De Bellis, Luigi; Piro, Gabriella; Perrotta, Carla

2017-11-01

Wheat, the main food source for a third of the world's population, appears to be under serious threat from predicted temperature increases coupled with drought. The complex molecular response of plants to drought stress relies on the gene network controlling cellular reactions to abiotic stress. In the natural environment, plants are subjected to combinations of abiotic and biotic stresses. The response of plants to biotic stress, to cope with pathogens, also involves the activation of a molecular network. Investigations of combined abiotic and biotic stresses indicate the existence of cross-talk between the two networks, and some degree of overlap can be hypothesized. In this work we describe the isolation and characterization of a drought-related durum wheat (Triticum durum Desf.) gene, identified in a previous study, coding for a protein that combines features of NBS-LRR type resistance proteins with an S/TPK domain and is involved in the drought stress response. This is one of the few reported examples in which all three domains are present in a single protein and, to our knowledge, it is the first report of a gene with this particular structure that is specifically induced by drought stress and drought-related conditions. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
An operon from Lactobacillus helveticus composed of a proline iminopeptidase gene (pepI) and two genes coding for putative members of the ABC transporter family of proteins.

Science.gov (United States)

Varmanen, P; Rantanen, T; Palva, A

1996-12-01

A proline iminopeptidase gene (pepI) of an industrial Lactobacillus helveticus strain was cloned and found to be organized in an operon-like structure of three open reading frames (ORF1, ORF2 and ORF3). ORF1 was preceded by a typical prokaryotic promoter region, and a putative transcription terminator was found downstream of ORF3, identified as the pepI gene. Primer-extension analyses identified only one transcription start site in the predicted operon, upstream of ORF1. Although the size of the mRNA could not be determined by Northern analysis with ORF1-, ORF2- or pepI-specific probes, reverse transcription-PCR analyses further supported the operon structure of the three genes. ORF1, ORF2 and ORF3 had coding capacities for 50.7, 24.5 and 33.8 kDa proteins, respectively. The ORF3-encoded PepI protein showed 65% identity with the PepI proteins from Lactobacillus delbrueckii subsp. bulgaricus and Lactobacillus delbrueckii subsp. lactis. The ORF1-encoded protein had significant homology with several members of the ABC transporter family but, with two distinct putative ATP-binding sites, it would represent an unusual type among bacterial ABC transporters. ORF2 encoded a putative integral membrane protein, also characteristic of the ABC transporter family. The pepI gene was overexpressed in Escherichia coli. Purified PepI hydrolysed only di- and tripeptides with proline in the first position. Optimum PepI activity was observed at pH 7.5 and 40 °C. Gel filtration analysis indicated that PepI is a dimer with an Mr of 53,000. PepI was shown to be a metal-independent serine peptidase with thiol groups at or near the active site. Kinetic studies with proline-p-nitroanilide as substrate revealed Km and Vmax values of 0.8 mM and 350 mmol min-1 mg-1, respectively, and a very high turnover number of 135,000 s-1.
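Km and Vmax are the two parameters of the Michaelis-Menten rate law. The abstract reports the values but not the fitting model, so the sketch below simply assumes that PepI follows single-substrate Michaelis-Menten kinetics with proline-p-nitroanilide and shows the initial rate this implies at a few illustrative substrate concentrations. The function name and the chosen concentrations are hypothetical.

```python
def michaelis_menten_rate(substrate_mM, km_mM=0.8, vmax=350.0):
    """Initial rate v = Vmax * [S] / (Km + [S]) for a Michaelis-Menten enzyme.

    Defaults are the Km (0.8 mM) and Vmax (350 mmol min^-1 mg^-1) reported for
    PepI with proline-p-nitroanilide; the rate is returned in the units of vmax.
    Assumes simple single-substrate Michaelis-Menten kinetics.
    """
    if substrate_mM < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)


if __name__ == "__main__":
    # Illustrative substrate concentrations (mM), chosen around the reported Km.
    for s in (0.1, 0.8, 4.0, 40.0):
        v = michaelis_menten_rate(s)
        print(f"[S] = {s:5.1f} mM -> v = {v:6.1f} mmol/min/mg ({v / 350.0:.0%} of Vmax)")
```

At [S] equal to Km (0.8 mM) the model gives exactly half of Vmax, which is the defining property of Km; at 40 mM the rate is already close to Vmax.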
Identification of a Novel De Novo Heterozygous Deletion in the SOX10 Gene in Waardenburg Syndrome Type II Using Next-Generation Sequencing.

Science.gov (United States)

Li, Haonan; Jin, Peng; Hao, Qian; Zhu, Wei; Chen, Xia; Wang, Ping

2017-11-01

Waardenburg syndrome (WS) is a rare autosomal dominant disorder associated with pigmentation abnormalities and sensorineural hearing loss. In this study, we investigated the genetic cause of Waardenburg syndrome type II (WSII) in a patient and evaluated the reliability of targeted next-generation exome sequencing for the genetic diagnosis of WS. Clinical evaluations were conducted on the patient, and targeted next-generation sequencing (NGS) was used to identify candidate genes responsible for WSII. Multiplex ligation-dependent probe amplification (MLPA) and real-time quantitative polymerase chain reaction (qPCR) were performed to confirm the targeted NGS results. Targeted NGS detected a deletion of the entire coding sequence (CDS) of the SOX10 gene in the WSII patient. MLPA confirmed a heterozygous deletion spanning all exons of SOX10 and found no aberrant copy number in the PAX3 and microphthalmia-associated transcription factor (MITF) genes. Real-time qPCR identified the mutation as a de novo heterozygous deletion. This is the first report of using a targeted NGS method for WS candidate gene sequencing; its accuracy was verified with the MLPA and qPCR methods. Our research provides a valuable method for the genetic diagnosis of WS.
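The abstract does not say how the qPCR data were analysed. A common way to confirm a heterozygous deletion by qPCR is the comparative Ct (2^-ΔΔCt) method: the target amplicon is normalised to a two-copy reference gene and to a calibrator sample with normal copy number, and a heterozygous deletion then appears as roughly one copy. The sketch below assumes equal, near-100% amplification efficiencies for target and reference; the Ct values, function names and copy-state cut-offs are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                         calibrator_copies=2):
    """Comparative Ct (2^-ddCt) estimate of target-gene copy number.

    ct_target / ct_reference: Ct values of the target amplicon and a two-copy
    reference gene in the patient sample; *_cal: the same in a calibrator
    (normal, two-copy) sample. Assumes equal, near-100% PCR efficiencies.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-delta_delta)


def call_copy_state(copies):
    """Hypothetical cut-offs placed halfway between integer copy numbers."""
    if copies < 0.5:
        return "homozygous deletion"
    if copies < 1.5:
        return "heterozygous deletion"
    if copies < 2.5:
        return "normal (two copies)"
    return "gain"


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies one cycle later in the patient.
    cn = relative_copy_number(26.5, 24.0, 25.5, 24.0)
    print(cn, call_copy_state(cn))
```

One extra cycle for the target, relative to the calibrator, corresponds to half the template, which is why a heterozygous deletion shifts the estimate from two copies to one.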
Synaptic, transcriptional and chromatin genes disrupted in autism.

Science.gov (United States)

De Rubeis, Silvia; He, Xin; Goldberg, Arthur P; Poultney, Christopher S; Samocha, Kaitlin; Cicek, A Ercument; Kou, Yan; Liu, Li; Fromer, Menachem; Walker, Susan; Singh, Tarinder; Klei, Lambertus; Kosmicki, Jack; Fu, Shih-Chen; Aleksic, Branko; Biscaldi, Monica; Bolton, Patrick F; Brownfeld, Jessica M; Cai, Jinlu; Campbell, Nicholas G; Carracedo, Angel; Chahrour, Maria H; Chiocchetti, Andreas G; Coon, Hilary; Crawford, Emily L; Curran, Sarah R; Dawson, Geraldine; Duketis, Eftichia; Fernandez, Bridget A; Gallagher, Louise; Geller, Evan; Guter, Stephen J; Hill, R Sean; Ionita-Laza, Juliana; Jimenez Gonzalez, Patricia; Kilpinen, Helena; Klauck, Sabine M; Kolevzon, Alexander; Lee, Irene; Lei, Irene; Lei, Jing; Lehtimäki, Terho; Lin, Chiao-Feng; Ma'ayan, Avi; Marshall, Christian R; McInnes, Alison L; Neale, Benjamin; Owen, Michael J; Ozaki, Norio; Parellada, Mara; Parr, Jeremy R; Purcell, Shaun; Puura, Kaija; Rajagopalan, Deepthi; Rehnström, Karola; Reichenberg, Abraham; Sabo, Aniko; Sachse, Michael; Sanders, Stephan J; Schafer, Chad; Schulte-Rüther, Martin; Skuse, David; Stevens, Christine; Szatmari, Peter; Tammimies, Kristiina; Valladares, Otto; Voran, Annette; Wang, Li-San; Weiss, Lauren A; Willsey, A Jeremy; Yu, Timothy W; Yuen, Ryan K C; Cook, Edwin H; Freitag, Christine M; Gill, Michael; Hultman, Christina M; Lehner, Thomas; Palotie, Aarno; Schellenberg, Gerard D; Sklar, Pamela; State, Matthew W; Sutcliffe, James S; Walsh, Christopher A; Scherer, Stephen W; Zwick, Michael E; Barrett, Jeffrey C; Cutler, David J; Roeder, Kathryn; Devlin, Bernie; Daly, Mark J; Buxbaum, Joseph D

2014-11-13

The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers, most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.
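The two thresholds (FDR < 0.05 and FDR < 0.30) are cut-offs on the estimated false discovery rate of each gene list. The abstract does not describe the statistical model behind these FDR values, so the sketch below illustrates only the general idea using the standard Benjamini-Hochberg procedure, which is not necessarily the method the authors used. Gene names, p-values and function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def genes_below_fdr(p_by_gene, threshold):
    """Genes whose BH q-value is below the FDR threshold, sorted by name."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, q_values) if q < threshold)


if __name__ == "__main__":
    # Hypothetical per-gene p-values, not data from the study.
    p = {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.02, "GENE_D": 0.2, "GENE_E": 0.5}
    print(genes_below_fdr(p, 0.05))  # stricter list
    print(genes_below_fdr(p, 0.30))  # looser, larger list
```

Raising the threshold admits more genes at the cost of a larger expected fraction of false positives, which is the trade-off between the stricter (FDR < 0.05) and looser (FDR < 0.30) thresholds reported in the abstract.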
Study on Fusion Protein and Its gene in Baculovirus Specificity

International Nuclear Information System (INIS)

Nemr, W.A.H.

2012-01-01

Baculoviruses are subdivided into two groups depending on the type of envelope fusion protein of the budded virus: group I uses gp64 and includes most nucleopolyhedroviruses (NPVs), whereas group II uses the F protein and includes the remaining NPVs and all granuloviruses (GVs). Recent studies, based on phylogenetic analysis of fusion proteins, reported that the viral F protein gene is of host cellular origin and may have been acquired from the host genome during evolution. It was therefore deduced that the F protein gene is a species-specific nucleotide sequence related to the type of the specific host, and that if a virus could infect an unexpected host, the resulting virus might encode a different F gene. In this regard, the present study used these properties of the F gene in an attempt to produce a model for a specific, more economical, wider-range granulovirus bio-pesticide able to infect both Spodoptera littoralis and Phthorimaea operculella larvae. Multiple sequence alignment and phylogenetic analysis were performed on six members of group II baculoviruses, and novel universal PCR primers were manually designed from the conserved regions of the alignment to amplify the species-specific sequence of the entire F gene open reading frame (ORF), which is useful for the molecular identification of baculoviruses in unknown samples. The PCR product of SpliGV was then used to prepare a specific probe for the F gene of this virus type. The results showed that S. littoralis larvae can be infected by PhopGV if it is injected into the larval haemocoel, and DNA hybridization showed that the virus resulting from this infection encodes an F gene homologous with the F gene of SpliGV, indicating that the resulting virus acquired this F gene sequence from the host genome after infection. Consequently, these results suggest that genetic aberrations in the host genome may affect baculoviral infectivity. So, this study aimed to investigate the effect of gamma radiation at

Proteomic Profiling of De Novo Protein Synthesis in Starvation-Induced Autophagy Using Bioorthogonal Noncanonical Amino Acid Tagging.

Science.gov (United States)

Zhang, J; Wang, J; Lee, Y-M; Lim, T-K; Lin, Q; Shen, H-M

2017-01-01

Autophagy is an intracellular degradation process, activated by stress factors such as nutrient starvation, that maintains cellular homeostasis. There is emerging evidence that de novo protein synthesis is involved in the autophagic process. However, characterizing these newly synthesized proteins has so far been technically difficult. In this chapter, we describe a novel method to identify newly synthesized proteins during starvation-mediated autophagy by bioorthogonal noncanonical amino acid tagging (BONCAT), in conjunction with isobaric tagging for relative and absolute quantification (iTRAQ)-based quantitative proteomics. L-azidohomoalanine (AHA) is an analog of methionine and is readily incorporated into newly synthesized proteins. The AHA-containing proteins can be enriched with avidin beads after a "click" reaction between alkyne-bearing biotin and the azide moiety of AHA. The enriched proteins are then subjected to iTRAQ™ labeling for protein identification and quantification by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Using this technique, we have successfully profiled more than 700 proteins that are synthesized during starvation-induced autophagy. We believe that this approach is effective for identifying newly synthesized proteins during autophagy and provides useful insights into the molecular mechanisms and biological functions of autophagy. © 2017 Elsevier Inc. All rights reserved.
Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

Directory of Open Access Journals (Sweden)

Juan Ning

Full Text Available BACKGROUND: Despite their species abundance and major economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained and assembled into 56,809 high-quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts putatively involved in growth, lipid metabolism, molting and diapause were also identified among these genes. Differentially expressed genes related to several processes were found between C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, and facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.
Death of a dogma: eukaryotic mRNAs can code for more than one protein.

Science.gov (United States)

Mouilleron, Hélène; Delcourt, Vivian; Roucou, Xavier

2016-01-08

mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5' UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3' UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
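The review's central point, that an mRNA can contain translatable open reading frames besides the annotated CDS, can be illustrated by scanning all three forward reading frames of a transcript. The sketch below is a deliberately simple ORF scanner, not the mass-spectrometry or ribosome-profiling approaches the review summarises: it assumes AUG starts and the three standard stop codons only, ignores non-AUG initiation and the reverse strand, and uses a hypothetical minimum length. The function name and toy transcript are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(mrna, min_codons=10):
    """Return (start, end, frame) for AUG-initiated ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive (the stop codon is included).
    In each frame, an ORF starts at the first AUG after the previous stop, so
    nested in-frame AUGs are not reported separately.
    """
    seq = mrna.upper().replace("U", "T")  # treat RNA and DNA input alike
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    # Longest first: the traditional view treats the longest ORF as the CDS.
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


if __name__ == "__main__":
    # Toy transcript: a 6-codon ORF in frame 0 (counting the stop codon)
    # overlaps a 3-codon ORF in frame 1.
    print(find_orfs("AUGAAUGCCCUAAGGUGA", min_codons=3))
```

Under the single-CDS view only the first, longest ORF would be annotated; the second, overlapping ORF in another frame is the kind of alternative ORF the review argues can also be translated.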
Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

Science.gov (United States)

Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

2015-01-01

Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated CpG (cytosine-phosphate-guanine) islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) compared with control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among those differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID
Characterisation of silent and active genes for a variable large protein of Borrelia recurrentis

Directory of Open Access Journals (Sweden)

Scragg Ian G

2002-10-01

Full Text Available Abstract Background We report the characterisation of the variable large protein (vlp) gene expressed by clinical isolate A1 of Borrelia recurrentis, the agent of the life-threatening disease louse-borne relapsing fever. Methods The major vlp protein of this isolate was characterised and a DNA probe created. This probe, together with standard molecular methods, was used to determine the location of the vlp1 (B. recurrentis A1) gene in this and other isolates. Results This isolate was found to carry silent and expressed copies of the vlp1 (B. recurrentis A1) gene on plasmids of 54 kbp and 24 kbp, respectively, whereas a different isolate, A17, had only the silent vlp1 (B. recurrentis A17) gene on a 54 kbp plasmid. Silent and expressed vlp1 have identical mature protein coding regions but different 5' regions, each containing a different potential lipoprotein leader sequence. Only one form of vlp1 is transcribed in the A1 isolate of B. recurrentis, yet both 5' upstream sequences of this vlp1 gene possess features of bacterial promoters. Conclusion Taken together, these results suggest that antigenic variation in B. recurrentis may result from recombination of variable large and small protein genes at the junction between the lipoprotein leader sequence and the mature protein coding region. However, this hypothetical model needs to be validated by further identification of expressed and silent variant protein genes in other B. recurrentis isolates.
De Novo Assembly and Characterization of Sophora japonica Transcriptome Using RNA-seq

Directory of Open Access Journals (Sweden)

Liucun Zhu

2014-01-01

Full Text Available Sophora japonica Linn (Chinese Scholar Tree) is a shrub species belonging to the subfamily Faboideae of the pea family Fabaceae. In this study, RNA sequencing of the S. japonica transcriptome was performed to produce large expression datasets for functional genomic analysis. Approximately 86.1 million high-quality clean reads were generated and assembled de novo into 143,010 unique transcripts and 57,614 unigenes. The average length of unigenes was 901 bp, with an N50 of 545 bp. Four public databases, the NCBI nonredundant protein database (NR), Swiss-Prot, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Clusters of Orthologous Groups (COG), were used to annotate unigenes with NCBI BLAST. A total of 27,541 of 57,614 unigenes (47.8%) were annotated with gene descriptions, conserved protein domains or gene ontology terms. Moreover, an interaction network of unigenes in S. japonica was predicted based on known protein-protein interactions of putative orthologs in well-studied plant genomes. The transcriptome data of S. japonica reported here represent the first genome-scale investigation of gene expression in Faboideae plants. We expect that our study will provide a useful resource for further studies on gene expression, genomics, functional genomics and protein-protein interactions in S. japonica.
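N50 is a length-weighted statistic, so it need not exceed the mean length: a few very long unigenes can raise the mean while most assembled bases sit in shorter sequences, which may explain how a mean of 901 bp can coexist with an N50 of 545 bp. The sketch below implements the usual definition of N50; the lengths are hypothetical and chosen only to reproduce that pattern.

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L contain at least
    half of the total assembled bases."""
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # compare 2 * covered with total instead of halving
            return length


def mean_length(lengths):
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical assembly: one very long unigene plus 100 short ones.
    lengths = [10_000] + [545] * 100
    print(f"mean = {mean_length(lengths):.1f} bp, N50 = {n50(lengths)} bp")
```

Here the single 10,000 bp sequence pulls the mean above 545 bp, but it holds less than half of all bases, so the N50 falls among the short sequences.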
Selection on Coding and Regulatory Variation Maintains Individuality in Major Urinary Protein Scent Marks in Wild Mice.

Directory of Open Access Journals (Sweden)

Michael J Sheehan

2016-03-01

Full Text Available Recognition of individuals by scent is widespread across animal taxa. Though animals can often discriminate chemical blends based on many compounds, recent work shows that specific protein pheromones are necessary and sufficient for individual recognition via scent marks in mice. The genetic nature of individuality in scent marks (e.g. coding versus regulatory variation) and the evolutionary processes that maintain diversity are poorly understood. The individual signatures in scent marks of house mice are the protein products of a group of highly similar paralogs in the major urinary protein (Mup) gene family. Using the offspring of wild-caught mice, we examine individuality in the major urinary protein (MUP) scent marks at the DNA, RNA and protein levels. We show that individuality arises through a combination of variation at amino acid coding sites and differential transcription of central Mup genes across individuals, and we identify eSNPs in promoters. There is no evidence of post-transcriptional processes influencing phenotypic diversity, as transcripts accurately predict the relative abundance of proteins in urine samples. The match between transcripts and urine samples taken six months earlier also emphasizes that the proportional relationships across central MUP isoforms in urine are stable. Balancing selection maintains coding variants at moderate frequencies, though pheromone diversity appears limited by interactions with vomeronasal receptors. We find that differential transcription of the central Mup paralogs within and between individuals significantly increases the individuality of pheromone blends. Balancing selection on gene regulation allows for increased individuality via combinatorial diversity in a limited number of pheromones.
Influence of Coding Variability in APP-Aβ Metabolism Genes in Sporadic Alzheimer's Disease.

Directory of Open Access Journals (Sweden)

Celeste Sassi

Full Text Available The cerebral deposition of Aβ42, a neurotoxic proteolytic derivative of amyloid precursor protein (APP), is a central event in Alzheimer's disease (AD) (amyloid hypothesis). Given the key role of APP-Aβ metabolism in AD pathogenesis, we selected 29 genes involved in APP processing and in Aβ degradation and clearance. We then used exome and genome sequencing to investigate the independent (single-variant association test) and cumulative (gene-based association test) effects of coding variants in these genes as potential susceptibility factors for AD, in a cohort of 332 sporadic, mainly late-onset AD cases and 676 elderly controls from North America and the UK. Our study shows that common coding variability in these genes does not play a major role in the development of the disease. In the single-variant association analysis, the main hits, none of which was statistically significant after multiple testing correction (1.9e-4 … coding variants (0.009% … genes mainly involved in Aβ extracellular degradation (TTR, ACE), clearance (LRP1) and APP trafficking and recycling (SORL1). These results were partially replicated in the gene-based analysis (c-alpha and SKAT tests), which reported ECE1, LYZ and TTR as nominally associated with AD (1.7e-3 … coding variability in APP-Aβ genes is not a critical factor for AD development and 2) Aβ degradation and clearance, rather than Aβ production, may play a key role in the etiology of sporadic AD.
Genome-Wide Identification and Analysis of Genes Encoding PHD-Finger Protein in Tomato

International Nuclear Information System (INIS)

Hayat, S.; Cheng, Z.; Chen, X.

2016-01-01

The PHD-finger proteins are conserved in eukaryotic organisms and are involved in a variety of important functions in different biological processes in plants. However, the functions of PHD fingers are poorly understood in tomato (Solanum lycopersicum L.). In the current study, we identified 45 putative genes encoding PHD-finger proteins in tomato, distributed on 11 chromosomes (all except chromosome 8). Some of these genes encode other conserved key domains besides the PHD finger. Phylogenetic analysis of these 45 proteins resolved seven clusters. Most PHD-finger proteins were predicted to localize to PML bodies. These PHD-finger genes displayed differential expression in various organs, at different developmental stages and under stresses in tomato. Our study provides the first systematic analysis of PHD-finger genes and proteins in tomato. This preliminary study provides useful reference information on PHD-finger proteins in tomato, which will be helpful for the cloning and functional study of tomato PHD-finger genes. (author)
A compendium of transcription factor and transcriptionally active protein coding gene families in cowpea (Vigna unguiculata L.).

Science.gov (United States)

Misra, Vikram A; Wang, Yu; Timko, Michael P

2017-11-22

Cowpea (Vigna unguiculata (L.) Walp.) is the most important food and forage legume in the semi-arid tropics of sub-Saharan Africa, where approximately 80% of worldwide production takes place, primarily on low-input, subsistence farm sites. Among the major goals of cowpea breeding and improvement programs are the rapid manipulation of agronomic traits for seed size and quality and improved resistance to abiotic and biotic stresses to enhance productivity. Knowing the suite of transcription factors (TFs) and transcriptionally active proteins (TAPs) that control various critical plant cellular processes would contribute tremendously to these improvement aims. We used a computational approach that employed three different predictive pipelines to data-mine the cowpea genome and identified over 4400 genes representing 136 different TF and TAP families. We compare the information content of cowpea with that of two evolutionarily close species, common bean (Phaseolus vulgaris) and soybean (Glycine max), to gauge the relative informational content. Our data indicate that, after correcting for genome size, cowpea has fewer TF and TAP genes than common bean (4408 vs. 5291) and soybean (4408 vs. 11,065). Members of the GROWTH-REGULATING FACTOR (GRF) and Auxin/indole-3-acetic acid (Aux/IAA) gene families appear to be over-represented in the genome relative to common bean and soybean, whereas members of the MADS (Minichromosome maintenance deficient 1 (MCM1), AGAMOUS, DEFICIENS and serum response factor (SRF)) and C2C2-YABBY families appear to be under-represented. Analysis of the APETALA2-Ethylene Responsive Element Binding Protein (AP2-EREBP), NAC (NAM (no apical meristem), ATAF1, 2 (Arabidopsis transcription activation factor), CUC (cup-shaped cotyledon)) and WRKY families, known to be important in defense signaling, revealed changes and phylogenetic rearrangements relative to common bean and soybean that suggest these groups may have evolved different functions. The availability of detailed
Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding

Science.gov (United States)

Carter, Charles W; Wills, Peter R

2018-01-01

Abstract Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their interactions with tRNA substrates now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA partnership, eliminating the epistemological utility of a prior RNA world. First, the two aaRS classes likely arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. The resulting inversion symmetries in aaRS structural biology would have stabilized the initial and subsequent differentiation of coding specificities, rapidly promoting diversity in the proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexive, nanoenvironmental sensing in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict gradual replacement of ribozymal aaRS by polypeptide aaRS: 1) aaRS enzymes must be interdependent; 2) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; 3) takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; and 4) the Central Dogma’s emergence is most probable when replication and translation error rates remain comparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene–replicase–translatase system of genetic coding that would have continuously preserved the functional meaning of genetically encoded protein genes whose phylogenetic relationships match those observed today. PMID:29077934
Arabidopsis RNASE THREE LIKE2 Modulates the Expression of Protein-Coding Genes via 24-Nucleotide Small Interfering RNA-Directed DNA Methylation.

Science.gov (United States)

Elvira-Matelot, Emilie; Hachet, Mélanie; Shamandi, Nahid; Comella, Pascale; Sáez-Vásquez, Julio; Zytnicki, Matthias; Vaucheret, Hervé

2016-02-01

RNaseIII enzymes catalyze the cleavage of double-stranded RNA (dsRNA) and have diverse functions in RNA maturation. Arabidopsis thaliana RNASE THREE LIKE2 (RTL2), which carries one RNaseIII and two dsRNA binding (DRB) domains, is a unique Arabidopsis RNaseIII enzyme resembling the budding yeast small interfering RNA (siRNA)-producing Dcr1 enzyme. Here, we show that RTL2 modulates the production of a subset of small RNAs and that this activity depends on both its RNaseIII and DRB domains. However, the mode of action of RTL2 differs from that of Dcr1. Whereas Dcr1 directly cleaves dsRNAs into 23-nucleotide siRNAs, RTL2 likely cleaves dsRNAs into longer molecules, which are subsequently processed into small RNAs by the DICER-LIKE enzymes. Depending on the dsRNA considered, RTL2-mediated maturation either improves (RTL2-dependent loci) or reduces (RTL2-sensitive loci) the production of small RNAs. Because the vast majority of RTL2-regulated loci correspond to transposons and intergenic regions producing 24-nucleotide siRNAs that guide DNA methylation, RTL2 depletion modifies DNA methylation in these regions. Nevertheless, 13% of RTL2-regulated loci correspond to protein-coding genes. We show that changes in 24-nucleotide siRNA levels also affect DNA methylation levels at such loci and inversely correlate with mRNA steady state levels, thus implicating RTL2 in the regulation of protein-coding gene expression. © 2016 American Society of Plant Biologists. All rights reserved.
The complete mitochondrial genome of the land snail Cornu aspersum (Helicidae: Mollusca): intra-specific divergence of protein-coding genes and phylogenetic considerations within Euthyneura.

Directory of Open Access Journals (Sweden)

Juan Diego Gaitán-Espitia

Full Text Available The complete sequences of three mitochondrial genomes from the land snail Cornu aspersum were determined. The mitogenome has a length of 14,050 bp and encodes 13 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes. It also includes nine small intergenic spacers and a large AT-rich intergenic spacer. The intra-specific divergence analysis revealed that COX1 has the lowest genetic differentiation, while the most divergent genes were NADH1, NADH3 and NADH4. With the exception of Euhadra herklotsi, the structural comparisons showed the same gene order within the family Helicidae, and a gene organization nearly identical to that found in the order Pulmonata. Phylogenetic reconstruction recovered Basommatophora as a polyphyletic group, and Eupulmonata and Pulmonata as paraphyletic groups. Bayesian and Maximum Likelihood analyses showed that C. aspersum is a close relative of Cepaea nemoralis and, together with the other Helicidae species, forms a sister group of Albinaria caerulea, supporting the monophyly of the Stylommatophora clade.
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

Science.gov (United States)

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that combine machine learning and bioinformatics to address this issue when assigning species from both coding and non-coding sequences. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences when species coverage was potentially incomplete, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60-99.37%) for 1,094 brown algae queries, both using ITS barcodes.
The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

Directory of Open Access Journals (Sweden)

Andrea Cipriano

2018-03-01

Full Text Available The completion of the human genome sequence, together with advances in sequencing technologies, has shifted the paradigm of the genome as composed of discrete, heritable coding entities and has revealed the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant non-protein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both the transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes, such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments that have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years.
De novo assembly of Eugenia uniflora L. transcriptome and identification of genes from the terpenoid biosynthesis pathway.

Science.gov (United States)

Guzman, Frank; Kulcheski, Franceli Rodrigues; Turchetto-Zolet, Andreia Carina; Margis, Rogerio

2014-12-01

Pitanga (Eugenia uniflora L.) is a member of the Myrtaceae family and is of particular interest due to its medicinal properties, which are attributed to specialized metabolites with known biological activities. Among these molecules, terpenoids are the most abundant in the essential oils found in the leaves and represent compounds with potential pharmacological benefits. The terpene diversity observed in Myrtaceae is determined by the activity of different members of the terpene synthase and oxidosqualene cyclase families. Therefore, the aim of this study was to perform a de novo assembly of transcripts from E. uniflora leaves and to annotate them in order to identify the genes potentially involved in the terpenoid biosynthesis pathway and terpene diversity. In total, 72,742 unigenes with a mean length of 1,048 bp were identified. Of these, 43,631 and 36,289 were annotated with the NCBI non-redundant protein and Swiss-Prot databases, respectively. Gene Ontology (GO) analysis categorized the sequences into 53 functional groups. A metabolic pathway analysis with KEGG assigned 8,625 unigenes to 141 metabolic pathways and predicted 40 unigenes to be associated with the biosynthesis of terpenoids. Furthermore, we identified four putative full-length terpene synthase genes involved in sesquiterpene and monoterpene biosynthesis, and three putative full-length oxidosqualene cyclase genes involved in triterpene biosynthesis. The expression of these genes was validated in different E. uniflora tissues. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Nucleotide sequence of the Escherichia coli pyrE gene and of the DNA in front of the protein-coding region

DEFF Research Database (Denmark)

Poulsen, Peter; Jensen, Kaj Frank; Valentin-Hansen, Poul

1983-01-01

leader segment in front of the protein-coding region. This leader contains a structure with features characteristic for a (translated?) rho-independent transcriptional terminator, which is preceded by a cluster of uridylate residues. This indicates that the frequency of pyrE transcription is regulated......Orotate phosphoribosyltransferase (EC 2.4.2.10) was purified to electrophoretic homogeneity from a strain of Escherichia coli containing the pyrE gene cloned on a multicopy plasmid. The relative molecular masses (Mr) of the native enzyme and its subunit were estimated by means of gel filtration...
Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

Science.gov (United States)

Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

2015-01-01

In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes, drawing on networks such as protein-protein interaction (PPI), KEGG and gene co-expression networks. Expression quantitative trait loci (eQTLs) have been successfully applied to identify genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared with the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, applying the RWR algorithm to integrated PPI and GGCRN networks is an effective method for disease-associated gene mining.
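Random walk with restart is a standard network algorithm. The score vector is iterated as p_(t+1) = (1 - r) W p_t + r p_0 until it stops changing. Here W is the column-normalized adjacency matrix, r is the restart probability, and p_0 spreads the restart mass evenly over the seed (known disease) genes. The sketch below assumes a hypothetical four-gene toy network and r = 0.7. It does not reproduce the paper's integrated HPRD/GGCRN network or its parameter choices.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score each node by its proximity to the seed nodes.

    adj: symmetric adjacency matrix (list of lists) of a hypothetical gene network.
    seeds: indices of known disease genes.
    restart: probability r of jumping back to the seed set at each step.
    """
    n = len(adj)
    # Column-normalize so each column sums to 1 (transition matrix W).
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart vector p0: uniform over the seed genes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # p_next = (1 - r) * W p + r * p0
        p_next = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
                  for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(p_next, p))  # L1 change between iterations
        p = p_next
        if delta < tol:
            break
    return p

# Toy path network 0-1-2-3; gene 0 is the only known disease gene.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, [0])
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)  # [0, 1, 2, 3]
```

Candidate genes are ranked by their steady-state score. On the toy network, genes closer to the seed score higher. Because W is column-stochastic here, the scores always sum to 1.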
RNA editing differently affects protein-coding genes in D. melanogaster and H. sapiens.

Science.gov (United States)

Grassi, Luigi; Leoni, Guido; Tramontano, Anna

2015-07-14

When an RNA editing event occurs within a coding sequence, it can change the encoded amino acid. The biological significance of these events remains an open question: they may modulate protein function, increase transcriptome complexity, or simply arise from the loose specificity of the enzymes involved. We analysed editing events in coding regions that do or do not change the encoded amino acid (nonsynonymous and synonymous events, respectively) in D. melanogaster and H. sapiens and compared them with appropriate random models. Our results show that the phenomenon has rather different characteristics in the two organisms. For example, we confirm the observation that editing events occur more frequently in non-coding than in coding regions, and report that this effect is much more evident in H. sapiens. Additionally, in H. sapiens, editing events tend to affect less conserved residues. The less frequent editing events in Drosophila tend to avoid drastic amino acid changes. Interestingly, we find that in Drosophila, changes from less frequently used codons to more frequently used ones are favoured, whereas this is not the case in H. sapiens.
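The synonymous/nonsynonymous distinction can be made concrete with the standard genetic code. The sketch below assumes the edit is A-to-I, the best-characterized editing type in animals, in which the ribosome reads inosine as G. The abstract itself does not name the editing type. The example codons are arbitrary.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_a_to_i_edit(codon, position):
    """Classify an assumed A-to-I edit at `position` (0-2) of a DNA-alphabet codon.

    Inosine is decoded as guanosine, so the edit is modelled as A -> G.
    Returns (category, amino acid before, amino acid after).
    """
    codon = codon.upper()
    if codon[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the edited position")
    edited = codon[:position] + "G" + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    category = "synonymous" if before == after else "nonsynonymous"
    return category, before, after

print(classify_a_to_i_edit("GAA", 2))  # ('synonymous', 'E', 'E')
print(classify_a_to_i_edit("AAA", 1))  # ('nonsynonymous', 'K', 'R')
```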

The Rickettsia Endosymbiont of Ixodes pacificus Contains All the Genes of De Novo Folate Biosynthesis

Science.gov (United States)

Bodnar, James; Mortazavi, Bobak; Laurent, Timothy; Deason, Jeff; Thephavongsa, Khanhkeo; Zhong, Jianmin

2015-01-01

Ticks and other arthropods are often hosts to nutrient-providing bacterial endosymbionts, which contribute to their host’s fitness by supplying nutrients such as vitamins and amino acids. Our lab has detected that Ixodes pacificus hosts Rickettsia species phylotype G021. This endosymbiont is predominantly present in I. pacificus and is 100% maternally transmitted. To study the roles of phylotype G021 in I. pacificus, we used bioinformatic and molecular approaches. MUMmer alignments of the whole-genome sequence of I. scapularis, a close relative of I. pacificus, against the completely sequenced genomes of R. bellii OSU85-389, R. conorii and R. felis identified 8,190 unique sequences that are homologous to Rickettsia sequences in the NCBI Trace Archive. MetaCyc metabolic reconstructions revealed that all folate gene orthologues (folA, folC, folE, folKP, ptpS) required for de novo folate biosynthesis are present in the genome of Rickettsia buchneri in I. scapularis. To examine the metabolic capability of phylotype G021 in I. pacificus, genes of the bacterium's folate biosynthesis pathway were PCR-amplified using degenerate primers. BLAST searches showed that the nucleotide sequences of the folA, folC, folE, folKP and ptpS genes share 98.6%, 98.8%, 98.9%, 98.5% and 99.0% identity, respectively, with the corresponding genes of Rickettsia buchneri. Phylogenetic reconstructions show that the folate genes of phylotype G021 and homologous genes from various Rickettsia species are monophyletic. This study shows that all folate genes exist in the genome of Rickettsia species phylotype G021 and that this bacterium has the genetic capability for de novo folate synthesis. PMID:26650541
De novo assembly and annotation of the Antarctic copepod (Tigriopus kingsejongensis) transcriptome.

Science.gov (United States)

Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong

2016-08-01

The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly of 64,785,098 raw reads with Trinity produced 81,653 contigs. TransDecoder identified 38,250 candidate coding contigs that showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan and KEGG pathway analyses. Finally, we compiled a catalog of expressed genes for T. kingsejongensis, a useful model animal for gene information-based polar research aimed at uncovering the molecular mechanisms of adaptation to harsh environments. In particular, we observed highly developed lipid metabolism in T. kingsejongensis compared with the Far East Pacific coast copepod Tigriopus japonicus at the transcriptome level. Copyright © 2016 Elsevier B.V. All rights reserved.
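TransDecoder's full procedure involves more than a simple scan. It also considers the reverse strand and scores candidate ORFs with a coding-potential model. The core idea of calling a contig "candidate coding" can still be sketched as a search for the longest open reading frame (ATG to stop codon) among the forward reading frames. This is a minimal sketch. The forward-strand-only scan and the `min_codons` cutoff are illustrative assumptions, not TransDecoder's actual options.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq, min_codons=100):
    """Return the longest ATG-to-stop ORF on the forward strand, or "" if shorter than min_codons.

    Only complete ORFs (ending in a stop codon) are reported; min_codons counts the stop codon.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best if len(best) // 3 >= min_codons else ""

# Hypothetical contig fragment; a low cutoff is used so the toy ORF qualifies.
print(longest_orf("CCATGAAATTTTAGGG", min_codons=1))  # ATGAAATTTTAG
```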
Cloning and expression of gene encoding P23 protein from Cryptosporidium parvum

Directory of Open Access Journals (Sweden)

Dinh Thi Bich Lan

2014-12-01

Full Text Available We cloned the cp23 gene encoding P23 (a glycoprotein) from Cryptosporidium parvum isolated in Thua Thien Hue province, Vietnam. The coding region of the cp23 gene from this C. parvum isolate is 99% similar to the cp23 gene deposited in NCBI (accession number: U34390). SDS-PAGE and Western blot analysis showed that expression of the cp23 gene in E. coli BL21 Star™ (DE3) produced polypeptides with molecular weights of approximately 37, 40 and 49 kDa. These molecules may be non-glycosylated or glycosylated P23 fusion polypeptides. Recombinant P23 protein purified by GST (glutathione S-transferase) affinity chromatography can be used as an antigen for C. parvum antibody production as well as to develop a diagnostic kit for C. parvum.
De novo generation of infectious prions with bacterially expressed recombinant prion protein.

Science.gov (United States)

Zhang, Zhihong; Zhang, Yi; Wang, Fei; Wang, Xinhe; Xu, Yuanyuan; Yang, Huaiyi; Yu, Guohua; Yuan, Chonggang; Ma, Jiyan

2013-12-01

The prion hypothesis is strongly supported by the fact that prion infectivity and the pathogenic conformer of prion protein (PrP) are simultaneously propagated in vitro by the serial protein misfolding cyclic amplification (sPMCA). However, due to sPMCA's enormous amplification power, whether an infectious prion can be formed de novo with bacterially expressed recombinant PrP (rPrP) remains to be satisfactorily resolved. To address this question, we performed unseeded sPMCA with rPrP in a laboratory that has never been exposed to any native prions. Two types of proteinase K (PK)-resistant and self-perpetuating recombinant PrP conformers (rPrP-res) with PK-resistant cores of 17 or 14 kDa were generated. A bioassay revealed that rPrP-res(17kDa) was highly infectious, causing prion disease in wild-type mice with an average survival time of about 172 d. In contrast, rPrP-res(14kDa) completely failed to induce any disease. Our findings reveal that sPMCA is sufficient to initiate various self-perpetuating PK-resistant rPrP conformers, but not all of them possess in vivo infectivity. Moreover, generating an infectious prion in a prion-free environment establishes that an infectious prion can be formed de novo with bacterially expressed rPrP.
Computational Tools and Algorithms for Designing Customized Synthetic Genes

Directory of Open Access Journals (Sweden)

Nathan Gould

2014-10-01

Full Text Available Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in the de novo design of synthetic constructs provides significant power for studying the impact of mutations in sequence features and for verifying hypotheses about the functional information encoded in nucleic acid and amino acid sequences. To aid this goal, a large number of software tools of varying sophistication have been implemented, enabling the design of synthetic genes optimized for rationally defined sequence properties. The first generation of tools dealt predominantly with single objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We use test cases to demonstrate the efficiency of each approach, as well as to identify their strengths and limitations.
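The simplest single-objective design mentioned above is codon usage optimization. Its most basic form is a greedy back-translation that picks the most frequent synonymous codon for every residue. The sketch below shows that heuristic, which is not the algorithm of any particular tool, together with a check for a restriction site (EcoRI, GAATTC). The codon frequencies are hypothetical illustrative values, not measurements for any organism.

```python
# Hypothetical codon-usage frequencies for a few amino acids (illustrative only).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "E": {"GAA": 0.69, "GAG": 0.31},
    "F": {"TTT": 0.57, "TTC": 0.43},
    "*": {"TAA": 0.64, "TGA": 0.29, "TAG": 0.07},
}

def back_translate(protein, usage=CODON_USAGE):
    """Greedy single-objective design: the most frequent synonymous codon for each residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)

def find_site(dna, site):
    """0-based start positions of a recognition site on the given strand."""
    k = len(site)
    return [i for i in range(len(dna) - k + 1) if dna[i:i + k] == site]

dna = back_translate("MKEF*")
print(dna)                                     # ATGAAAGAATTTTAA
print(find_site(dna, "GAATTC"))                # [] (no EcoRI site)
# Choosing the synonymous Phe codon TTC instead of TTT would create one:
print(find_site("ATGAAAGAATTCTAA", "GAATTC"))  # [6]
```

The last line shows why objectives interact. A single synonymous swap made for one goal can create or destroy a restriction site. Checking every combination of synonymous codons grows exponentially with protein length, which is why heuristic search is the norm.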
Molecular evolution of the Paramyxoviridae and Rhabdoviridae multiple-protein-encoding P gene.

Science.gov (United States)

Jordan, I K; Sutter, B A; McClure, M A

2000-01-01

Presented here is an analysis of the molecular evolutionary dynamics of the P gene among 76 representative sequences of the Paramyxoviridae and Rhabdoviridae RNA virus families. In a number of Paramyxoviridae taxa, as well as in vesicular stomatitis viruses of the Rhabdoviridae, the P gene encodes multiple proteins from a single genomic RNA sequence. These products include the phosphoprotein (P), as well as the C and V proteins. The complexity of the P gene makes it an intriguing locus to study from an evolutionary perspective. Amino acid sequence alignments of the proteins encoded at the P and N loci were used in independent phylogenetic reconstructions of the Paramyxoviridae and Rhabdoviridae families. P-gene-coding capacities were mapped onto the Paramyxoviridae phylogeny, and the most parsimonious path of multiple-coding-capacity evolution was determined. Levels of amino acid variation for Paramyxoviridae and Rhabdoviridae P-gene-encoded products were also analyzed. Proteins encoded in overlapping reading frames from the same nucleotides have different levels of amino acid variation. The nucleotide architecture that underlies the amino acid variation was determined in order to evaluate the role of selection in the evolution of the P gene overlapping reading frames. In every case, the evolution of one of the proteins encoded in the overlapping reading frames has been constrained by negative selection while the other has evolved more rapidly. The integrity of the overlapping reading frame that represents a derived state is generally maintained at the expense of the ancestral reading frame encoded by the same nucleotides. The evolution of such multicoding sequences is likely a response by RNA viruses to selective pressure to maximize genomic information content while maintaining small genome size. The ability to evolve such a complex genomic strategy is intimately related to the dynamics of the viral quasispecies, which allow enhanced exploration of the adaptive...
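Overlapping reading frames constrain evolution because one nucleotide substitution can alter two proteins at once. The toy example below uses a hypothetical 12-nucleotide stretch, not a real P-gene sequence. It reads the same nucleotides in frames 0 and 1 and shows a substitution that is synonymous in one frame but nonsynonymous in the other. The translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq, frame):
    """Translate `seq` from offset `frame` (0, 1 or 2), dropping any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

def substitution_effects(seq, pos, new_base, frames=(0, 1)):
    """For each reading frame, return the protein before and after a point substitution."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return {f: (translate(seq, f), translate(mutant, f)) for f in frames}

seq = "ATGGCTGCAAAA"  # hypothetical sequence read in two overlapping frames
for frame, (before, after) in substitution_effects(seq, 5, "C").items():
    print(frame, before, after, "unchanged" if before == after else "changed")
# 0 MAAK MAAK unchanged
# 1 WLQ WPQ changed
```

The T-to-C change at position 5 is a third-position, synonymous change (GCT to GCC, Ala) in frame 0. The same base is the middle position of CTG in frame 1, where it turns Leu into Pro. Negative selection acting on either protein therefore restricts the other.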
Prioritizing orphan proteins for further study using phylogenomics and gene expression profiles in Streptomyces coelicolor

Directory of Open Access Journals (Sweden)

Takano Eriko

2011-09-01

Full Text Available Abstract Background Streptomyces coelicolor, a model organism of antibiotic-producing bacteria, has one of the largest genomes of the bacterial kingdom, including 7,825 predicted protein-coding genes. A large number of these genes, nearly 34%, are functionally orphan (hypothetical proteins with unknown function). However, in gene expression time-course data, many of these functionally orphan genes show interesting expression patterns. Results In this paper, we analyzed all functionally orphan genes of Streptomyces coelicolor and identified a list of "high priority" orphans by combining gene expression analysis with additional phylogenetic information (i.e., the level of evolutionary conservation of each protein). Conclusions The prioritized orphan genes are promising candidates for experimental characterization of their function in the lab.
Use of fluorescent proteins and color-coded imaging to visualize cancer cells with different genetic properties.

Science.gov (United States)

Hoffman, Robert M

2016-03-01

Fluorescent proteins, which are very bright and available in spectrally distinct colors, enable the imaging of color-coded cancer cells growing in vivo and therefore the distinction of cancer cells with different genetic properties. Non-invasive and intravital imaging of cancer cells with fluorescent proteins allows the visualization of distinct genetic variants of cancer cells down to the cellular level in vivo. Cancer cells with increased or decreased ability to metastasize can be distinguished in vivo. Gene exchange in vivo, which enables low-metastatic cancer cells to become highly metastatic, can be imaged in vivo by color coding. Cancer stem-like and non-stem cells can also be distinguished in vivo by color-coded imaging. These properties also demonstrate the vast superiority of imaging cancer cells in vivo with fluorescent proteins over photon counting of luciferase-labeled cancer cells.
Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

Science.gov (United States)

Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

2018-06-03

Gene expression in cerebral ischemia has been the subject of intense investigation for several years. Studies using probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA sequencing (RNA-seq) enables comprehensive examination of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we used RNA-seq to conduct an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion. We identified a total of 1007 transcripts at 6 h, 1878 at 12 h and 1618 at 24 h of reperfusion that were significantly altered compared with sham controls. With isoform-level resolution, we identified 23 novel mRNA splice isoforms arising from 23 genes. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time points, we discovered multiple distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth, providing several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.
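The abstract does not describe how isoform switching was called. One simple, hypothetical criterion is a change, between consecutive time points, in which isoform accounts for the largest share of a gene's expression. The sketch below applies that criterion. The isoform names and expression values are made up for illustration, and real analyses would add statistical testing across replicates.

```python
def isoform_fractions(expr):
    """Convert isoform-level expression values into within-gene proportions."""
    total = sum(expr.values())
    return {iso: value / total for iso, value in expr.items()} if total else {}

def detect_switches(timecourse):
    """Report consecutive time points at which the dominant (highest-fraction) isoform changes.

    timecourse: list of (label, {isoform: expression}) pairs in temporal order.
    """
    switches = []
    prev_label = prev_dominant = None
    for label, expr in timecourse:
        fractions = isoform_fractions(expr)
        if not fractions:
            continue  # skip time points where the gene is not expressed
        dominant = max(fractions, key=fractions.get)
        if prev_dominant is not None and dominant != prev_dominant:
            switches.append((prev_label, label, prev_dominant, dominant))
        prev_label, prev_dominant = label, dominant
    return switches

# Hypothetical expression values for two isoforms of one gene.
timecourse = [
    ("sham", {"iso1": 40.0, "iso2": 10.0}),
    ("6h", {"iso1": 30.0, "iso2": 25.0}),
    ("12h", {"iso1": 12.0, "iso2": 38.0}),
    ("24h", {"iso1": 15.0, "iso2": 35.0}),
]
print(detect_switches(timecourse))  # [('6h', '12h', 'iso1', 'iso2')]
```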
The small RNA content of human sperm reveals pseudogene-derived piRNAs complementary to protein-coding genes

Science.gov (United States)

Pantano, Lorena; Jodar, Meritxell; Bak, Mads; Ballescà, Josep Lluís; Tommerup, Niels; Oliva, Rafael; Vavouri, Tanya

2015-01-01

At the end of mammalian sperm development, sperm cells expel most of their cytoplasm and dispose of the majority of their RNA. Yet hundreds of RNA molecules remain in mature sperm. The biological significance of the vast majority of these molecules is unclear. To better understand the processes that generate sperm small RNAs and what roles they may have, we sequenced and characterized the small RNA content of sperm samples from two fertile human individuals. We detected 182 microRNAs, some of which are highly abundant. The most abundant microRNA in sperm is miR-1246, with predicted targets among sperm-specific genes. The most abundant class of small noncoding RNAs in sperm is PIWI-interacting RNAs (piRNAs). Surprisingly, we found that human sperm cells contain piRNAs processed from pseudogenes. Clusters of piRNAs from human testes contain pseudogenes transcribed in the antisense strand and processed into small RNAs. Several human protein-coding genes contain antisense predicted targets of pseudogene-derived piRNAs in the male germline, and these piRNAs are still found in mature sperm. Our study provides the most extensive data set and annotation of human sperm small RNAs to date and is a resource for further functional studies on the roles of sperm small RNAs. In addition, we propose that some of the pseudogene-derived human piRNAs may regulate expression of their parent gene in the male germline. PMID:25904136
Comparison of protein coding gene contents of the fungal phyla Pezizomycotina and Saccharomycotina

DEFF Research Database (Denmark)

Arvas, Mikko; Kivioja, Teemu; Mitchell, Alex

2007-01-01

Saccharomycotina are slightly better characterised and predicted to encode mainly enzymes. The genes specific to Saccharomycotina are enriched in transcription and mitochondrion related functions. Especially mitochondrial ribosomal proteins seem to have diverged from those of Pezizomycotina. In addition, we...
Inactivation of human α-globin gene expression by a de novo deletion located upstream of the α-globin gene cluster

International Nuclear Information System (INIS)

Liebhaber, S.A.; Weiss, I.; Cash, F.E.; Griese, E.U.; Horst, J.; Ayyub, H.; Higgs, D.R.

1990-01-01

Synthesis of normal human hemoglobin A, α2β2, is based upon balanced expression of genes in the α-globin gene cluster on chromosome 16 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5' to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested by observations on naturally occurring deletions and by experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located ∼30 kilobases 5' of the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes.
One, Two, Three: Polycomb Proteins Hit All Dimensions of Gene Regulation

Directory of Open Access Journals (Sweden)

Stefania del Prete

2015-07-01

Full Text Available Polycomb group (PcG) proteins contribute to the formation and maintenance of a specific repressive chromatin state that prevents the expression of genes in a particular space and time. Polycomb repressive complexes (PRCs) consist of several PcG proteins with specific regulatory or catalytic properties. PRCs are recruited to thousands of target genes, and various recruitment factors, including DNA-binding proteins and non-coding RNAs, are involved in the targeting. PcG proteins contribute to a multitude of biological processes by altering chromatin features at different scales. PcG proteins mediate both biochemical modifications of histone tails and biophysical modifications (e.g., chromatin fiber compaction and three-dimensional (3D) chromatin conformation). Here, we review the role of PcG proteins in nuclear architecture, describing their impact on the structure of the chromatin fiber, on chromatin interactions, and on the spatial organization of the genome in nuclei. Although little is known about the role of plant PcG proteins in nuclear organization, much is known in the animal field, and we highlight similarities and differences in the roles of PcG proteins in 3D gene regulation in plants and animals.
Protein design and engineering of a de novo pathway for microbial production of 1,3-propanediol from glucose.

Science.gov (United States)

Chen, Zhen; Geng, Feng; Zeng, An-Ping

2015-02-01

Protein engineering to expand the substrate spectrum of native enzymes opens new possibilities for the bioproduction of valuable chemicals via non-natural pathways. No natural microorganism can directly use sugars to produce 1,3-propanediol (PDO). Here, we present a de novo route for the biosynthesis of PDO from sugar, which may overcome this limitation by extending the homoserine synthesis pathway. The pathway from homoserine to PDO is completed by protein engineering of glutamate dehydrogenase (GDH) and pyruvate decarboxylase to sequentially convert homoserine to 4-hydroxy-2-ketobutyrate and then to 3-hydroxypropionaldehyde. The latter is finally converted to PDO by a native alcohol dehydrogenase. In this work, we report the experimental realization of this non-natural pathway, especially the protein engineering of GDH for the key step of converting homoserine to 4-hydroxy-2-ketobutyrate. These results show the feasibility and significance of protein engineering for de novo pathway design and overproduction of desired industrial products. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel TaqMan® assay for Nosema ceranae quantification in honey bee, based on the protein coding gene Hsp70.

Science.gov (United States)

Cilia, Giovanni; Cabbri, Riccardo; Maiorana, Giacomo; Cardaio, Ilaria; Dall'Olio, Raffaele; Nanetti, Antonio

2018-04-01

Nosema ceranae is now a widespread honey bee pathogen with a high incidence in apiculture. Rapid and reliable detection and quantification methods are a concern for the research community; current methods rely mainly on biomolecular techniques such as PCR, RT-PCR or high-resolution melting analysis (HRMA). The aim of this technical paper is to provide a new qPCR assay, based on the highly conserved protein-coding gene Hsp70, to detect and quantify the microsporidian Nosema ceranae affecting the western honey bee Apis mellifera. The validation steps used to assess the efficiency, sensitivity, specificity and robustness of the assay are also described. Copyright © 2018 Elsevier GmbH. All rights reserved.
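The abstract does not give the validation formulas. A common way to assess qPCR efficiency and quantify targets is a standard curve built from a 10-fold dilution series. Ct is regressed on log10 copy number, efficiency is E = 10^(-1/slope) - 1, and an unknown sample's copy number is read off the inverted curve. This is a minimal sketch with hypothetical Ct values, not data or code from the paper.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx

def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to ~100% efficiency."""
    return 10 ** (-1 / slope) - 1

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical 10-fold dilution series of a target standard.
log10_copies = [3, 4, 5, 6, 7]
cts = [28.04, 24.72, 21.40, 18.08, 14.76]
slope, intercept = fit_standard_curve(log10_copies, cts)
print(round(slope, 2), round(intercept, 2))             # -3.32 38.0
print(round(amplification_efficiency(slope) * 100, 1))  # 100.1
print(round(copies_from_ct(21.40, slope, intercept)))   # 100000
```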
The limits of de novo DNA motif discovery.

Directory of Open Access Journals (Sweden)

David Simcha

Full Text Available A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis, and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed on held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, the discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is fundamentally due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not biologically active. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of
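The "too null" point can be illustrated with a toy enrichment calculation: the same motif count looks far more surprising against a uniform i.i.d. background than against base frequencies estimated from real background sequences. The sketch below is a deliberately simple zero-order model with invented sequences and hypothetical helper names; it is not the authors' validation procedure, which also relies on held-out prediction.

```python
from collections import Counter
from math import prod

UNIFORM = {b: 0.25 for b in "ACGT"}

def composition(seqs, pseudocount=1):
    """Zero-order base frequencies estimated from real background sequences."""
    counts = Counter(base for s in seqs for base in s.upper())
    total = sum(counts[b] for b in "ACGT") + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in "ACGT"}

def count_hits(kmer, seqs):
    """Overlapping occurrences of kmer on the given strand of each sequence."""
    k = len(kmer)
    return sum(1 for s in seqs for i in range(len(s) - k + 1) if s[i:i + k] == kmer)

def expected_hits(kmer, seqs, base_freqs):
    """Expected occurrences if seqs were generated i.i.d. from base_freqs."""
    p = prod(base_freqs[b] for b in kmer)
    positions = sum(max(len(s) - len(kmer) + 1, 0) for s in seqs)
    return p * positions

def enrichment(kmer, foreground, base_freqs):
    """Observed / expected occurrences under the chosen background model."""
    return count_hits(kmer, foreground) / expected_hits(kmer, foreground, base_freqs)

# Toy AT-rich data, invented for illustration.
foreground = ["TTATAAATTA", "ATATAAAATT"]
background = ["ATTTAAATAT", "TAATTATTAA"]
motif = "TATAAA"
print("vs uniform background:  ", round(enrichment(motif, foreground, UNIFORM), 1))
print("vs empirical background:", round(enrichment(motif, foreground, composition(background)), 1))
```

An AT-rich motif in AT-rich sequence appears strongly "enriched" against the uniform null, while the empirically estimated background shrinks that enrichment considerably.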
Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination

Science.gov (United States)

Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.

2010-01-01

A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has previously been shown to yield high-quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structure is hindered by the lack of a general strategy for placing spin-label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin-labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding: 50% of these models had the correct fold, compared with only 21% and 8% correctly folded models when randomly placed restraints or no restraints were used, respectively. Moreover, the improvements in model quality require only a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of the T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin-label pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
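The selection criterion described above (maximize sequence separation while covering every pair of secondary structure elements) can be sketched as a small routine. This is a hypothetical simplification, not the published algorithm: the helix boundaries are invented, `margin` is an assumed trim that keeps labels off helix termini, and any other criteria the authors may weigh are ignored.

```python
from itertools import combinations

def select_restraints(helices, margin=2):
    """For every pair of helices, choose one label site in each so that the
    sequence separation between the two sites is as large as possible.

    helices: (start, end) residue numbers, inclusive, non-overlapping.
    margin: residues excluded at each helix end (assumed parameter).
    Returns (site_a, site_b) pairs, one per helix pair with usable residues.
    """
    restraints = []
    for (s1, e1), (s2, e2) in combinations(sorted(helices), 2):
        first_usable, last_usable = s1 + margin, e2 - margin
        if first_usable > e1 - margin or last_usable < s2 + margin:
            continue  # one helix is too short once its ends are trimmed
        restraints.append((first_usable, last_usable))
    return restraints

# Hypothetical helix boundaries (not the real T4L subdomain).
helices = [(1, 10), (15, 30), (40, 52)]
for a, b in select_restraints(helices):
    print(f"label residues {a} and {b} (separation {b - a})")
```

Every helix pair receives one restraint, which mirrors the abstract's remark that the number of restraints follows from the pairwise connectivities of the helices.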
De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

Directory of Open Access Journals (Sweden)

Nan Fu

Full Text Available BACKGROUND: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits progress in molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly generated a collection of 42,280 unigenes (average length 502.6 bp) that represent the first transcriptome of the species. Of these unigenes, 78.43% and 48.93% had significant similarity to proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). A total of 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Large numbers of simple sequence repeats (SSRs) were identified, and their amplification success rate and polymorphism were then investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation for further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.
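SSR mining in assembled unigenes generally means scanning for perfect tandem repeats of short motifs above a minimum repeat count. The abstract does not state which tool or thresholds were used, so the sketch below is a generic illustration: the `MIN_REPEATS` thresholds and the `find_ssrs` helper are assumptions, and only perfect (uninterrupted) repeats are reported.

```python
# Minimum tandem repeat count per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k:
                break
            # Skip motifs that are repeats of a shorter unit, e.g. "AGAG" or "AA".
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            # Keep the qualifying motif that covers the longest stretch.
            if n >= MIN_REPEATS[k] and (best is None or n * k > best[2] * len(best[1])):
                best = (i, motif, n)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits

print(find_ssrs("CT" + "AG" * 8 + "A" * 12 + "TTGCA"))
```

Reported repeats would then be the candidates for primer design and for the amplification and polymorphism tests across accessions.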
Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene

Directory of Open Access Journals (Sweden)

Herington Adrian C

2008-10-01

Full Text Available Abstract Background: The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. Results: We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion, to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues that are emerging as major sites of non-coding RNA expression (the thymus, brain and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach, where sense, GHRL-derived RNAs are highly expressed. Conclusion: GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading
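A statement such as "riddled with stop codons" can be quantified by scanning all six reading frames and recording, for each, the number of stop codons and the longest stop-free stretch (an upper bound on any open reading frame in that frame). The sketch below is a generic illustration with hypothetical helper names, not the authors' sequence analysis.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def frame_stats(seq):
    """For each of the six frames: (stop codon count, longest stop-free run in codons)."""
    stats = {}
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for frame in range(3):
            stops = run = longest = 0
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    stops += 1
                    run = 0
                else:
                    run += 1
                    longest = max(longest, run)
            stats[f"{strand}{frame + 1}"] = (stops, longest)
    return stats

for frame, (stops, longest) in frame_stats("ATGAAATAA").items():
    print(frame, stops, longest)
```

For a candidate non-coding gene, many stops and short stop-free runs in every frame are consistent with a lack of coding potential, although this test alone does not rule out small peptides.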
Catalysis by a de novo zinc-mediated protein interface: implications for natural enzyme evolution and rational enzyme engineering.

Science.gov (United States)

Der, Bryan S; Edwards, David R; Kuhlman, Brian

2012-05-08

Here we show that a recent computationally designed zinc-mediated protein interface is serendipitously capable of catalyzing carboxyester and phosphoester hydrolysis. Although the original motivation was to design a de novo zinc-mediated protein-protein interaction (called MID1-zinc), we observed in the homodimer crystal structure a small cleft and open zinc coordination site. We investigated if the cleft and zinc site at the designed interface were sufficient for formation of a primitive active site that can perform hydrolysis. MID1-zinc hydrolyzes 4-nitrophenyl acetate with a rate acceleration of 10⁵ and a kcat/KM of 630 M⁻¹ s⁻¹, and 4-nitrophenyl phosphate with a rate acceleration of 10⁴ and a kcat/KM of 14 M⁻¹ s⁻¹. These rate accelerations by an unoptimized active site highlight the catalytic power of zinc and suggest that the clefts formed by protein-protein interactions are well-suited for creating enzyme active sites. This discovery has implications for protein evolution and engineering: from an evolutionary perspective, three-coordinated zinc at a homodimer interface cleft represents a simple evolutionary path to nascent enzymatic activity; from a protein engineering perspective, future efforts in de novo design of enzyme active sites may benefit from exploring clefts at protein interfaces for active site placement.
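The reported kcat/KM values are second-order rate constants, so at substrate concentrations well below KM the initial rate is approximately (kcat/KM)[E][S], and substrate decays with a pseudo-first-order constant (kcat/KM)[E]. The sketch below applies these textbook relations to the two reported constants; the 10 µM catalyst and 100 µM substrate concentrations are hypothetical, KM itself is not given in the abstract (so the low-substrate assumption is unverified), and uncatalyzed background hydrolysis is ignored.

```python
import math

# Second-order rate constants (kcat/KM) reported for MID1-zinc, in M^-1 s^-1.
KCAT_OVER_KM = {
    "4-nitrophenyl acetate": 630.0,
    "4-nitrophenyl phosphate": 14.0,
}

def initial_rate(kcat_over_km, enzyme_m, substrate_m):
    """Low-substrate Michaelis-Menten limit: v ~ (kcat/KM)[E][S], valid only if [S] << KM."""
    return kcat_over_km * enzyme_m * substrate_m

def substrate_half_life(kcat_over_km, enzyme_m):
    """Half-life of substrate under pseudo-first-order decay with k_obs = (kcat/KM)[E]."""
    return math.log(2) / (kcat_over_km * enzyme_m)

ENZYME_M = 10e-6      # hypothetical 10 uM catalyst
SUBSTRATE_M = 100e-6  # hypothetical 100 uM substrate

ratio = KCAT_OVER_KM["4-nitrophenyl acetate"] / KCAT_OVER_KM["4-nitrophenyl phosphate"]
print(f"ester/phosphoester kcat/KM ratio: {ratio:.0f}")
for substrate, k in KCAT_OVER_KM.items():
    v = initial_rate(k, ENZYME_M, SUBSTRATE_M)
    t_half = substrate_half_life(k, ENZYME_M)
    print(f"{substrate}: v0 ~ {v:.2e} M/s, t1/2 ~ {t_half:.0f} s")
```

The ratio shows that, by this measure, the designed site processes the carboxyester 45 times more efficiently than the phosphoester.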

De Novo GMNN Mutations Cause Autosomal-Dominant Primordial Dwarfism Associated with Meier-Gorlin Syndrome.

Science.gov (United States)

Burrage, Lindsay C; Charng, Wu-Lin; Eldomery, Mohammad K; Willer, Jason R; Davis, Erica E; Lugtenberg, Dorien; Zhu, Wenmiao; Leduc, Magalie S; Akdemir, Zeynep C; Azamian, Mahshid; Zapata, Gladys; Hernandez, Patricia P; Schoots, Jeroen; de Munnik, Sonja A; Roepman, Ronald; Pearring, Jillian N; Jhangiani, Shalini; Katsanis, Nicholas; Vissers, Lisenka E L M; Brunner, Han G; Beaudet, Arthur L; Rosenfeld, Jill A; Muzny, Donna M; Gibbs, Richard A; Eng, Christine M; Xia, Fan; Lalani, Seema R; Lupski, James R; Bongers, Ernie M H F; Yang, Yaping

2015-12-03

Meier-Gorlin syndrome (MGS) is a genetically heterogeneous primordial dwarfism syndrome known to be caused by biallelic loss-of-function mutations in one of five genes encoding pre-replication complex proteins: ORC1, ORC4, ORC6, CDT1, and CDC6. Mutations in these genes disrupt the initiation of DNA replication at replication origins. To date, only an autosomal-recessive inheritance pattern has been described in individuals with this disorder, with a molecular etiology established in about three-fourths of cases. Here, we report three subjects with MGS and de novo heterozygous mutations in the 5' end of GMNN, encoding the DNA replication inhibitor geminin. We identified two truncating mutations in exon 2 (the 1st coding exon), c.16A>T (p.Lys6*) and c.35_38delTCAA (p.Ile12Lysfs*4), and one missense mutation, c.50A>G (p.Lys17Arg), affecting the second-to-last nucleotide of exon 2 and possibly RNA splicing. Geminin is present during the S, G2, and M phases of the cell cycle and is degraded during the metaphase-anaphase transition by the anaphase-promoting complex (APC), which recognizes the destruction box sequence near the 5' end of the geminin protein. All three GMNN mutations identified alter sites 5' to residue Met28 of the protein, which is located within the destruction box. We present data supporting a gain-of-function mechanism, in which the GMNN mutations result in proteins lacking the destruction box and hence increased protein stability and prolonged inhibition of replication leading to autosomal-dominant MGS. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
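The protein-level consequences listed above follow from HGVS coding-DNA numbering, in which c.1 is the A of the ATG start codon. Because the actual GMNN codons are not given in the abstract, the sketch below enumerates both lysine codons (AAA, AAG) and applies each reported substitution to them; the helper names are hypothetical, and the standard genetic code is encoded compactly in TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_position(c_pos):
    """Map an HGVS c. position (c.1 = A of the ATG) to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def substitute(codon, pos_in_codon, alt):
    """Replace one base (1-based position within the codon)."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]

def codons_for(aa):
    """All codons encoding a one-letter amino acid."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)

def is_frameshift(deleted_nt):
    return deleted_nt % 3 != 0

# c.16A>T: codon 6, first base; both lysine codons become stop codons.
codon, pos = codon_position(16)
print(codon, pos, [CODON_TABLE[substitute(c, pos, "T")] for c in codons_for("K")])
# c.50A>G: codon 17, second base; lysine becomes arginine.
codon, pos = codon_position(50)
print(codon, pos, [CODON_TABLE[substitute(c, pos, "G")] for c in codons_for("K")])
# c.35_38delTCAA starts in codon 12 and removes 4 nt, which is not a multiple of 3.
print(codon_position(35), is_frameshift(38 - 35 + 1))
```

The arithmetic reproduces the reported residue numbers (Lys6, Ile12, Lys17) and shows why the 4-nt deletion produces a frameshift rather than an in-frame loss.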
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.

Science.gov (United States)

Fu, Wenqing; O'Connor, Timothy D; Jun, Goo; Kang, Hyun Min; Abecasis, Goncalo; Leal, Suzanne M; Gabriel, Stacey; Rieder, Mark J; Altshuler, David; Shendure, Jay; Nickerson, Deborah A; Bamshad, Michael J; Akey, Joshua M

2013-01-10

Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
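For intuition about how allele frequency relates to allele age, a classic reference point is the Kimura-Ohta (1973) expected age of a neutral allele at frequency x in a population of constant effective size Ne: -4 Ne x ln(x) / (1 - x) generations. This is a swapped-in textbook approximation, not the inference method used in the study; it ignores the recent explosive growth the abstract highlights, so it should not be expected to reproduce the paper's estimates. The effective size and generation time below are assumptions.

```python
import math

def expected_age_generations(freq, ne):
    """Kimura-Ohta expected age (generations) of a neutral allele at frequency freq,
    assuming a constant effective population size ne."""
    if not 0 < freq < 1:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return -4 * ne * freq * math.log(freq) / (1 - freq)

GENERATION_YEARS = 25  # assumed generation time
NE = 10_000            # assumed constant effective population size

for x in (0.001, 0.01, 0.1):
    gens = expected_age_generations(x, NE)
    print(f"freq={x}: ~{gens:,.0f} generations, ~{gens * GENERATION_YEARS:,.0f} years")
```

Even under this constant-size model, rarer alleles are expected to be younger, which is the qualitative relationship underlying age estimates for rare protein-coding variants.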
De novo transcriptome assembly of Sorghum bicolor variety Taejin

Directory of Open Access Journals (Sweden)

Yeonhwa Jo

2016-06-01

Full Text Available Sorghum (Sorghum bicolor), also known as great millet, is one of the most popular cultivated grass species in the world. Sorghum is frequently consumed as food for humans and animals as well as used for ethanol production. In this study, we conducted de novo transcriptome assembly for sorghum variety Taejin by next-generation sequencing, obtaining 8.748 GB of raw data. The raw data are available in the NCBI SRA database under accession number SRX1715644. Using the Trinity program, we identified 222,161 transcripts from sorghum variety Taejin. We further predicted coding regions within the assembled transcripts with the TransDecoder program, resulting in a total of 148,531 proteins. We carried out BLASTP against the Swiss-Prot protein sequence database to annotate the functions of the identified proteins. To our knowledge, these are the first transcriptome data for a sorghum variety derived from Korea, and they can be usefully applied to the generation of genetic markers.
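Predicting coding regions within assembled transcripts starts from open reading frames. The sketch below is a much-simplified illustration of that first idea (the longest complete ATG-to-stop ORF on either strand above a length cutoff); it is not TransDecoder's implementation, `min_aa` is an assumed cutoff, and partial ORFs running off transcript ends are ignored.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_orf(seq, min_aa=100):
    """Longest complete ATG...stop ORF on either strand, or None.

    Returns (strand, start, end, aa_length); coordinates are 0-based on the
    strand that was scanned, and end includes the stop codon."""
    best = None
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop gives the longest ORF
                elif start is not None and codon in STOPS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa and (best is None or aa_len > best[3]):
                        best = (strand, start, i + 3, aa_len)
                    start = None
    return best

transcript = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(longest_orf(transcript, min_aa=3))
```

With a realistic cutoff the toy transcript yields no ORF at all, which is how short spurious ORFs are filtered out before any further coding-potential scoring.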
Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms.

Science.gov (United States)

Mattick, John S

2003-10-01

The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence, it has been generally assumed that genes code for proteins, and that proteins fulfil not only most structural and catalytic but also most regulatory functions, in all cells, from microbes to mammals. However, the latter may not be the case in complex organisms. A number of startling observations about the extent of non-protein-coding RNA (ncRNA) transcription in the higher eukaryotes and the range of genetic and epigenetic phenomena that are RNA-directed suggest that the traditional view of the structure of genetic regulatory systems in animals and plants may be incorrect. ncRNA dominates the genomic output of the higher organisms and has been shown to control chromosome architecture, mRNA turnover and the developmental timing of protein expression, and may also regulate transcription and alternative splicing. This paper re-examines the available evidence and suggests a new framework for considering and understanding the genomic programming of biological complexity, autopoietic development and phenotypic variation. Copyright 2003 Wiley Periodicals, Inc.
PURA, the gene encoding Pur-alpha, member of an ancient nucleic acid-binding protein family with mammalian neurological functions.

Science.gov (United States)

Daniel, Dianne C; Johnson, Edward M

2018-02-15

The PURA gene encodes Pur-alpha, a 322 amino acid protein with repeated nucleic acid binding domains that are highly conserved from bacteria through humans. PUR genes with a single copy of this domain have been detected so far in spirochetes and bacteroides. Lower eukaryotes possess one copy of the PUR gene, whereas chordates possess 1 to 4 PUR family members. Human PUR genes encode Pur-alpha (Pura), Pur-beta (Purb) and two forms of Pur-gamma (Purg). Pur-alpha is a protein that binds specific DNA and RNA sequence elements. Human PURA, located at chromosome band 5q31, is under complex control of three promoters. The entire protein coding sequence of PURA is contiguous within a single exon. Several studies have found that overexpression or microinjection of Pura inhibits anchorage-independent growth of oncogenically transformed cells and blocks proliferation at either G1-S or G2-M checkpoints. Effects on the cell cycle may be mediated by interaction of Pura with cellular proteins including Cyclin/Cdk complexes and the Rb tumor suppressor protein. PURA knockout mice die shortly after birth with effects on brain and hematopoietic development. In humans environmentally induced heterozygous deletions of PURA have been implicated in forms of myelodysplastic syndrome and progression to acute myelogenous leukemia. Pura plays a role in AIDS through association with the HIV-1 protein, Tat. In the brain Tat and Pura association in glial cells activates transcription and replication of JC polyomavirus, the agent causing the demyelination disease, progressive multifocal leukoencephalopathy. Tat and Pura also act to stimulate replication of the HIV-1 RNA genome. In neurons Pura accompanies mRNA transcripts to sites of translation in dendrites. Microdeletions in the PURA locus have been implicated in several neurological disorders. De novo PURA mutations have been related to a spectrum of phenotypes indicating a potential PURA syndrome. The nucleic acid, G-rich Pura binding
Examining the process of de novo gene birth: an educational primer on "integration of new genes into cellular networks, and their structural maturation".

Science.gov (United States)

Frietze, Seth; Leatherman, Judith

2014-03-01

New genes that arise from modification of the noncoding portion of a genome rather than being duplicated from parent genes are called de novo genes. These genes, identified by their short evolutionary history and lack of parent genes, provide an opportunity to study the timeframe in which emerging genes integrate into cellular networks, and how the characteristics of these genes change as they mature into bona fide genes. An article by G. Abrusán provides an opportunity to introduce students to fundamental concepts in evolutionary and comparative genetics and to provide a technical background for discussing systems biology approaches to the evolutionary process of gene birth. Basic background needed to understand the Abrusán study and details on comparative genomic concepts tailored for a classroom discussion are provided, including discussion questions and a supplemental exercise on navigating a genome database.
Rare coding variants in PLCG2, ABI3 and TREM2 implicate microglial-mediated innate immunity in Alzheimer’s disease

Science.gov (United States)

Sims, Rebecca; van der Lee, Sven J.; Naj, Adam C.; Bellenguez, Céline; Badarinarayan, Nandini; Jakobsdottir, Johanna; Kunkle, Brian W.; Boland, Anne; Raybould, Rachel; Bis, Joshua C.; Martin, Eden R.; Grenier-Boley, Benjamin; Heilmann-Heimbach, Stefanie; Chouraki, Vincent; Kuzma, Amanda B.; Sleegers, Kristel; Vronskaya, Maria; Ruiz, Agustin; Graham, Robert R.; Olaso, Robert; Hoffmann, Per; Grove, Megan L.; Vardarajan, Badri N.; Hiltunen, Mikko; Nöthen, Markus M.; White, Charles C.; Hamilton-Nelson, Kara L.; Epelbaum, Jacques; Maier, Wolfgang; Choi, Seung-Hoan; Beecham, Gary W.; Dulary, Cécile; Herms, Stefan; Smith, Albert V.; Funk, Cory C.; Derbois, Céline; Forstner, Andreas J.; Ahmad, Shahzad; Li, Hongdong; Bacq, Delphine; Harold, Denise; Satizabal, Claudia L.; Valladares, Otto; Squassina, Alessio; Thomas, Rhodri; Brody, Jennifer A.; Qu, Liming; Sanchez-Juan, Pascual; Morgan, Taniesha; Wolters, Frank J.; Zhao, Yi; Garcia, Florentino Sanchez; Denning, Nicola; Fornage, Myriam; Malamon, John; Naranjo, Maria Candida Deniz; Majounie, Elisa; Mosley, Thomas H.; Dombroski, Beth; Wallon, David; Lupton, Michelle K; Dupuis, Josée; Whitehead, Patrice; Fratiglioni, Laura; Medway, Christopher; Jian, Xueqiu; Mukherjee, Shubhabrata; Keller, Lina; Brown, Kristelle; Lin, Honghuang; Cantwell, Laura B.; Panza, Francesco; McGuinness, Bernadette; Moreno-Grau, Sonia; Burgess, Jeremy D.; Solfrizzi, Vincenzo; Proitsi, Petra; Adams, Hieab H.; Allen, Mariet; Seripa, Davide; Pastor, Pau; Cupples, L. Adrienne; Price, Nathan D; Hannequin, Didier; Frank-García, Ana; Levy, Daniel; Chakrabarty, Paramita; Caffarra, Paolo; Giegling, Ina; Beiser, Alexa S.; Giedraitis, Vimantas; Hampel, Harald; Garcia, Melissa E.; Wang, Xue; Lannfelt, Lars; Mecocci, Patrizia; Eiriksdottir, Gudny; Crane, Paul K.; Pasquier, Florence; Boccardi, Virginia; Henández, Isabel; Barber, Robert C.; Scherer, Martin; Tarraga, Lluis; Adams, Perrie M.; Leber, Markus; Chen, Yuning; Albert, Marilyn S.; Riedel-Heller, Steffi; Emilsson, Valur; Beekly, Duane; Braae, Anne; Schmidt, Reinhold; Blacker, Deborah; Masullo, Carlo; Schmidt, Helena; Doody, Rachelle S.; Spalletta, Gianfranco; Longstreth, WT; Fairchild, Thomas J.; Bossù, Paola; Lopez, Oscar L.; Frosch, Matthew P.; Sacchinelli, Eleonora; Ghetti, Bernardino; Sánchez-Juan, Pascual; Yang, Qiong; Huebinger, Ryan M.; Jessen, Frank; Li, Shuo; Kamboh, M. Ilyas; Morris, John; Sotolongo-Grau, Oscar; Katz, Mindy J.; Corcoran, Chris; Himali, Jayanadra J.; Keene, C. Dirk; Tschanz, JoAnn; Fitzpatrick, Annette L.; Kukull, Walter A.; Norton, Maria; Aspelund, Thor; Larson, Eric B.; Munger, Ron; Rotter, Jerome I.; Lipton, Richard B.; Bullido, María J; Hofman, Albert; Montine, Thomas J.; Coto, Eliecer; Boerwinkle, Eric; Petersen, Ronald C.; Alvarez, Victoria; Rivadeneira, Fernando; Reiman, Eric M.; Gallo, Maura; O’Donnell, Christopher J.; Reisch, Joan S.; Bruni, Amalia Cecilia; Royall, Donald R.; Dichgans, Martin; Sano, Mary; Galimberti, Daniela; St George-Hyslop, Peter; Scarpini, Elio; Tsuang, Debby W.; Mancuso, Michelangelo; Bonuccelli, Ubaldo; Winslow, Ashley R.; Daniele, Antonio; Wu, Chuang-Kuo; Peters, Oliver; Nacmias, Benedetta; Riemenschneider, Matthias; Heun, Reinhard; Brayne, Carol; Rubinsztein, David C; Bras, Jose; Guerreiro, Rita; Hardy, John; Al-Chalabi, Ammar; Shaw, Christopher E; Collinge, John; Mann, David; Tsolaki, Magda; Clarimón, Jordi; Sussams, Rebecca; Lovestone, Simon; O’Donovan, Michael C; Owen, Michael J; Behrens, Timothy W.; Mead, Simon; Goate, Alison M.; Uitterlinden, Andre G.; Holmes, Clive; Cruchaga, Carlos; Ingelsson, Martin; Bennett, David A.; Powell, John; Golde, Todd E.; Graff, Caroline; De Jager, Philip L.; Morgan, Kevin; Ertekin-Taner, Nilufer; Combarros, Onofre; Psaty, Bruce M.; Passmore, Peter; Younkin, Steven G; Berr, Claudine; Gudnason, Vilmundur; Rujescu, Dan; Dickson, Dennis W.; Dartigues, Jean-Francois; DeStefano, Anita L.; Ortega-Cubero, Sara; Hakonarson, Hakon; Campion, Dominique; Boada, Merce; Kauwe, John “Keoni”; Farrer, Lindsay A.; Van Broeckhoven, Christine; Ikram, M. Arfan; Jones, Lesley; Haines, Johnathan; Tzourio, Christophe; Launer, Lenore J.; Escott-Price, Valentina; Mayeux, Richard; Deleuze, Jean-François; Amin, Najaf; Holmans, Peter A; Pericak-Vance, Margaret A.; Amouyel, Philippe; van Duijn, Cornelia M.; Ramirez, Alfredo; Wang, Li-San; Lambert, Jean-Charles; Seshadri, Sudha; Williams, Julie; Schellenberg, Gerard D.

2017-01-01

Introduction: We identified rare coding variants associated with Alzheimer’s disease (AD) in a 3-stage case-control study of 85,133 subjects. In stage 1, 34,174 samples were genotyped using a whole-exome microarray. In stage 2, we tested associated variants (P<1×10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, an additional 14,997 samples were used to test the most significant stage 2 associations (P<5×10⁻⁸) using imputed genotypes. We observed 3 novel genome-wide significant (GWS) AD-associated non-synonymous variants: a protective variant in PLCG2 (rs72824905/p.P522R, P=5.38×10⁻¹⁰, OR=0.68, MAFcases=0.0059, MAFcontrols=0.0093), a risk variant in ABI3 (rs616338/p.S209F, P=4.56×10⁻¹⁰, OR=1.43, MAFcases=0.011, MAFcontrols=0.008), and a novel GWS variant in TREM2 (rs143332484/p.R62H, P=1.55×10⁻¹⁴, OR=1.67, MAFcases=0.0143, MAFcontrols=0.0089), a known AD susceptibility gene. These protein-coding changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified AD risk genes. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to AD development. PMID:28714976
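Whether a variant is protective or a risk allele can be read directly from the case and control minor-allele frequencies via a crude allelic odds ratio. The published ORs were estimated with the study's own association models, so the crude values below differ somewhat from them; the sketch only shows how the direction of effect follows from the reported MAFs, and the helper name is hypothetical.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio computed from minor-allele frequencies alone."""
    return (maf_cases / (1 - maf_cases)) / (maf_controls / (1 - maf_controls))

# Values as reported in the abstract: (MAF cases, MAF controls, reported OR).
VARIANTS = {
    "PLCG2 p.P522R": (0.0059, 0.0093, 0.68),
    "ABI3 p.S209F": (0.011, 0.008, 1.43),
    "TREM2 p.R62H": (0.0143, 0.0089, 1.67),
}

for name, (cases, controls, reported) in VARIANTS.items():
    crude = allelic_odds_ratio(cases, controls)
    print(f"{name}: crude OR={crude:.2f} (reported {reported})")
```

A minor allele that is rarer in cases than in controls gives an OR below 1 (protective, as for PLCG2), while one that is more common in cases gives an OR above 1 (risk, as for ABI3 and TREM2).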
Novel polymorphisms in UTR and coding region of inducible heat shock protein 70.1 gene in tropically adapted Indian zebu cattle (Bos indicus) and riverine buffalo (Bubalus bubalis).

Science.gov (United States)

Sodhi, M; Mukesh, M; Kishore, A; Mishra, B P; Kataria, R S; Joshi, B K

2013-09-25

Due to evolutionary divergence, cattle (taurine and indicine) and buffalo are thought to respond differently to heat stress. Variation in candidate genes associated with the heat-shock response may provide insight into this dissimilarity and suggest targets for intervention. The present work was undertaken to characterize the promoter and coding regions of one of the inducible heat shock protein genes in diverse breeds of Indian zebu cattle and buffaloes. Genomic DNA from a panel of 117 unrelated animals representing 14 diverse native cattle breeds and 6 buffalo breeds was used to determine the complete sequence and gene diversity of the HSP70.1 gene. The coding region of the HSP70.1 gene in Indian zebu cattle, Bos taurus and buffalo was similar in length (1,926 bp), encoding an HSP70 protein of 641 amino acids with a calculated molecular weight (Mw) of 70.26 kDa. However, buffalo had longer 5' and 3' untranslated regions (UTRs) of 204 and 293 nucleotides, respectively, compared with Indian zebu cattle and Bos taurus, in which the 5' and 3' UTRs were 172 and 286 nucleotides long, respectively. The greater length of the buffalo HSP70.1 gene compared with the indicine and taurine genes was due to two insertions each in the 5' and 3' UTRs. Comparative sequence analysis of the cattle (taurine and indicine) and buffalo HSP70.1 genes revealed a total of 54 variations (50 SNPs and 4 INDELs) among the three species. The minor allele frequencies of these variants ranged from 0.03 to 0.5, with an average of 0.26. Among the 14 B. indicus cattle breeds studied, a total of 19 polymorphic sites were identified: 4 in the 5'-UTR and 15 in the coding region (of which 2 were non-synonymous). Analysis among buffalo breeds revealed 15 SNPs throughout the gene: 6 in the 5' flanking region and 9 in the coding region. In the bubaline 5'-UTR, 2 additional putative transcription factor binding sites (Elk-1 and c-Rel) were identified, other than three common sites
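The link between the 1,926-bp coding region and the 641-residue protein is simple codon arithmetic: 1,926 / 3 = 642 codons, one of which is the stop codon. The sketch below makes that explicit and adds a rough mass estimate; the 110 Da average residue mass is a common rule of thumb (an assumption here), whereas the reported 70.26 kDa was calculated from the actual sequence.

```python
def protein_length_from_cds(cds_bp, includes_stop=True):
    """Number of amino acids encoded by a complete coding sequence of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError("a complete CDS length must be a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons

AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass; real values depend on sequence

aa = protein_length_from_cds(1926)
print(f"{aa} aa, rough mass ~{aa * AVG_RESIDUE_MASS_DA / 1000:.1f} kDa")
```

The rule of thumb gives roughly 70.5 kDa, close to the sequence-based value of 70.26 kDa.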
Genetic coding and gene expression - new Quadruplet genetic coding model

Science.gov (United States)

Shankar Singh, Rama

2012-07-01

The successful completion of the Human Genome Project has opened the door not only to developing personalized medicine and cures for genetic diseases, but possibly also to answering the complex and difficult question of the origin of life. It may also make the 21st century a century of the biological sciences. Based on the central dogma of biology, genetic codons in conjunction with tRNA play a key role in translating RNA base sequences into the amino acid sequence of a synthesized protein. This is the most critical step in synthesizing the right protein needed for personalized medicine and curing genetic diseases. So far, only triplet codons involving three bases of RNA, transcribed from DNA bases, have been used. Since this approach has several inconsistencies and limitations, even the promise of personalized medicine has not been realized. The new Quadruplet genetic coding model proposed and developed here involves all four RNA bases, which in conjunction with tRNA will synthesize the right protein. The transcription and translation processes used will be the same, but the Quadruplet codons will help overcome most of the inconsistencies and limitations of the triplet codes. Details of this new Quadruplet genetic coding model and its subsequent potential applications, including relevance to the origin of life, will be presented.
Novel variant in the TP63 gene associated to ankyloblepharon-ectodermal dysplasia-cleft lip/palate (AEC) syndrome.

Science.gov (United States)

Gonzalez, Francisco; Loidi, Lourdes; Abalo-Lojo, Jose M

2017-01-01

Ankyloblepharon-ectodermal dysplasia-cleft lip/palate (AEC) syndrome is a disorder resulting from anomalous embryonic development of ectodermal tissues. There is evidence that AEC syndrome is caused by mutations in the TP63 gene, which encodes the p63 protein, an important regulatory protein involved in epidermal proliferation and differentiation. Genome sequencing was performed on DNA from peripheral blood leukocytes of a newborn with AEC syndrome and her parents. Variants were searched for in all coding exons and intron-exon boundaries of the TP63 gene. A heterozygous missense variant, NM_003722.4:c.1063G>C (p.Asp355His), was found in the newborn patient. No variants were found in either parent. We identified a previously unreported variant in the TP63 gene that seems to be involved in the somatic malformations found in AEC syndrome. The absence of this variant in both parents suggests that it arose de novo.
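The de novo inference above rests on a trio pattern: the child carries the alternate allele and neither parent does. The sketch below encodes that rule on VCF-style genotype strings; the helper names are hypothetical, and real de novo calling also needs checks such as read depth, genotype quality, confirmed parentage and possible parental mosaicism, none of which are modeled here.

```python
def _alleles(gt):
    """Split a VCF-style genotype such as '0/1' or '0|1' into allele codes."""
    return gt.replace("|", "/").split("/")

def is_candidate_de_novo(child_gt, mother_gt, father_gt, alt="1"):
    """True if the child carries the alternate allele and neither parent does."""
    if "." in _alleles(mother_gt) + _alleles(father_gt):
        return False  # a missing parental call cannot confirm absence
    return (alt in _alleles(child_gt)
            and alt not in _alleles(mother_gt)
            and alt not in _alleles(father_gt))

# Pattern described in the abstract: heterozygous child, variant absent in both parents.
print(is_candidate_de_novo("0/1", "0/0", "0/0"))
```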
Non-Coding RNAs in Arabidopsis

DEFF Research Database (Denmark)

van Wonterghem, Miranda

This work revolves around elucidating the mechanisms of microRNAs (miRNAs) in Arabidopsis thaliana. I identified a new class of nuclear non-coding RNAs derived from protein-coding genes. The genes are miRNA targets with extensive gene body methylation. The RNA species are nuclear localized and de...
De Novo Mutations in CHD4, an ATP-Dependent Chromatin Remodeler Gene, Cause an Intellectual Disability Syndrome with Distinctive Dysmorphisms.

Science.gov (United States)

Weiss, Karin; Terhal, Paulien A; Cohen, Lior; Bruccoleri, Michael; Irving, Melita; Martinez, Ariel F; Rosenfeld, Jill A; Machol, Keren; Yang, Yaping; Liu, Pengfei; Walkiewicz, Magdalena; Beuten, Joke; Gomez-Ospina, Natalia; Haude, Katrina; Fong, Chin-To; Enns, Gregory M; Bernstein, Jonathan A; Fan, Judith; Gotway, Garrett; Ghorbani, Mohammad; van Gassen, Koen; Monroe, Glen R; van Haaften, Gijs; Basel-Vanagaite, Lina; Yang, Xiang-Jiao; Campeau, Philippe M; Muenke, Maximilian

2016-10-06

Chromodomain helicase DNA-binding protein 4 (CHD4) is an ATP-dependent chromatin remodeler involved in epigenetic regulation of gene transcription, DNA repair, and cell cycle progression. Also known as Mi2β, CHD4 is an integral subunit of a well-characterized histone deacetylase complex. Here we report five individuals with de novo missense substitutions in CHD4 identified through whole-exome sequencing and web-based gene matching. These individuals have overlapping phenotypes including developmental delay, intellectual disability, hearing loss, macrocephaly, distinct facial dysmorphisms, palatal abnormalities, ventriculomegaly, and hypogonadism as well as additional findings such as bone fusions. The variants, c.3380G>A (p.Arg1127Gln), c.3443G>T (p.Trp1148Leu), c.3518G>T (p.Arg1173Leu), and c.3008G>A (p.Gly1003Asp) (GenBank: NM_001273.3), affect evolutionarily highly conserved residues and are predicted to be deleterious. Previous studies in yeast showed the equivalent Arg1127 and Trp1148 residues to be crucial for SNF2 function. Furthermore, mutations in the same positions were reported in malignant tumors, and a de novo missense substitution in an equivalent arginine residue in the C-terminal helicase domain of SMARCA4 is associated with Coffin-Siris syndrome. Cell-based studies of the p.Arg1127Gln and p.Arg1173Leu mutants demonstrate normal localization to the nucleus and HDAC1 interaction. Based on these findings, the mutations may alter the activity of the complex but not its formation. This report provides evidence for the role of CHD4 in human development and expands an increasingly recognized group of Mendelian disorders involving chromatin remodeling and modification. Published by Elsevier Inc.
Functional Diets Modulate lncRNA-Coding RNAs and Gene Interactions in the Intestine of Rainbow Trout Oncorhynchus mykiss.

Science.gov (United States)

Núñez-Acuña, Gustavo; Détrée, Camille; Gallardo-Escárate, Cristian; Gonçalves, Ana Teresa

2017-06-01

The advent of functional genomics has sparked interest in inferring the function of non-coding regions from the transcriptome in non-model species. However, numerous biological processes remain understudied from this perspective, including intestinal immunity in farmed fish. The aim of this study was to infer long non-coding RNA (lncRNA) expression profiles in rainbow trout (Oncorhynchus mykiss) fed for 30 days with functional diets based on prebiotics and probiotics. To this end, whole-transcriptome sequencing was performed using Illumina technology, and lncRNAs were mined to evaluate transcriptional activity in conjunction with known protein sequences. To detect differentially expressed transcripts, 880 novel and 9,067 previously described O. mykiss lncRNAs were used. Expression levels and genome co-localization correlations with coding genes were also analyzed. Significant differences in gene expression were found primarily in the probiotic diet, which showed a twofold downregulation of lncRNAs compared with the other treatments. Notable differences by diet were also evident among coding genes of distinct metabolic processes. In contrast, genome co-localization of lncRNAs with coding genes was similar for all diets. This study contributes new knowledge on lncRNAs in fish, suggesting key roles in salmonids fed in-feed additives capable of modulating intestinal homeostasis and host health.
Assembly of the Boechera retrofracta Genome and Evolutionary Analysis of Apomixis-Associated Genes

Directory of Open Access Journals (Sweden)

Sergei Kliver

2018-03-01

Full Text Available Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid, sexually reproducing species and is thought to be an ancestral parent of apomictic species. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from one paired-end and three mate-pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated at 227 Mb. The N50 of the assembled scaffolds was 2.3 Mb. Using a hybrid approach that combines homology-based and de novo methods, 27,048 protein-coding genes were predicted. Repeats, transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were also annotated. Finally, genes of B. retrofracta and six other Brassicaceae species were used for phylogenetic tree reconstruction. In addition, we explored the histidine exonuclease APOLLO locus, which is related to apomixis in Boechera, and proposed a model of its evolution through a series of duplications. An assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species.
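Genome size is commonly estimated from a k-mer depth histogram as the total number of k-mers counted divided by the depth of the main peak. The Python sketch below illustrates that general calculation; it is not necessarily the procedure or tool used for B. retrofracta, and the error cutoff (min_depth) and the toy histogram are assumptions chosen for illustration.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size (in bases) from a k-mer depth histogram.

    histogram maps depth (times a k-mer was observed) to the number of
    distinct k-mers seen at that depth. Depths below min_depth are treated
    as sequencing errors and discarded; the cutoff is an assumed value.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # With low heterozygosity, the tallest remaining peak is the homozygous one.
    peak_depth = max(kept, key=kept.get)
    # Duplicated or triplicated regions appear at 2x or 3x the peak depth and
    # still contribute their full share of k-mers to the total.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram: an error peak at depths 1-2, a main peak at 21 and a small
# duplication peak at 42 (values invented for illustration).
toy = {1: 5000, 2: 800, 20: 100, 21: 300, 22: 100, 42: 20}
print(estimate_genome_size(toy))  # 540.0
```

Heterozygous genomes show an additional peak at half the main depth, which makes choosing the correct peak harder; the low heterozygosity reported above keeps this toy case simple.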
A de novo variant in the ASPRV1 gene in a dog with ichthyosis.

Science.gov (United States)

Bauer, Anina; Waluk, Dominik P; Galichet, Arnaud; Timm, Katrin; Jagannathan, Vidhya; Sayar, Beyza S; Wiener, Dominique J; Dietschi, Elisabeth; Müller, Eliane J; Roosje, Petra; Welle, Monika M; Leeb, Tosso

2017-03-01

Ichthyoses are a heterogeneous group of inherited cornification disorders characterized by generalized dry skin, scaling and/or hyperkeratosis. Ichthyosis vulgaris is the most common form of ichthyosis in humans and is caused by genetic variants in the FLG gene encoding filaggrin. Filaggrin is a key player in the formation of the stratum corneum, the uppermost layer of the epidermis, and is therefore crucial for barrier function. During terminal differentiation of keratinocytes, the precursor profilaggrin is cleaved by several proteases into filaggrin monomers, which are eventually processed into free amino acids that contribute to the hydration of the cornified layer. We studied a German Shepherd dog with a novel form of ichthyosis. Comparing the genome sequence of the affected dog with 288 genomes from genetically diverse non-affected dogs, we identified a private heterozygous variant in the ASPRV1 gene encoding "aspartic peptidase, retroviral-like 1", which is also known as skin aspartic protease (SASPase). The variant was absent in both parents and was therefore due to a de novo mutation event. It was a missense variant, c.1052T>C, affecting a conserved residue close to an autoprocessing cleavage site, p.(Leu351Pro). ASPRV1 encodes a retroviral-like protease involved in profilaggrin-to-filaggrin processing. By immunofluorescence staining, we showed that the filaggrin expression pattern was altered in the affected dog. Thus, our findings provide strong evidence that the identified de novo variant is causative for the ichthyosis in the affected dog and that ASPRV1 plays an essential role in skin barrier formation. ASPRV1 is thus a novel candidate gene for unexplained human forms of ichthyosis.
A de novo variant in the ASPRV1 gene in a dog with ichthyosis.

Directory of Open Access Journals (Sweden)

Anina Bauer

2017-03-01

Full Text Available Ichthyoses are a heterogeneous group of inherited cornification disorders characterized by generalized dry skin, scaling and/or hyperkeratosis. Ichthyosis vulgaris is the most common form of ichthyosis in humans and is caused by genetic variants in the FLG gene encoding filaggrin. Filaggrin is a key player in the formation of the stratum corneum, the uppermost layer of the epidermis, and is therefore crucial for barrier function. During terminal differentiation of keratinocytes, the precursor profilaggrin is cleaved by several proteases into filaggrin monomers, which are eventually processed into free amino acids that contribute to the hydration of the cornified layer. We studied a German Shepherd dog with a novel form of ichthyosis. Comparing the genome sequence of the affected dog with 288 genomes from genetically diverse non-affected dogs, we identified a private heterozygous variant in the ASPRV1 gene encoding "aspartic peptidase, retroviral-like 1", which is also known as skin aspartic protease (SASPase). The variant was absent in both parents and was therefore due to a de novo mutation event. It was a missense variant, c.1052T>C, affecting a conserved residue close to an autoprocessing cleavage site, p.(Leu351Pro). ASPRV1 encodes a retroviral-like protease involved in profilaggrin-to-filaggrin processing. By immunofluorescence staining, we showed that the filaggrin expression pattern was altered in the affected dog. Thus, our findings provide strong evidence that the identified de novo variant is causative for the ichthyosis in the affected dog and that ASPRV1 plays an essential role in skin barrier formation. ASPRV1 is thus a novel candidate gene for unexplained human forms of ichthyosis.
Coxiella burnetii Nine Mile II proteins modulate gene expression of monocytic host cells during infection

Directory of Open Access Journals (Sweden)

Shaw Edward I

2010-09-01

Full Text Available Abstract Background Coxiella burnetii is an intracellular bacterial pathogen that causes acute and chronic disease in humans. Bacterial replication occurs within enlarged parasitophorous vacuoles (PVs) of eukaryotic cells, the biogenesis and maintenance of which depend on C. burnetii protein synthesis. These observations suggest that C. burnetii actively subverts host cell processes; however, little is known about the cell biological mechanisms manipulated by the pathogen during infection. Here, we examined host cell gene expression changes specifically induced by C. burnetii proteins during infection. Results Using comparative microarray analysis, we identified 36 host cell genes that are specifically regulated when de novo C. burnetii protein synthesis occurs during infection. Two parallel sets of infected and uninfected THP-1 cells were grown for 48 h, followed by the addition of chloramphenicol (CAM) to 10 μg/ml in one set. Total RNA was harvested at 72 hpi from all conditions, and microarrays were performed using Phalanx Human OneArray™ slides. A total of 784 (mock-treated) and 901 (CAM-treated) THP-1 genes were up- or downregulated ≥2-fold in the C. burnetii-infected vs. uninfected cell sets, respectively. Comparisons between the complementary data sets (using >0-fold) eliminated the common gene expression changes. A stringent comparison (≥2-fold) between the separate microarrays revealed 36 host cell genes modulated by C. burnetii protein synthesis. Ontological analysis of these genes identified the innate immune response, cell death and proliferation, vesicle trafficking and development, lipid homeostasis, and cytoskeletal organization as predominant cellular functions modulated by C. burnetii protein synthesis. Conclusions Collectively, these data indicate that C. burnetii proteins actively regulate the expression of specific host cell genes and pathways. This is in addition to host cell genes that respond to the presence of the
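The core comparison in this design, keeping genes that change ≥2-fold in infected vs. uninfected mock-treated cells only if the change disappears when CAM blocks bacterial protein synthesis, can be sketched in Python as a set difference. This is a simplified assumption about the analysis (a single ≥2-fold threshold applied to both arrays rather than the two-stage comparison described), and the gene names and ratios are hypothetical.

```python
import math


def changed_genes(ratios: dict[str, float], min_fold: float = 2.0) -> set[str]:
    """Genes whose infected/uninfected expression ratio is >= min_fold up or down."""
    cutoff = math.log2(min_fold)
    return {gene for gene, ratio in ratios.items() if abs(math.log2(ratio)) >= cutoff}


def protein_synthesis_dependent(mock: dict[str, float], cam: dict[str, float],
                                min_fold: float = 2.0) -> set[str]:
    """Genes changed in mock-treated infection but not when CAM is present.

    Changes seen in both conditions are attributed to the bacterium's presence
    rather than to de novo bacterial protein synthesis, so they are removed.
    """
    return changed_genes(mock, min_fold) - changed_genes(cam, min_fold)


# Hypothetical infected/uninfected ratios for four genes.
mock = {"GENE_A": 3.0, "GENE_B": 0.4, "GENE_C": 2.5, "GENE_D": 1.2}
cam = {"GENE_A": 1.1, "GENE_B": 0.45, "GENE_C": 2.2, "GENE_D": 1.0}
print(protein_synthesis_dependent(mock, cam))  # {'GENE_A'}
```

Working with log2 ratios makes the up and down thresholds symmetric: ratios of 2.0 and 0.5 both have an absolute log2 value of 1.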
Expression of genes encoding multi-transmembrane proteins in specific primate taste cell populations.

Directory of Open Access Journals (Sweden)

Bryan D Moyer

Full Text Available BACKGROUND: Using fungiform (FG) and circumvallate (CV) taste buds isolated by laser capture microdissection and analyzed using gene arrays, we previously constructed a comprehensive database of gene expression in primates, which revealed over 2,300 taste bud-associated genes. Bioinformatics analyses identified hundreds of genes predicted to encode multi-transmembrane domain proteins with no previous association with taste function. A first step in elucidating the roles these gene products play in gustation is to identify the specific taste cell types in which they are expressed. METHODOLOGY/PRINCIPAL FINDINGS: Using double-label in situ hybridization analyses, we identified seven new genes expressed in specific taste cell types, including sweet, bitter, and umami cells (TRPM5-positive), sour cells (PKD2L1-positive), as well as other taste cell populations. Transmembrane protein 44 (TMEM44), a protein with seven predicted transmembrane domains with no homology to GPCRs, is expressed in a TRPM5-negative and PKD2L1-negative population that is enriched in the bottom portion of taste buds and may represent developmentally immature taste cells. Calcium homeostasis modulator 1 (CALHM1), a component of a novel calcium channel, along with family members CALHM2 and CALHM3; multiple C2 domains, transmembrane 1 (MCTP1), a calcium-binding transmembrane protein; and anoctamin 7 (ANO7), a member of the recently identified calcium-gated chloride channel family, are all expressed in TRPM5 cells. These proteins may modulate and effect calcium signalling stemming from sweet, bitter, and umami receptor activation. Synaptic vesicle glycoprotein 2B (SV2B), a regulator of synaptic vesicle exocytosis, is expressed in PKD2L1 cells, suggesting that this taste cell population transmits tastant information to gustatory afferent nerve fibers via exocytic neurotransmitter release. CONCLUSIONS/SIGNIFICANCE: Identification of genes encoding multi-transmembrane domain proteins
Developing a de novo targeted knock-in method based on in utero electroporation into the mammalian brain.

Science.gov (United States)

Tsunekawa, Yuji; Terhune, Raymond Kunikane; Fujita, Ikumi; Shitamukai, Atsunori; Suetsugu, Taeko; Matsuzaki, Fumio

2016-09-01

Genome-editing technology has revolutionized the field of biology. Here, we report a novel de novo gene-targeting method mediated by in utero electroporation into the developing mammalian brain. Electroporation of donor DNA with the CRISPR/Cas9 system vectors successfully leads to knock-in of the donor sequence, such as EGFP, to the target site via the homology-directed repair mechanism. We developed a targeting vector system optimized to prevent anomalous leaky expression of the donor gene from the plasmid, which otherwise often occurs depending on the donor sequence. The knock-in efficiency of the electroporated progenitors reached up to 40% in the early stage and 20% in the late stage of the developing mouse brain. Furthermore, we inserted different fluorescent markers into the target gene in each homologous chromosome, successfully distinguishing homozygous knock-in cells by color. We also applied this de novo gene targeting to the ferret model for the study of complex mammalian brains. Our results demonstrate that this technique is widely applicable for monitoring gene expression, visualizing protein localization, lineage analysis and gene knockout, all at the single-cell level, in developmental tissues. © 2016. Published by The Company of Biologists Ltd.
Organization of the gene coding for human protein C inhibitor (plasminogen activator inhibitor-3). Assignment of the gene to chromosome 14

NARCIS (Netherlands)

Meijers, J. C.; Chung, D. W.

1991-01-01

Protein C inhibitor (plasminogen activator inhibitor-3) is a plasma glycoprotein and a member of the serine proteinase inhibitor superfamily. In the present study, the human gene for protein C inhibitor was isolated and characterized from three independent phage that contained overlapping inserts

The impact of intragenic CpG content on gene expression.

Science.gov (United States)

Bauer, Asli Petra; Leikam, Doris; Krinner, Simone; Notka, Frank; Ludwig, Christine; Längst, Gernot; Wagner, Ralf

2010-07-01

The development of vaccine components or recombinant therapeutics critically depends on sustained expression of the corresponding transgene. This study aimed to determine the contribution of intragenic CpG content to expression efficiency in transiently and stably transfected mammalian cells. Based upon a humanized version of green fluorescent protein (GFP) containing 60 CpGs within its coding sequence, a CpG-depleted variant of the GFP reporter was established by carefully modulating the codon usage. Interestingly, GFP reporter activity and detectable protein amounts in stably transfected CHO and 293 cells were significantly decreased upon CpG depletion, independent of the promoter used (CMV, EF1 alpha). The reduction in protein expression associated with CpG depletion was likewise observed for other, unrelated reporter genes and was clearly reflected by a decline in mRNA copy numbers rather than in translational efficiency. Moreover, the decreased mRNA levels were due neither to nuclear export restrictions nor to alternative splicing or mRNA instability. Rather, the intragenic CpG content influenced de novo transcriptional activity, implying a common transcription-based mechanism of gene regulation via CpGs. The increased transcription of the CpG-rich gene correlated with altered nucleosome positions in vitro, although histone density at the two genes did not change in vivo, as monitored by ChIP.
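To make "intragenic CpG content" concrete, the Python sketch below counts CpG dinucleotides in a coding sequence, including those spanning codon boundaries, and computes the widely used observed/expected CpG ratio (CpG count × length / (C count × G count)). The toy sequences are invented and are not the GFP variants from the study; the second one encodes the same four amino acids (Met-Arg-Thr-Pro) through synonymous codons that avoid CpG, a miniature version of depletion by codon-usage modulation.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G), in any frame."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return count_cpg(s) * len(s) / (c * g)


original = "ATGCGTACGCCG"   # ATG CGT ACG CCG -> Met Arg Thr Pro
depleted = "ATGAGAACCCCC"   # ATG AGA ACC CCC -> Met Arg Thr Pro (synonymous)
print(count_cpg(original), count_cpg(depleted))  # 3 0
print(cpg_observed_expected(original))           # 2.25
```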
Computational Approaches Reveal New Insights into Regulation and Function of Non-coding RNAs and their Targets

KAUST Repository

Alam, Tanvir

2016-01-01

Regulation and function of protein-coding genes are increasingly well-understood, but no comparable evidence exists for non-coding RNA (ncRNA) genes, which appear to be more numerous than protein-coding genes. We developed a novel machine
Amino acid code of protein secondary structure.

Science.gov (United States)

Shestopalov, B V

2003-01-01

The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem that remains to be solved. This paper presents the principles of the code theory of protein secondary structure and their consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The basis of the theory is as follows: 1) the term secondary structure is assigned to a conformation stabilized only by the nearest (intraresidual) and middle-range (at a distance no greater than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2) and (i, i + 1), according to the numbers of residues per period, 3.6, 2 and 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlapping structurons of the same structure generate longer segments of that structure; 7) overlapping of structurons of different structures is forbidden, so codon selection is required, and this selection is hierarchical; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. Models can be constructed on the basis of the theory in two ways: a physical one, using physical properties of amino acid residues, and a statistical one, using the results of statistical analysis of a large body of structural data. Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three
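The residue-pair definitions in this theory can be illustrated by listing, for each structure type, every pair at the corresponding spacing: (i, i + 4) for alpha-helices, (i, i + 2) for beta-strands and, reading the coil pair as (i, i + 1) to match one residue per period, adjacent residues for coil. The Python sketch below performs only that enumeration; the function name and the toy sequence are hypothetical, indices are 0-based, and the theory's classification of codons into 21 strength types and its hierarchical codon selection are not reproduced.

```python
# Spacing between the two residues of a doublet "codon" for each structure type.
SPACING = {"helix": 4, "strand": 2, "coil": 1}


def doublets(sequence: str, structure: str) -> list[tuple[int, str]]:
    """Return (start index i, residue pair) for every doublet of the given type."""
    step = SPACING[structure]
    return [(i, sequence[i] + sequence[i + step]) for i in range(len(sequence) - step)]


for kind in SPACING:
    print(kind, doublets("ALKEG", kind))
# helix [(0, 'AG')]
# strand [(0, 'AK'), (1, 'LE'), (2, 'KG')]
# coil [(0, 'AL'), (1, 'LK'), (2, 'KE'), (3, 'EG')]
```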
A Pectate Lyase-Coding Gene Abundantly Expressed during Early Stages of Infection Is Required for Full Virulence in Alternaria brassicicola.

Directory of Open Access Journals (Sweden)

Yangrae Cho

Full Text Available Alternaria brassicicola causes black spot disease of Brassica species. The functional importance of pectin-digesting enzymes and unidentified phytotoxins in fungal pathogenesis has been suspected but not verified in A. brassicicola. The fungal transcription factor AbPf2 is essential for pathogenicity and induces 106 genes during early pathogenesis, including the pectate lyase-coding gene PL1332. The aim of this study was to test the importance and roles of PL1332 in pathogenesis. We generated deletion strains of the PL1332 gene, produced heterologous PL1332 proteins, and evaluated their association with virulence. Deletion strains of the PL1332 gene were approximately 30% less virulent than wild-type A. brassicicola, without showing differences in colony expansion on solid media or in mycelial growth in nutrient-rich liquid media or minimal media with pectins as the major carbon source. Heterologous PL1332 expressed as fusion proteins digested polygalacturonan in vitro. When the fusion proteins were injected into the apoplast between leaf veins of host plants, the tissues turned dark brown and soft, resembling necrotic leaf tissue. PL1332 was the first gene among the 106 genes regulated by the transcription factor AbPf2 to be identified as a general toxin-coding gene and virulence factor. It was also the first of the 19 pectate lyase genes and several hundred putative cell wall-degrading enzymes in A. brassicicola to have its functions investigated. These results further support the importance of the AbPf2 gene as a key pathogenesis regulator and possible target for agrochemical development.
Differential requirement of de novo Arc protein synthesis in the insular cortex and the amygdala for safe and aversive taste long-term memory formation.

Science.gov (United States)

Guzmán-Ramos, Kioko; Venkataraman, Archana; Morin, Jean-Pascal; Osorio-Gómez, Daniel; Bermúdez-Rattoni, Federico

2018-04-16

Several immediate early gene (IEG) products are known to be involved in facilitating structural and functional modifications at distinct synapses activated through experience. The IEG-encoded protein Arc (activity-regulated cytoskeleton-associated protein) has been widely implicated in long-term memory formation and stabilization. In this study, we sought to evaluate a possible role for de novo Arc protein synthesis in the insular cortex (IC) and in the amygdala (AMY) during long-term taste memory formation. We found that acute inhibition of Arc protein synthesis through the infusion of antisense oligonucleotides into the IC before a novel taste presentation affected consolidation of a safe taste memory trace (ST) but spared consolidation of conditioned taste aversion (CTA). Conversely, blocking Arc synthesis within the AMY impaired CTA consolidation but had no effect on ST long-term memory formation. Our results suggest that Arc-dependent plasticity during taste learning is required within distinct structures of the medial temporal lobe, depending on the emotional valence of the memory trace. Copyright © 2018 Elsevier B.V. All rights reserved.
Discovery of Proteomic Code with mRNA Assisted Protein Folding

Directory of Open Access Journals (Sweden)

Jan C. Biro

2008-12-01

Full Text Available The threefold redundancy of the Genetic Code is usually explained as a necessity to increase the mutation resistance of the genetic information. However, recent bioinformatic observations indicate that the redundant Genetic Code contains more biological information than previously known, in addition to the 64/20 definition of amino acids. It might define the physico-chemical and structural properties of amino acids, the codon boundaries, the amino acid co-locations (interactions) in the coded proteins and the folding free energy of mRNAs. This additional information, which seems to be necessary to determine the 3D structure of coding nucleic acids as well as the coded proteins, is known as the Proteomic Code and mRNA Assisted Protein Folding.
Analysis of insecticide resistance-related genes of the Carmine spider mite Tetranychus cinnabarinus based on a de novo assembled transcriptome.

Directory of Open Access Journals (Sweden)

Zhifeng Xu

Full Text Available The carmine spider mite (CSM), Tetranychus cinnabarinus, is an important agricultural pest because it readily develops insecticide resistance. To obtain gene information and a molecular basis for future studies of insecticide resistance in CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST queries against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). When the transcripts were aligned to the Tetranychus urticae genome, 19,255 (75.45%) had significant (e-value < 10^-5) matches to the T. urticae DNA genome, and 19,111 sequences matched the T. urticae proteome with an average protein length coverage of 42.55%. Core Eukaryotic Genes Mapping Approach (CEGMA) analysis identified 435 core eukaryotic genes (CEGs) in the CSM dataset, corresponding to 95% coverage. Ten gene categories related to insecticide resistance in arthropods were identified in the CSM transcriptome, including 53 P450, 22 GST, 23 CarE, 1 AChE, 7 GluCl, 9 nAChR, 8 GABA receptor, 1 sodium channel, 6 ATPase and 12 Cyt b genes. We developed significant molecular resources for T. cinnabarinus putatively involved in insecticide resistance. The transcriptome assembly analysis will significantly facilitate the study of mechanisms of adaptation to environmental stress (including insecticides) in CSM at the molecular level, and will be very important for developing new control strategies against this pest mite.
A global analysis of protein expression profiles in Sinorhizobium meliloti: discovery of new genes for nodule occupancy and stress adaptation.

Science.gov (United States)

Djordjevic, Michael A; Chen, Han Cai; Natera, Siria; Van Noorden, Giel; Menzel, Christian; Taylor, Scott; Renard, Clotilde; Geiger, Otto; Weiller, Georg F

2003-06-01

A proteomic examination of Sinorhizobium meliloti strain 1021 was undertaken using a combination of 2-D gel electrophoresis, peptide mass fingerprinting, and bioinformatics. Our goal was to identify (i) putative symbiosis- or nutrient-stress-specific proteins, (ii) the biochemical pathways active under different conditions, (iii) potential new genes, and (iv) the extent of posttranslational modifications of S. meliloti proteins. In total, we identified the protein products of 810 genes (13.1% of the genome's coding capacity). The 810 genes generated 1,180 gene products, with chromosomal genes accounting for 78% of the gene products identified (18.8% of the chromosome's coding capacity). The activity of 53 metabolic pathways was inferred from bioinformatic analysis of proteins with assigned Enzyme Commission numbers. Of the remaining proteins, which were not enzymes, ABC-type transporters made up 12.7% and regulatory proteins 3.4% of the total. Proteins with up to seven transmembrane domains were identified in membrane preparations. A total of 27 putative nodule-specific proteins and 35 nutrient-stress-specific proteins were identified and used as a basis to define genes and describe processes occurring in S. meliloti cells in nodules and under stress. Several nodule proteins from the plant host were present in the nodule bacteria preparations. We also identified seven potentially novel proteins not predicted from the DNA sequence. Posttranslational modifications such as N-terminal processing could be inferred from the data. The posttranslational addition of UMP to PII, the key regulator of nitrogen metabolism, was demonstrated. This work demonstrates the utility of combining mass spectrometry with protein arraying or separation techniques to identify candidate genes involved in important biological processes and niche occupations that may be intractable to other methods of gene expression profiling.
Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

OpenAIRE

Liljeström, P L; Liljeström, P

1987-01-01

Melibiose uptake and hydrolysis in E. coli are performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450-amino-acid protein with a molecular weight of 50.6 kDa. The sequence data also supported the assumption that the mel locus forms an operon with melA in the proximal position. A comparison of MelA with alpha-galactosidase proteins from yeast and human origin showed that these prot...
Combinatorial Control of mRNA Fates by RNA-Binding Proteins and Non-Coding RNAs

Directory of Open Access Journals (Sweden)

Valentina Iadevaia

2015-09-01

Full Text Available Post-transcriptional control of gene expression is mediated by RNA-binding proteins (RBPs) and small non-coding RNAs (e.g., microRNAs) that bind to distinct elements in their mRNA targets. Here, we review recent examples describing the synergistic and/or antagonistic effects mediated by RBPs and miRNAs to determine the localisation, stability and translation of mRNAs in mammalian cells. From these studies, it is becoming increasingly apparent that dynamic rearrangements of RNA-protein complexes could have profound implications in human cancer, in synaptic plasticity, and in cellular differentiation.
Extracellular Hsp90 serves as a co-factor for MAPK activation and latent viral gene expression during de novo infection by KSHV

International Nuclear Information System (INIS)

Qin Zhiqiang; DeFee, Michael; Isaacs, Jennifer S.; Parsons, Chris

2010-01-01

The Kaposi's sarcoma-associated herpesvirus (KSHV) is the causative agent of Kaposi's sarcoma (KS), an important cause of morbidity and mortality in immunocompromised patients. KSHV interaction with the cell membrane triggers activation of specific intracellular signal transduction pathways to facilitate virus entry, nuclear trafficking, and ultimately viral oncogene expression. Extracellular heat shock protein 90 localizes to the cell surface (csHsp90) and facilitates signal transduction in cancer cell lines, but whether csHsp90 assists in the coordination of KSHV gene expression through these or other mechanisms is unknown. Using a recently characterized non-permeable inhibitor specifically targeting csHsp90 and Hsp90-specific antibodies, we show that csHsp90 inhibition suppresses KSHV gene expression during de novo infection, and that this effect is mediated largely through the inhibition of mitogen-activated protein kinase (MAPK) activation by KSHV. Moreover, we show that targeting csHsp90 reduces constitutive MAPK expression and the release of infectious viral particles by patient-derived, KSHV-infected primary effusion lymphoma cells. These data suggest that csHsp90 serves as an important co-factor for KSHV-initiated MAPK activation and provide proof-of-concept for the potential benefit of targeting csHsp90 for the treatment or prevention of KSHV-associated illnesses.
Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer

Directory of Open Access Journals (Sweden)

Walchli John

2009-04-01

Full Text Available Abstract Background With the goal of improving the yield and success rates of heterologous protein production for structural studies, we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. Results In this report, we compare heterologous protein expression levels from native sequences with those of codon-engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), a viral polymerase (HCV NS5B), and a bacterial structural protein (FtsZ) was expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. Conclusion The results consistently demonstrate that protein yields from codon-engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure-guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene-to-structure work, as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene-to-structure research.
MIWI2 as an Effector of DNA Methylation and Gene Silencing in Embryonic Male Germ Cells

Directory of Open Access Journals (Sweden)

Kanako Kojima-Kita

2016-09-01

Full Text Available During the development of mammalian embryonic germ cells, global demethylation and de novo DNA methylation take place. In mouse embryonic germ cells, two PIWI family proteins, MILI and MIWI2, are essential for the de novo DNA methylation of retrotransposons, presumably through PIWI-interacting RNAs (piRNAs). Although piRNA-associated MIWI2 has been reported to play critical roles in this process, its molecular mechanisms have remained unclear. To identify the mechanism, we produced transgenic mice expressing a fusion protein of MIWI2 and a zinc finger (ZF) that recognizes the promoter region of a type A LINE-1 gene. The ZF-MIWI2 fusion protein brought about DNA methylation, suppression of the type A LINE-1 gene, and a partial rescue of the impaired spermatogenesis of MILI-null mice. In addition, ZF-MIWI2 was associated with proteins involved in DNA methylation. These data indicate that MIWI2 functions as an effector of de novo DNA methylation of the retrotransposon.
IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction

Directory of Open Access Journals (Sweden)

Kiran Sree Pokkuluri

2014-01-01

Full Text Available Protein coding and promoter region predictions are very important challenges in bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting each of these regions separately; still, there is room for improvement. We propose a classifier built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) that predicts both regions with a single classifier. The proposed classifier is trained and tested with the Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162. It is also trained and tested with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.
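Accuracy, sensitivity, and specificity are all derived from a binary confusion matrix of true/false positives and negatives. A minimal sketch of the standard definitions follows; the counts are hypothetical and are not the MACA-MCC results.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for illustration only (100 positives, 100 negatives).
tp, fn, tn, fp = 92, 8, 89, 11
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
```

Because accuracy pools both classes, it depends on the ratio of positives to negatives in the test set, while sensitivity and specificity do not.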
The CUP2 gene product regulates the expression of the CUP1 gene, coding for yeast metallothionein.

OpenAIRE

Welch, J; Fogel, S; Buchman, C; Karin, M

1989-01-01

The yeast CUP1 gene codes for a copper-binding protein similar to metallothionein. Copper-sensitive cup1s strains contain a single copy of the CUP1 locus. Resistant strains (CUP1r) carry 12 or more tandem copies. We isolated 12 ethyl methanesulfonate-induced copper-sensitive mutants in a wild-type CUP1r parental strain, X2180-1A. Most mutants reduce the copper resistance phenotype only slightly. However, the mutant cup2 lowers resistance by nearly two orders of magnitude. We cloned ...
Gene-Auto: Automatic Software Code Generation for Real-Time Embedded Systems

Science.gov (United States)

Rugina, A.-E.; Thomas, D.; Olive, X.; Veran, G.

2008-08-01

This paper gives an overview of the Gene-Auto ITEA European project, which aims at building a qualified C code generator from mathematical models under Matlab-Simulink and Scilab-Scicos. The project is driven by major European industry partners active in the real-time embedded systems domains. The Gene-Auto code generator will significantly improve the current development processes in such domains by shortening the time to market and by guaranteeing the quality of the generated code through the use of formal methods. The first version of the Gene-Auto code generator has already been released and has gone through a validation phase on real-life case studies defined by each project partner. The validation results are taken into account in the implementation of the second version of the code generator. The partners aim to introduce the Gene-Auto results into industrial development by 2010.
Phylogenetic relationships within Echinococcus and Taenia tapeworms (Cestoda: Taeniidae): an inference from nuclear protein-coding genes.

Science.gov (United States)

Knapp, Jenny; Nakao, Minoru; Yanagida, Tetsuya; Okamoto, Munehiro; Saarma, Urmas; Lavikainen, Antti; Ito, Akira

2011-12-01

The family Taeniidae of tapeworms is composed of two genera, Echinococcus and Taenia, which obligately parasitize mammals, including humans. Inferring phylogeny via molecular markers is the only way to trace back their evolutionary histories. However, molecular dating approaches have so far been lacking. Here we established new markers from nuclear protein-coding genes for RNA polymerase II second largest subunit (rpb2), phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold). Bayesian inference and maximum likelihood analyses of the concatenated gene sequences allowed us to reconstruct phylogenetic trees for taeniid parasites. The tree topologies clearly demonstrated that Taenia is paraphyletic and that the clade of Echinococcus oligarthrus and Echinococcus vogeli is sister to all other members of Echinococcus. Both species are endemic in Central and South America, and their definitive hosts originated from carnivores that immigrated from North America after the formation of the Panamanian land bridge about 3 million years ago (Ma). A time-calibrated phylogeny was estimated by a Bayesian relaxed-clock method based on the assumption that the most recent common ancestor of E. oligarthrus and E. vogeli existed during the late Pliocene (3.0 Ma). The results suggest that a clade of Taenia including human-pathogenic species diversified primarily in the late Miocene (11.2 Ma), whereas Echinococcus started to diversify later, at the end of the Miocene (5.8 Ma). Close genetic relationships among the members of Echinococcus imply that the genus is a young group in which speciation and global radiation occurred rapidly. Copyright © 2011 Elsevier Inc. All rights reserved.
Characterization and analysis of a de novo transcriptome from the pygmy grasshopper Tetrix japonica.

Science.gov (United States)

Qiu, Zhongying; Liu, Fei; Lu, Huimeng; Huang, Yuan

2017-05-01

The pygmy grasshopper Tetrix japonica is a common insect distributed throughout the world, and it has the potential for use in studies of body colour polymorphism, genomics and the biology of Tetrigoidea (Insecta: Orthoptera). However, limited biological information is available for this insect. Here, we conducted a de novo transcriptome study of adult and larval T. japonica to provide a better understanding of its gene expression and to develop genomic resources for future work. We sequenced and explored the characteristics of the de novo transcriptome of T. japonica using the Illumina HiSeq 2000 platform. A total of 107 608 206 paired-end clean reads were assembled into 61 141 unigenes using the Trinity software; the mean unigene size was 771 bp, and the N50 length was 1238 bp. A total of 29 225 unigenes were functionally annotated to the NCBI nonredundant protein sequences (Nr), NCBI nonredundant nucleotide sequences (Nt), a manually annotated and reviewed protein sequence database (Swiss-Prot), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of putative genes that are potentially involved in pigment pathways, juvenile hormone (JH) metabolism and signalling pathways were identified in the T. japonica transcriptome. Additionally, 165 769 and 156 796 putative single nucleotide polymorphisms were identified in the adult and larval transcriptomes, respectively, and a total of 3162 simple sequence repeats were detected in this assembly. These comprehensive transcriptomic data for T. japonica will provide a usable resource for gene predictions, signalling pathway investigations and molecular marker development for this species and other pygmy grasshoppers. © 2016 John Wiley & Sons Ltd.
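The N50 statistic quoted above is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch, assuming a plain list of unigene lengths; this is the common textbook definition, not necessarily the exact code used by Trinity or its companion scripts.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig/unigene lengths."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Unlike the mean length (771 bp above), N50 is weighted toward the longer sequences, which is why it is usually the larger of the two.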
Systematic screening for mutations in the promoter and the coding region of the 5-HT1A gene

Energy Technology Data Exchange (ETDEWEB)

Erdmann, J.; Shimron-Abarbanell, D.; Cichon, S. [Univ. of Bonn (Germany)] [and others]

1995-10-09

In the present study we sought to identify genetic variation in the 5-HT1A receptor gene that, through alteration of protein function or level of expression, might contribute to the genetic predisposition to neuropsychiatric diseases. Genomic DNA samples from 159 unrelated subjects (45 patients with schizophrenia, 46 with bipolar affective disorder, and 43 with Tourette's syndrome, as well as 25 healthy controls) were investigated by single-strand conformation analysis. Overlapping PCR (polymerase chain reaction) fragments covered the whole coding sequence as well as the 5′ untranslated region of the 5-HT1A gene. The region upstream of the coding sequence that we investigated contains a functional promoter. We found two rare nucleotide sequence variants. Both are located in the coding region of the gene: a coding mutation (A→G) at nucleotide position 82, which leads to an amino acid exchange (Ile→Val) at position 28 of the receptor protein, and a silent mutation (C→T) at nucleotide position 549. The occurrence of the Ile-28-Val substitution was studied in an extended sample of patients (n = 352) and controls (n = 210) but was found at similar frequencies in all groups. Thus, this mutation is unlikely to play a significant role in the genetic predisposition to the diseases investigated. In conclusion, our study does not provide evidence that the 5-HT1A gene plays either a major or a minor role in the genetic predisposition to schizophrenia, bipolar affective disorder, or Tourette's syndrome. 29 refs., 4 figs., 1 tab.
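The two variant positions can be checked with simple codon arithmetic. The sketch below assumes that nucleotide 1 is the first base of the ATG start codon; the abstract does not state its numbering convention. Under that assumption, position 82 is the first base of codon 28, so an A→G change turns an Ile codon (ATT, ATC, ATA) into the corresponding Val codon (GTT, GTC, GTA). Position 549 is the third base of codon 183, and a C→T change at a third codon position is always synonymous in the standard genetic code.

```python
def codon_position(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based CDS nucleotide position to (codon number, position within codon), both 1-based.

    Assumes position 1 is the first base of the start codon (hypothetical convention).
    """
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


print(codon_position(82))   # (28, 1): first base of codon 28
print(codon_position(549))  # (183, 3): third base of codon 183
```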
Get phases from arsenic anomalous scattering: de novo SAD phasing of two protein structures crystallized in cacodylate buffer.

Directory of Open Access Journals (Sweden)

Xiang Liu

Full Text Available The crystal structures of two proteins, a putative pyrazinamidase/nicotinamidase from the dental pathogen Streptococcus mutans (SmPncA) and the human caspase-6 (Casp6), were solved by the de novo arsenic single-wavelength anomalous diffraction (As-SAD) phasing method. Arsenic (As), an uncommonly used element in SAD phasing, was covalently introduced into the proteins by cacodylic acid, the buffering agent in the crystallization reservoirs. In SmPncA, the only cysteine was bound to dimethylarsinoyl, which is a pentavalent arsenic group (As(V)). This arsenic atom and a protein-bound zinc atom both generated anomalous signals. The predominant contribution, however, was from the As anomalous signals, which were sufficient to phase the SmPncA structure alone. In Casp6, four cysteines were found to bind cacodyl, a trivalent arsenic group (As(III)), in the presence of the reducing agent dithiothreitol (DTT), and arsenic atoms were the only anomalous scatterers for SAD phasing. Analyses and discussion of these two As-SAD phasing examples and comparison of As with other traditional heavy atoms that generate anomalous signals, together with a few arsenic-based de novo phasing cases reported previously, strongly suggest that As is an ideal anomalous scatterer for SAD phasing in protein crystallography.

Sunflower (Helianthus annuus) fatty acid synthase complex: β-hydroxyacyl-[acyl carrier protein] dehydratase genes.

Science.gov (United States)

González-Thuillier, Irene; Venegas-Calerón, Mónica; Sánchez, Rosario; Garcés, Rafael; von Wettstein-Knowles, Penny; Martínez-Force, Enrique

2016-02-01

Two sunflower hydroxyacyl-[acyl carrier protein] dehydratases evolved into two different isoenzymes showing distinctive expression levels and kinetic efficiencies. β-Hydroxyacyl-[acyl carrier protein (ACP)]-dehydratase (HAD) is a component of the type II fatty acid synthase complex involved in 'de novo' fatty acid biosynthesis in plants. This complex, formed by four intraplastidial proteins, is responsible for the sequential condensation of two-carbon units, leading to 16- and 18-C acyl-ACP. HAD dehydrates 3-hydroxyacyl-ACP, generating trans-2-enoyl-ACP. With the aim of further understanding fatty acid biosynthesis in sunflower (Helianthus annuus) seeds, two β-hydroxyacyl-[ACP] dehydratase genes have been cloned from developing seeds, HaHAD1 (GenBank HM044767) and HaHAD2 (GenBank GU595454). Genomic DNA gel blot analyses suggest that both are single-copy genes. Differences in their expression patterns across plant tissues were detected. Higher levels of HaHAD2 in the initial stages of seed development suggest a key role in seed storage fatty acid synthesis. That HaHAD1 expression levels remained constant across most tissues suggests a housekeeping function. Heterologous expression of these genes in E. coli confirmed that both proteins were functional and able to interact with the bacterial complex 'in vivo'. The large increase of saturated fatty acids in cells expressing HaHAD1 and HaHAD2 supports the idea that these HAD genes are closely related to the E. coli FabZ gene. The proposed three-dimensional models of HaHAD1 and HaHAD2 revealed differences at the entrance to the catalytic tunnel attributable to Phe166/Val159, respectively. The HaHAD1 F166V mutant was generated to study the function of this residue. The 'in vitro' enzymatic characterization of the three HAD proteins demonstrated that all were active, with the mutant having Km and Vmax values intermediate between those of the two wild-type proteins.
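Km and Vmax are the two parameters of the Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]). A minimal sketch with hypothetical parameter values (not the measured HaHAD constants) shows how two enzyme variants' rates can be compared at a given substrate concentration.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial reaction rate v = Vmax * [S] / (Km + [S])."""
    if s < 0 or km <= 0:
        raise ValueError("[S] must be >= 0 and Km > 0")
    return vmax * s / (km + s)


# Hypothetical (Vmax, Km) pairs for two enzyme variants, arbitrary units.
variants = {"variant_a": (10.0, 2.0), "variant_b": (6.0, 0.5)}
for name, (vmax, km) in variants.items():
    # Rate at [S] = 1.0, and at [S] = Km, where the rate is exactly Vmax / 2.
    print(name, michaelis_menten(1.0, vmax, km), michaelis_menten(km, vmax, km))
```

A lower Km means the enzyme approaches its Vmax at lower substrate concentrations, so a variant with a lower Vmax can still be faster at low [S], as the two hypothetical variants above illustrate.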
Long non-coding RNAs and mRNAs profiling during spleen development in pig.

Science.gov (United States)

Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou

2018-01-01

Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and samples from 2-year-old wild boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than that observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
HBV core protein allosteric modulators differentially alter cccDNA biosynthesis from de novo infection and intracellular amplification pathways.

Science.gov (United States)

Guo, Fang; Zhao, Qiong; Sheraz, Muhammad; Cheng, Junjun; Qi, Yonghe; Su, Qing; Cuconati, Andrea; Wei, Lai; Du, Yanming; Li, Wenhui; Chang, Jinhong; Guo, Ju-Tao

2017-09-01

Hepatitis B virus (HBV) core protein assembles viral pre-genomic (pg) RNA and DNA polymerase into nucleocapsids for reverse transcriptional DNA replication to take place. Several chemotypes of small molecules, including heteroaryldihydropyrimidines (HAPs) and sulfamoylbenzamides (SBAs), have been discovered to allosterically modulate core protein structure and consequently alter the kinetics and pathway of core protein assembly, resulting in the formation of irregularly shaped core protein aggregates or "empty" capsids devoid of pre-genomic RNA and viral DNA polymerase. Interestingly, in addition to inhibiting nucleocapsid assembly and subsequent viral genome replication, we have now demonstrated that HAPs and SBAs differentially modulate the biosynthesis of covalently closed circular (ccc) DNA from de novo infection and intracellular amplification pathways by inducing disassembly of nucleocapsids derived from virions as well as double-stranded DNA-containing progeny nucleocapsids in the cytoplasm. Specifically, the mistimed cuing of nucleocapsid uncoating prevents cccDNA formation during de novo infection of hepatocytes, while transiently accelerating cccDNA synthesis from cytoplasmic progeny nucleocapsids. Our studies indicate that elongation of positive-stranded DNA induces structural changes in nucleocapsids, which confer on mature nucleocapsids the ability to bind core protein allosteric modulators (CpAMs) and trigger their disassembly. Understanding the molecular mechanism underlying the dual effects of CpAMs on nucleocapsid assembly and disassembly will facilitate the discovery of novel core protein-targeting antiviral agents that can more efficiently suppress cccDNA synthesis and cure chronic hepatitis B.
HBV core protein allosteric modulators differentially alter cccDNA biosynthesis from de novo infection and intracellular amplification pathways.

Directory of Open Access Journals (Sweden)

Fang Guo

2017-09-01

Full Text Available Hepatitis B virus (HBV) core protein assembles viral pre-genomic (pg) RNA and DNA polymerase into nucleocapsids for reverse transcriptional DNA replication to take place. Several chemotypes of small molecules, including heteroaryldihydropyrimidines (HAPs) and sulfamoylbenzamides (SBAs), have been discovered to allosterically modulate core protein structure and consequently alter the kinetics and pathway of core protein assembly, resulting in the formation of irregularly shaped core protein aggregates or "empty" capsids devoid of pre-genomic RNA and viral DNA polymerase. Interestingly, in addition to inhibiting nucleocapsid assembly and subsequent viral genome replication, we have now demonstrated that HAPs and SBAs differentially modulate the biosynthesis of covalently closed circular (ccc) DNA from de novo infection and intracellular amplification pathways by inducing disassembly of nucleocapsids derived from virions as well as double-stranded DNA-containing progeny nucleocapsids in the cytoplasm. Specifically, the mistimed cuing of nucleocapsid uncoating prevents cccDNA formation during de novo infection of hepatocytes, while transiently accelerating cccDNA synthesis from cytoplasmic progeny nucleocapsids. Our studies indicate that elongation of positive-stranded DNA induces structural changes in nucleocapsids, which confer on mature nucleocapsids the ability to bind core protein allosteric modulators (CpAMs) and trigger their disassembly. Understanding the molecular mechanism underlying the dual effects of CpAMs on nucleocapsid assembly and disassembly will facilitate the discovery of novel core protein-targeting antiviral agents that can more efficiently suppress cccDNA synthesis and cure chronic hepatitis B.
HBV core protein allosteric modulators differentially alter cccDNA biosynthesis from de novo infection and intracellular amplification pathways

Science.gov (United States)

Guo, Fang; Zhao, Qiong; Cheng, Junjun; Qi, Yonghe; Su, Qing; Wei, Lai; Li, Wenhui; Chang, Jinhong

2017-01-01

Hepatitis B virus (HBV) core protein assembles viral pre-genomic (pg) RNA and DNA polymerase into nucleocapsids for reverse transcriptional DNA replication to take place. Several chemotypes of small molecules, including heteroaryldihydropyrimidines (HAPs) and sulfamoylbenzamides (SBAs), have been discovered to allosterically modulate core protein structure and consequently alter the kinetics and pathway of core protein assembly, resulting in the formation of irregularly shaped core protein aggregates or “empty” capsids devoid of pre-genomic RNA and viral DNA polymerase. Interestingly, in addition to inhibiting nucleocapsid assembly and subsequent viral genome replication, we have now demonstrated that HAPs and SBAs differentially modulate the biosynthesis of covalently closed circular (ccc) DNA from de novo infection and intracellular amplification pathways by inducing disassembly of nucleocapsids derived from virions as well as double-stranded DNA-containing progeny nucleocapsids in the cytoplasm. Specifically, the mistimed cuing of nucleocapsid uncoating prevents cccDNA formation during de novo infection of hepatocytes, while transiently accelerating cccDNA synthesis from cytoplasmic progeny nucleocapsids. Our studies indicate that elongation of positive-stranded DNA induces structural changes in nucleocapsids, which confer on mature nucleocapsids the ability to bind core protein allosteric modulators (CpAMs) and trigger their disassembly. Understanding the molecular mechanism underlying the dual effects of CpAMs on nucleocapsid assembly and disassembly will facilitate the discovery of novel core protein-targeting antiviral agents that can more efficiently suppress cccDNA synthesis and cure chronic hepatitis B. PMID:28945802
XGC developments for a more efficient XGC-GENE code coupling

Science.gov (United States)

Dominski, Julien; Hager, Robert; Ku, Seung-Hoe; Chang, Cs

2017-10-01

In the Exascale Computing Program, the High-Fidelity Whole Device Modeling project initially aims at delivering a tightly coupled simulation of plasma neoclassical and turbulence dynamics from the core to the edge of the tokamak. To permit such simulations, the gyrokinetic codes GENE and XGC will be coupled together. Numerical efforts are being made to improve the agreement between the numerical schemes in the coupling region. One of the difficulties in coupling these codes is the incompatibility of their grids: GENE is a continuum grid-based code, whereas XGC is a Particle-In-Cell code using an unstructured triangular mesh. A field-aligned filter is therefore implemented in XGC. Although XGC originally had an approximately field-following mesh, this field-aligned filter allows a perturbation discretization closer to the one solved in the field-aligned code GENE. Additionally, new XGC gyro-averaging matrices are implemented on a velocity grid adapted to the plasma properties, thus ensuring the same accuracy from the core to the edge regions.
Amino acid codes in mitochondria as possible clues to primitive codes

Science.gov (United States)

Jukes, T. H.

1981-01-01

Differences between mitochondrial codes and the universal code indicate that an evolutionary simplification has taken place, rather than a return to a more primitive code. However, these differences make it evident that the universal code is not the only code possible, and therefore earlier codes may have differed markedly from the present code. The present universal code is probably a 'frozen accident.' The change in CUN codons from leucine to threonine (Neurospora vs. yeast mitochondria) indicates that neutral or near-neutral changes occurred in the corresponding proteins when this code change took place, caused presumably by a mutation in a tRNA gene.
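The CUN reassignment can be made concrete by translating the same sequence under two codon tables. A minimal sketch, assuming DNA letters (T in place of U) and the NCBI standard table; only the four CUN codons are reassigned here, whereas the real yeast mitochondrial code (NCBI translation table 3) also differs from the standard code at other codons. The table name `YEAST_MITO_CUN_ONLY` is a hypothetical label for this illustration.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard code (table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Only the CUN (written CTN in DNA) Leu -> Thr change is modelled; this is NOT
# a complete yeast mitochondrial table.
YEAST_MITO_CUN_ONLY = {**STANDARD, **{"CT" + b: "T" for b in BASES}}


def translate(cds: str, table: dict) -> str:
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(table[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


seq = "ATGCTGCTTAAA"
print(translate(seq, STANDARD))             # MLLK
print(translate(seq, YEAST_MITO_CUN_ONLY))  # MTTK
```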
De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

Directory of Open Access Journals (Sweden)

Duan Gui

Full Text Available BACKGROUND: The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in its population size in the past decades, which raises concern about its extinction. So far, this species is poorly characterized at the molecular level because little sequence information is available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generating abundant sequences for functional genomic analyses in species with unsequenced genomes. PRINCIPAL FINDINGS: We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high-quality sequences were generated from 47,840,388 paired-end reads, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value < 10^-5), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and a BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in the immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. CONCLUSION: This study represents the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for the discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, and identification of genetic markers.
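Simple sequence repeats (microsatellites) are short motifs repeated in tandem. A minimal sketch of perfect-repeat detection with a regular-expression backreference is shown below. The thresholds (motifs of 1 to 6 bp, at least 5 copies) are assumptions for illustration; the abstract does not state the tool or settings used, and dedicated SSR finders typically apply different minimum copy numbers per motif length.

```python
import re


def find_ssrs(seq: str, max_motif: int = 6, min_copies: int = 5):
    """Find perfect tandem repeats of a 1..max_motif bp motif repeated >= min_copies times.

    Returns (0-based start, motif, copy number) tuples. The lazy motif group
    tries the shortest unit first, so 'ACACAC...' is reported as 'AC'.
    """
    pattern = re.compile(rf"([ACGT]{{1,{max_motif}}}?)\1{{{min_copies - 1},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGACACACACACTT"))  # [(2, 'AC', 5)]
```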
Vertebrate gene predictions and the problem of large genes

DEFF Research Database (Denmark)

Wang, Jun; Li, ShengTing; Zhang, Yong

2003-01-01

To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistent...
Cloning and expression of the coding regions of the heat shock proteins HSP10 and HSP16 from Piscirickettsia salmonis

Directory of Open Access Journals (Sweden)

VIVIAN WILHELM

2003-01-01

Full Text Available The genes encoding the heat shock proteins HSP10 and HSP16 of the salmon pathogen Piscirickettsia salmonis have been isolated and sequenced. The HSP10 coding sequence is located in an open reading frame of 291 base pairs encoding 96 amino acids. The HSP16 coding region was isolated as a 471-base-pair fragment encoding a protein of 156 amino acids. The deduced amino acid sequences of both proteins show significant homology to the respective proteins from other prokaryotic organisms. Both proteins were expressed in E. coli as fusion proteins with thioredoxin and purified by chromatography on a Ni column. A rabbit serum against P. salmonis total proteins reacts with the recombinant HSP10 and HSP16 proteins. Similar reactivity was determined by ELISA using serum from salmon infected with P. salmonis. The possibility of formulating a vaccine containing these two proteins is discussed.
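The reported lengths follow from codon arithmetic: an open reading frame of n base pairs holds n/3 codons, and if the count includes the stop codon, the protein has one residue fewer. A minimal sketch, assuming (as the numbers suggest but the abstract does not state) that both quoted lengths include the stop codon:

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp base pairs."""
    if orf_bp <= 0 or orf_bp % 3:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # A stop codon occupies three bases but adds no amino acid.
    return codons - 1 if includes_stop else codons


print(protein_length(291))  # 96 (HSP10)
print(protein_length(471))  # 156 (HSP16)
```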
Improvement of heterologous protein production in Aspergillus oryzae by RNA interference with alpha-amylase genes.

Science.gov (United States)

Nemoto, Takashi; Maruyama, Jun-ichi; Kitamoto, Katsuhiko

2009-11-01

Aspergillus oryzae RIB40 has three alpha-amylase genes (amyA, amyB, and amyC) and secretes alpha-amylase abundantly. However, large amounts of endogenous secretory proteins such as alpha-amylase can compete with heterologous proteins in the secretory pathway and decrease their production yields. In this study, we examined the effects of suppressing alpha-amylase on heterologous protein production in A. oryzae, using bovine chymosin (CHY) as a reporter heterologous protein. The three alpha-amylase genes in A. oryzae have nearly identical DNA sequences from their promoters through their coding regions. Hence we silenced the alpha-amylase genes by RNA interference (RNAi) in the A. oryzae CHY-producing strain. The silenced strains exhibited a reduction in alpha-amylase activity and an increase in CHY production in the culture medium. This result suggests that suppression of alpha-amylase is effective for improving heterologous protein production in A. oryzae.
Acquisition, consolidation, reconsolidation, and extinction of eyelid conditioning responses require de novo protein synthesis.

Science.gov (United States)

Inda, Mari Carmen; Delgado-García, José María; Carrión, Angel Manuel

2005-02-23

Memory, as measured by changes in an animal's behavior some time after learning, is a reflection of many processes. Here, using a trace paradigm, in mice we show that de novo protein synthesis is required for acquisition, consolidation, reconsolidation, and extinction of classically conditioned eyelid responses. Two critical periods of protein synthesis have been found: the first, during training, the blocking of which impaired acquisition; and the second, lasting the first 4 h after training, the blocking of which impaired consolidation. The process of reconsolidation was sensitive to protein synthesis inhibition if anisomycin was injected before or just after the reactivation session. Furthermore, extinction was also dependent on protein synthesis, following the same temporal course as that followed during acquisition and consolidation. This last fact reinforces the idea that extinction is an active learning process rather than a passive event of forgetting. Together, these findings demonstrate that all of the different stages of memory formation involved in the classical conditioning of eyelid responses are dependent on protein synthesis.
Isolation and expression of the genes coding for the membrane bound transglycosylase B (MltB) and the transferrin binding protein B (TbpB) of the salmon pathogen Piscirickettsia salmonis

Directory of Open Access Journals (Sweden)

VIVIAN WILHELM

2004-01-01

Full Text Available We have isolated and sequenced the genes encoding the membrane-bound transglycosylase B (MltB) and the transferrin-binding protein B (TbpB) of the salmon pathogen Piscirickettsia salmonis. Sequence analysis revealed two open reading frames that encode proteins with calculated molecular weights of 38,830 and 85,140. The deduced amino acid sequences of both proteins show significant homology to the respective proteins from phylogenetically related microorganisms. Partial sequences encoding the amino- and carboxyl-terminal regions of MltB and a 761-base-pair sequence encoding the amino-terminal region of TbpB have been expressed in E. coli. The strong humoral response elicited by these proteins in mice confirmed the immunogenic properties of the recombinant proteins. A similar response was elicited by both proteins when injected intraperitoneally in Atlantic salmon. The present data indicate that these proteins are good candidates for use in formulations to study the protective immunity of salmon to infection by P. salmonis.
Extreme-Scale De Novo Genome Assembly

Energy Technology Data Exchange (ETDEWEB)

Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States); Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.

2017-09-26

De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme-scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different parts of a computer system. This chapter explains in detail the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and the communication costs. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers using up to tens of thousands of cores.
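The core idea behind de Bruijn-graph assemblers such as Meraculous is to break reads into k-mers, link overlapping (k-1)-mers, and report unbranched paths as contigs. The sketch below is a toy, single-process illustration of that idea only; it omits error filtering, reverse complements, scaffolding, and everything that makes HipMer distributed, and isolated cycles are not reported.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred


def assemble(reads, k):
    """Return the maximal non-branching paths (contigs) of a toy de Bruijn graph."""
    succ, pred = build_graph(reads, k)
    nodes = set(succ) | set(pred)

    def begins_contig(node):
        # A contig starts where the path cannot be extended backwards unambiguously.
        preds = pred.get(node, set())
        return len(preds) != 1 or len(succ[next(iter(preds))]) != 1

    contigs = []
    for node in sorted(n for n in nodes if begins_contig(n)):
        contig = node
        # Extend while this node has exactly one successor and that
        # successor has exactly one predecessor.
        while len(succ.get(node, ())) == 1:
            nxt = next(iter(succ[node]))
            if len(pred[nxt]) != 1:
                break
            contig += nxt[-1]
            node = nxt
        contigs.append(contig)
    return contigs


print(assemble(["ACGTTG", "GTTGCA"], 4))  # ['ACGTTGCA']
```

The choice of k trades off repeat resolution (larger k spans more repeats) against sensitivity to sequencing errors and coverage gaps (smaller k tolerates them better).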
Customizable de novo design strategies for DOCK: Application to HIVgp41 and other therapeutic targets.

Science.gov (United States)

Allen, William J; Fochtman, Brian C; Balius, Trent E; Rizzo, Robert C

2017-11-15

De novo design can be used to explore vast areas of chemical space in computational lead discovery. As a complement to virtual screening, from-scratch construction of molecules is not limited to compounds in pre-existing vendor catalogs. Here, we present an iterative fragment growth method, integrated into the program DOCK, in which new molecules are built using rules for allowable connections based on known molecules. The method leverages DOCK's advanced scoring and pruning approaches, and users can define very specific criteria in terms of properties or features to customize growth toward a particular region of chemical space. The code was validated using three increasingly difficult classes of calculations: (1) rebuilding known X-ray ligands taken from 663 complexes using only their component parts (focused libraries), (2) constructing new ligands in 57 drug target sites using a library derived from ∼13M drug-like compounds (generic libraries), and (3) application to a challenging protein-protein interface on the viral drug target HIVgp41. The computational testing confirms that the de novo DOCK routines are robust and work as envisioned, and the compelling results highlight their potential utility for designing new molecules against a wide variety of important protein targets. © 2017 Wiley Periodicals, Inc.
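Here is a toy sketch of rule-constrained fragment growth. It is not DOCK's implementation. Fragments are reduced to text labels, and geometry and docking are ignored. Allowable connections are simply the fragment pairs observed bonded in a set of known molecules. The `score` function (lower is better, like an interaction energy) and all names are hypothetical. Each layer attaches every allowed fragment to the end of each partial molecule, then prunes to the best `beam` candidates.

```python
def derive_rules(known_molecules):
    """Every fragment pair seen bonded in a known molecule becomes an allowed connection."""
    rules = set()
    for mol in known_molecules:
        for a, b in zip(mol, mol[1:]):
            rules.add(frozenset((a, b)))
    return rules


def grow(anchor, library, rules, score, layers, beam):
    """Grow linear fragment chains from anchor, keeping the beam lowest-scoring chains per layer."""
    population = [(anchor,)]
    for _ in range(layers):
        candidates = [mol + (frag,) for mol in population for frag in library
                      if frozenset((mol[-1], frag)) in rules]
        if not candidates:
            break  # no allowed connection extends any current chain
        population = sorted(candidates, key=score)[:beam]  # pruning step
    return population


# Usage with made-up fragments and per-fragment "energies".
known = [("ring", "amide", "alkyl"), ("ring", "ether", "alkyl")]
energies = {"ring": -2.0, "amide": -1.5, "ether": -0.5, "alkyl": -0.2}
best = grow("ring", list(energies), derive_rules(known),
            lambda mol: sum(energies[f] for f in mol), layers=2, beam=1)
```

Because `amide` and `ether` never appear bonded in the known molecules, no candidate ever joins them. That is how connection rules restrict growth to chemistry resembling the training set.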
Urinary exosomes: a novel means to non-invasively assess changes in renal gene and protein expression.

Directory of Open Access Journals (Sweden)

Silvia Spanu

Full Text Available BACKGROUND: In clinical practice, there is a lack of markers for the non-invasive diagnosis and follow-up of kidney disease. Exosomes are membrane vesicles that are secreted from their cells of origin into surrounding body fluids and contain proteins and mRNA that are protected from digestive enzymes by a membrane. METHODS: Toxic podocyte damage was induced by puromycin aminonucleoside in rats (PAN). Urinary exosomes were isolated by ultracentrifugation at different time points during the disease. Exosomal mRNA was isolated and amplified, and the mRNA species were globally assessed by gene array analysis. Tissue-specific gene and protein expression was assessed by RT-qPCR analysis and immunohistochemistry. RESULTS: Gene array analysis of mRNA isolated from urinary exosomes revealed cystatin C mRNA as one of the most highly regulated genes. Its expression increased 7.5-fold by day 5 and remained elevated, at 1.9-fold, until day 10. This was paralleled by a 2-fold increase in cystatin C mRNA expression in the renal cortex. Protein expression in the kidneys also increased dramatically, with de novo expression of cystatin C in glomerular podocytes, in parts of the proximal tubule, and in the renal medulla. Urinary excretion of cystatin C increased approximately 2-fold. CONCLUSION: In this proof-of-concept study, we demonstrate that changes in urinary exosomal cystatin C mRNA expression are representative of changes in renal mRNA and protein expression. Because urinary exosomal cystatin C mRNA is produced by cells lining the urinary tract, it might be a more specific marker of renal damage than glomerular-filtered free cystatin C.
The Drosophila gene CG9918 codes for a pyrokinin-1 receptor

DEFF Research Database (Denmark)

Cazzamali, Giuseppe; Torp, Malene; Hauser, Frank

2005-01-01

The database from the Drosophila Genome Project contains a gene, CG9918, annotated to code for a G protein-coupled receptor. We cloned the cDNA of this gene and functionally expressed it in Chinese hamster ovary cells. We tested a library of about 25 Drosophila and other insect neuropeptides … and seven insect biogenic amines on the expressed receptor and found that it was activated by low concentrations of the Drosophila neuropeptide pyrokinin-1 (TGPSASSGLWFGPRLamide; EC50 = 5 × 10⁻⁸ M). The receptor was also activated by other Drosophila neuropeptides terminating with the sequence PRLamide … (Hug-γ, ecdysis-triggering-hormone-1, pyrokinin-2), but in these cases about six to eight times higher concentrations were needed. The receptor was not activated by Drosophila neuropeptides containing a C-terminal PRIamide sequence (such as ecdysis-triggering-hormone-2), or PRVamide (such as capa...
Functional intersection of ATM and DNA-dependent protein kinase catalytic subunit in coding end joining during V(D)J recombination

DEFF Research Database (Denmark)

Lee, Baeck-Seung; Gapud, Eric J; Zhang, Shichuan

2013-01-01

V(D)J recombination is initiated by the RAG endonuclease, which introduces DNA double-strand breaks (DSBs) at the border between two recombining gene segments, generating two hairpin-sealed coding ends and two blunt signal ends. ATM and DNA-dependent protein kinase catalytic subunit (DNA-PKcs) are serine-threonine kinases that orchestrate the cellular responses to DNA DSBs. During V(D)J recombination, ATM and DNA-PKcs have unique functions in the repair of coding DNA ends. ATM deficiency leads to instability of postcleavage complexes and the loss of coding ends from these complexes. DNA … when ATM is present and its kinase activity is intact. The ability of ATM to compensate for DNA-PKcs kinase activity depends on the integrity of three threonines in DNA-PKcs that are phosphorylation targets of ATM, suggesting that ATM can modulate DNA-PKcs activity through direct phosphorylation of DNA...
Gene expression patterns regulating embryogenesis based on the integrated de novo transcriptome assembly of the Japanese flounder.

Science.gov (United States)

Fu, Yuanshuai; Jia, Liang; Shi, Zhiyi; Zhang, Junling; Li, Wenjuan

2017-06-01

The Japanese flounder (Paralichthys olivaceus) is one of the most important commercial and biological marine fishes. However, the molecular biology of embryogenesis and early development in the Japanese flounder remains largely unknown due to a lack of genomic resources. A comprehensive and integrated transcriptome is necessary to study the molecular mechanisms of early development and to allow detailed characterization of gene expression patterns during embryogenesis; this approach is critical to understanding the processes that occur prior to mesectoderm formation during early embryonic development. In this study, pooled RNA extracted from samples ranging from unfertilized eggs to 41-dph (days post-hatching) embryos was sequenced using Illumina paired-end sequencing technology, generating more than 117.8 million 100-bp paired-end (PE) reads. In total, 121,513 transcripts (≥200 bp) were obtained by de novo assembly. A sequence similarity search indicated that 52,338 transcripts show significant similarity to 22,462 known proteins from the NCBI non-redundant and Swiss-Prot protein databases; these transcripts were annotated using Blast2GO. GO terms were assigned to 44,627 transcripts with 12,006 functional terms, and 10,024 transcripts were assigned to 133 KEGG pathways. Furthermore, gene expression differences between the unfertilized egg and the gastrula embryo were analysed using Illumina RNA-Seq with single-read sequencing technology, and 24,837 differentially and specifically expressed transcripts were identified, including 5,286 annotated and 19,569 non-annotated transcripts. All of the expressed transcripts in the unfertilized egg and gastrula embryo were further classified as maternal, zygotic, or maternal-zygotic transcripts, which may help us understand the roles of these transcripts during the embryonic development of the Japanese flounder. Thus, the results will contribute to an improved understanding of the gene expression patterns and
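The abstract does not state the criteria used to label transcripts as maternal, zygotic, or maternal-zygotic. Here is a minimal sketch. It assumes a transcript counts as present when its expression (for example in TPM) reaches a hypothetical `threshold`. It also assumes that "maternal" means present only in the unfertilized egg, "zygotic" means present only in the gastrula, and "maternal-zygotic" means present in both.

```python
def classify_transcript(egg_expr, gastrula_expr, threshold=1.0):
    """Label a transcript by the stages in which it is detected (criteria are assumptions)."""
    in_egg = egg_expr >= threshold
    in_gastrula = gastrula_expr >= threshold
    if in_egg and in_gastrula:
        return "maternal-zygotic"
    if in_egg:
        return "maternal"
    if in_gastrula:
        return "zygotic"
    return "not expressed"


# Usage with made-up (egg, gastrula) expression values.
expression = {"t1": (12.0, 0.1), "t2": (0.0, 8.5), "t3": (4.2, 6.3)}
labels = {tid: classify_transcript(*vals) for tid, vals in expression.items()}
```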
In-depth comparative analysis of malaria parasite genomes reveals protein-coding genes linked to human disease in Plasmodium falciparum genome.

Science.gov (United States)

Liu, Xuewu; Wang, Yuanyuan; Liang, Jiao; Wang, Luojun; Qin, Na; Zhao, Ya; Zhao, Gang

2018-05-02

Plasmodium falciparum is the most virulent malaria parasite capable of parasitizing human erythrocytes. The identification of genes related to this capability can enhance our understanding of the molecular mechanisms underlying human malaria and lead to the development of new therapeutic strategies for malaria control. With several malaria parasite genome sequences now available, computational analysis is a practical strategy for identifying genes that contribute to this disease. Here, we developed and used a virtual genome method to assign 33,314 genes from three human malaria parasites (P. falciparum, P. knowlesi and P. vivax) and three rodent malaria parasites (P. berghei, P. chabaudi and P. yoelii) to 4605 clusters. Each cluster consisted of genes whose protein sequences were significantly similar and was considered a virtual gene. Comparing the enriched values of all clusters in human malaria parasites with those in rodent malaria parasites revealed 115 P. falciparum genes putatively responsible for parasitizing human erythrocytes. These genes are mainly located in internal chromosomal regions and participate in many biological processes, including membrane protein trafficking and thiamine biosynthesis. Meanwhile, 289 P. berghei genes were included in the rodent parasite-enriched clusters; most are located in subtelomeric regions and encode erythrocyte surface proteins. Comparing cluster values in P. falciparum with those in P. vivax and P. knowlesi revealed 493 candidate genes linked to virulence. Some of them encode proteins present on the erythrocyte surface and participate in cytoadhesion, virulence factor trafficking, or erythrocyte invasion, but many genes of unknown function were also identified. Cerebral malaria is characterized by the accumulation of trophozoite-stage infected erythrocytes in the brain microvasculature. To discover cerebral malaria-related genes, fast Fourier transformation (FFT) was introduced to extract

Origins of gene, genetic code, protein and life

Indian Academy of Sciences (India)

We have further presented the [GADV]-protein world hypothesis of the origin of life as well as a hypothesis of protein production, suggesting that proteins were originally produced by random peptide formation of amino acids restricted in specific amino acid compositions termed GNC-, SNS- and GC-NSF(a)-0th order ...
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

Science.gov (United States)

Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

2010-05-07

Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses of the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, so a fast and efficient sequencing approach was needed. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of the Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 of 200 clones excluded), and the nucleotide-level accuracy of the coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii to confirm that it also performs well for non-human species. The entire sequencing and shotgun assembly takes less than 1 week, and consumables cost only approximately US$3 per clone, a significant advantage over previous approaches.
Overexpression Analysis of emv2 gene coding for Late Embryogenesis Abundant Protein from Vigna radiata (Wilczek)

Directory of Open Access Journals (Sweden)

Rajesh S.

2008-10-01

Full Text Available Late embryogenesis abundant (LEA) proteins are speculated to protect plants against water-deficit stress. An overexpression system for the mungbean late embryogenesis abundant protein EMV2 was constructed in the pET29a vector, designated pET-emv2, in which high-level expression is under the transcriptional/translational control of the T7/lac promoter, and was expressed in Escherichia coli BL21(DE3). The induction protocol was optimized for pET recombinants harboring the target gene. Overexpressed EMV2 protein was purified to homogeneity, and the protein profile was monitored by SDS-PAGE.
Arginine de novo and nitric oxide production in disease states

OpenAIRE

Luiking, Yvette C.; Ten Have, Gabriella A. M.; Wolfe, Robert R.; Deutz, Nicolaas E. P.

2012-01-01

Arginine is derived from dietary protein intake, body protein breakdown, or endogenous de novo arginine production. The latter may be linked to the availability of citrulline, which is the immediate precursor of arginine and the limiting factor for de novo arginine production. Arginine metabolism is highly compartmentalized due to the expression of the enzymes involved in arginine metabolism in various organs. A small fraction of arginine enters the NO synthase (NOS) pathway. Tetrahydrobiopterin ...
De novo transcriptome assembly and quantification reveal differentially expressed genes between soft-seed and hard-seed pomegranate (Punica granatum L.).

Directory of Open Access Journals (Sweden)

Hui Xue

Full Text Available Pomegranate (Punica granatum L.) belongs to Punicaceae and is valued for its social, ecological, economic, and aesthetic qualities, as well as, more recently, for its health benefits. The 'Tunisia' variety has softer seeds and large arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms underlying the formation of hard and soft seeds are not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. De novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7%) were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated, and among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. Seed formation in the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, and WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated a selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcriptome sequencing greatly enriches the pomegranate molecular database, and the high-quality SSRs generated in this study will aid future gene cloning in pomegranate. The study also provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.
De novo transcriptome assembly and quantification reveal differentially expressed genes between soft-seed and hard-seed pomegranate (Punica granatum L.).

Science.gov (United States)

Xue, Hui; Cao, Shangyin; Li, Haoxian; Zhang, Jie; Niu, Juan; Chen, Lina; Zhang, Fuhong; Zhao, Diguang

2017-01-01

Pomegranate (Punica granatum L.) belongs to Punicaceae and is valued for its social, ecological, economic, and aesthetic qualities, as well as, more recently, for its health benefits. The 'Tunisia' variety has softer seeds and large arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms underlying the formation of hard and soft seeds are not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. De novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7%) were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated, and among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. Seed formation in the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, and WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated a selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcriptome sequencing greatly enriches the pomegranate molecular database, and the high-quality SSRs generated in this study will aid future gene cloning in pomegranate. The study also provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.
Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

Directory of Open Access Journals (Sweden)

Kumar Parijat Tripathi

Full Text Available RNA-seq is a new tool that measures RNA transcript counts using high-throughput sequencing with extraordinary accuracy. It provides a quantitative means to explore the transcriptome of an organism of interest. However, turning this extremely large volume of data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery). It offers a report on the statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and protein-protein interaction information. It clusters the transcripts based on functional annotations and generates a tabular report of functional and GO annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non-coding RNA (ncRNA) by ab initio methods) helps to identify non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users characterize de novo assembled reads obtained from NGS experiments on non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular and provides an opportunity to add new plugins in the future. Web application is
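Functional enrichment reports of this kind usually ask whether an annotation term is over-represented among the submitted transcripts relative to a background set. The abstract does not specify which statistic Transcriptator uses. The sketch below shows the common one-sided hypergeometric test, computed exactly with Python's `math.comb`. The function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background transcripts, K: background transcripts carrying the term,
    n: submitted transcripts, k: submitted transcripts carrying the term.
    """
    total = comb(N, n)
    # Sum the probabilities of seeing k or more term-carrying transcripts by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

For example, suppose 3 of 4 submitted transcripts carry a term that annotates 5 of 20 background transcripts. Then `hypergeom_enrichment_p(3, 4, 5, 20)` returns 155/4845, about 0.032. In practice, such p-values are then corrected for the many terms tested.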
Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

Directory of Open Access Journals (Sweden)

Yubo Hou

Full Text Available The ability to predict gene content is highly desirable for characterizing not-yet-sequenced genomes such as those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') and log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes best fit a linear model, Y' = 0.045 + 0.977X', both with high significance (p < 0.001, R² > 0.91). Total gene number shows trends in both groups similar to their respective protein-coding regressions. The distinct correlations reflect lower gene-coding percentages that decrease as genome size increases in eukaryotes (82% to 1%), compared with higher and relatively stable percentages in prokaryotes and viruses (97% to 47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) 87,688 protein-coding (92,013 total) genes, corresponding to gene-coding percentages of 1.8% and 0.05%. These estimates likely do not represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes, as evidenced by the high gene copy numbers documented for various dinoflagellate species.
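The two regression models can be evaluated directly. In both, X' is the log10 of genome size in kbp and Y' is the log10 of the protein-coding gene number, so the prediction is 10 raised to Y'. The eukaryotic model is only defined where -46.200 + 22.678X' > 0, that is, for genomes larger than roughly 109 kbp. Plugging the rounded coefficients as printed into the model gives roughly 41,000 and 93,000 protein-coding genes for the 3×10⁶ kbp and 245×10⁶ kbp dinoflagellate genomes. These values are close to, but not identical with, the 38,188 and 87,688 reported in the abstract, so treat this sketch as illustrative only. The function names are hypothetical.

```python
import math


def eukaryote_protein_genes(genome_kbp):
    """Eukaryotic model: Y' = ln(-46.200 + 22.678 X'), with X' = log10(genome size in kbp)."""
    x = math.log10(genome_kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        raise ValueError("model undefined for genomes below roughly 109 kbp")
    return 10 ** math.log(inner)  # Y' is log10(gene number), so undo it with 10**Y'


def non_eukaryote_protein_genes(genome_kbp):
    """Non-eukaryotic model: Y' = 0.045 + 0.977 X'."""
    return 10 ** (0.045 + 0.977 * math.log10(genome_kbp))
```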
De novo transcriptome assembly of two different peach cultivars grown in Korea

Directory of Open Access Journals (Sweden)

Yeonhwa Jo

2015-12-01

Full Text Available Peach (Prunus persica) is one of the most popular stone fruits worldwide. Next-generation sequencing (NGS) has facilitated genome and transcriptome analyses of several stone fruit trees. In this study, we conducted de novo transcriptome analyses of two peach cultivars grown in Korea. Leaves of the two cultivars, Jangtaek and Mibaek, were harvested and used for library preparation. The two libraries were paired-end sequenced on the HiSeq2000 system. We obtained 8.14 GB and 9.62 GB of sequence data from Jangtaek and Mibaek, respectively (NCBI accession numbers SRS1056585 and SRS1056587). The Trinity program was used to assemble the two transcriptomes de novo, resulting in 110,477 (Jangtaek) and 136,196 (Mibaek) transcripts. TransDecoder identified possible coding regions in the assembled transcripts. The identified proteins were subjected to BLASTP searches against NCBI's non-redundant database for functional annotation. This study provides transcriptome data for two peach cultivars, which might be useful for genetic marker development and comparative transcriptome analyses.
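TransDecoder's first step is to scan assembled transcripts for long open reading frames. It then applies further coding-potential scoring, which is not shown here. Here is a minimal sketch of the ORF scan alone. It assumes the standard start (ATG) and stop codons, scans all six reading frames, and uses a hypothetical `min_codons` length cutoff. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_codons=100):
    """Return (strand, orf_sequence) for the longest ATG...stop ORF in six frames, or None."""
    best = None  # (length_in_codons, strand, orf_sequence)
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i + 3 - start) // 3  # includes the stop codon
                    if n_codons >= min_codons and (best is None or n_codons > best[0]):
                        best = (n_codons, strand, s[start:i + 3])
                    start = None
    return None if best is None else (best[1], best[2])
```

For example, `longest_orf("CCATGAAATTTGGGTAGCC", min_codons=3)` returns `("+", "ATGAAATTTGGGTAG")`. With the default cutoff, the same short transcript yields `None`.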
De novo transcriptomic analysis of an oleaginous microalga: pathway description and gene discovery for production of next-generation biofuels.

Directory of Open Access Journals (Sweden)

LingLin Wan

Full Text Available Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes. It has high biomass and produces considerable amounts of triacylglycerols (TAGs) for biofuels, and is therefore referred to as an oleaginous microalga. The paucity of microalgal genome sequences, however, limits the development of gene-based studies aimed at optimizing biofuel feedstocks. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalga species, E. cf. polyphem, and identify pathways and genes of importance to biofuel production. We performed the de novo assembly of the E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb of total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of the carbohydrate, fatty acid, TAG and carotenoid biosynthesis and catabolism pathways in E. cf. polyphem. Our data enable the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrates, fatty acids, TAGs and carotenoids in E. cf. polyphem. They also provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.
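The assembly statistics above report both a mean unigene size (503 bp) and an N50 (663 bp). N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. It therefore weights long sequences more heavily than the mean does. Here is a short sketch of the calculation (the function name is illustrative):

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For lengths 100, 200, 300 and 400 bp the mean is 250 bp. The N50 is 300 bp, because the two longest sequences already hold 700 of the 1,000 bases.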
High GC content causes orphan proteins to be intrinsically disordered.

Directory of Open Access Journals (Sweden)

Walter Basile

2017-03-01

Full Text Available De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. At the bare minimum, these orphan proteins must not cause serious harm to the organism; for instance, they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported: in Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the use of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
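A short calculation can check the claim that GC content shapes the amino acid composition of random ORFs. The sketch makes three simplifying assumptions. First, each codon position independently draws G or C with total probability equal to the GC level. Second, stop codons are excluded. Third, the disorder-promoting residues are A, R, G, Q, S, P, E and K, one commonly used classification that is not taken from this study. Under these assumptions, the code computes the expected fraction of disorder-promoting residues exactly from the standard genetic code. GC-rich codons such as GCN, GGN, CCN and CGN encode Ala, Gly, Pro and Arg, so the fraction rises with GC.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One commonly cited set of disorder-promoting residues (an assumption here).
DISORDER_PROMOTING = set("ARGQSPEK")


def expected_disorder_fraction(gc):
    """Expected fraction of disorder-promoting residues in random ORFs at a given GC level."""
    def base_prob(b):
        return gc / 2 if b in "GC" else (1 - gc) / 2

    sense_mass = 0.0
    disorder_mass = 0.0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons end an ORF, so they never contribute a residue
        p = base_prob(codon[0]) * base_prob(codon[1]) * base_prob(codon[2])
        sense_mass += p
        if aa in DISORDER_PROMOTING:
            disorder_mass += p
    return disorder_mass / sense_mass
```

At 50% GC every sense codon is equally likely, and 30 of the 61 sense codons encode residues in this set, giving 30/61 ≈ 0.49. The fraction is lower at 20% GC and higher at 80% GC.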
Bistability in self-activating genes regulated by non-coding RNAs

International Nuclear Information System (INIS)

Miro-Bueno, Jesus

2015-01-01

Non-coding RNA molecules are able to regulate gene expression and play an essential role in cells. Bistability, in turn, is an important behaviour of genetic networks. Here, we propose and study an ODE model to show how non-coding RNA can produce bistability in a simple way. The model comprises a single gene with positive feedback that is repressed by non-coding RNA molecules. We show how the values of all the reaction rates involved in the model control the transitions between the high and low states. This new model may help clarify the role of non-coding RNA molecules in genetic networks. These results may also be of interest in synthetic biology for developing new genetic memories and biomolecular devices based on non-coding RNAs.
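The abstract does not give the model's equations, so the following is a hypothetical minimal ODE in the same spirit, not the published model. A protein p activates its own gene through a Hill term. The non-coding RNA level r, treated as a constant input, adds to the removal rate of p:

dp/dt = basal + vmax·pⁿ/(Kⁿ + pⁿ) - (decay + k_rep·r)·p

The parameter values are illustrative choices, and the equation is integrated with a simple forward Euler scheme.

```python
def simulate(p0, ncrna, t_end=200.0, dt=0.01,
             basal=0.05, vmax=1.0, K=1.0, n=2, decay=0.3, k_rep=0.2):
    """Forward-Euler integration of dp/dt = basal + vmax*p^n/(K^n + p^n) - (decay + k_rep*ncrna)*p.

    All parameter values are illustrative assumptions, not taken from the source.
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        self_activation = vmax * p**n / (K**n + p**n)  # positive feedback (Hill term)
        removal = (decay + k_rep * ncrna) * p          # ncRNA raises the effective decay rate
        p += dt * (basal + self_activation - removal)
    return p


# Same parameters, different starting points: bistable at ncrna=1.0.
low = simulate(0.0, ncrna=1.0)
high = simulate(2.0, ncrna=1.0)
# Stronger repression: the high state disappears.
repressed = simulate(2.0, ncrna=3.5)
```

With these parameters and r = 1, the system has two stable steady states (about 0.13 and 1.46) separated by an unstable one at 0.5, so the final state depends on the starting point. Raising r to 3.5 removes the high state, and every trajectory falls to the low state. This is a switch-like transition controlled by the ncRNA level.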
Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome.

Directory of Open Access Journals (Sweden)

Loren A Honaas

Full Text Available Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies generated with 6 de novo assemblers: CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for the Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the incidence of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack thousands of low-abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through a combination of metrics: (1) the proportion of reads mapping to an assembly, (2) the recovery of conserved, widely expressed genes, (3) N50 length statistics, and (4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.
Intron-exon organization of the active human protein S gene PSα and its pseudogene PSβ: duplication and silencing during primate evolution

Energy Technology Data Exchange (ETDEWEB)

Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.; Bertina, R.M. (University Hospital, Leiden (Netherlands))

1990-08-28

The human protein S locus on chromosome 3 consists of two protein S genes, PSα and PSβ. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PSα gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis of liver protein S mRNA by primer extension, however, reveals two mRNA forms that differ in the length of their 5′-noncoding region. Both transcripts contain a 5′-noncoding region longer than that found in the protein S cDNAs. The two products may arise from alternative splicing of an additional intron in this region or from the use of two transcription start sites. The intron-exon organization of the PSα gene fully supports the hypothesis that the protein S gene is the product of an evolutionary assembly process. In this process, gene modules coding for structural/functional protein units also found in other coagulation proteins were placed upstream of the ancestral gene of a steroid hormone binding protein. The PSβ gene is identified as a pseudogene. It contains a variety of detrimental aberrations, namely the absence of exon I, a splice site mutation, three stop codons, and a frameshift mutation. Overall, the exonic sequences of PSα and PSβ show 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene occurred after the branching of the orangutan from the African apes. A nonsense mutation present in the human pseudogene could also be identified in one of the two protein S genes of both chimpanzee and gorilla. This implies that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.
Molecular cloning and characterization of two β-ketoacyl-acyl carrier protein synthase I genes from Jatropha curcas L.

Science.gov (United States)

Xiong, Wangdan; Wei, Qian; Wu, Pingzhi; Zhang, Sheng; Li, Jun; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

2017-07-01

The β-ketoacyl-acyl carrier protein synthase I (KASI) is involved in de novo fatty acid biosynthesis in many organisms. Two putative KASI genes, JcKASI-1 and JcKASI-2, were isolated from Jatropha curcas. The deduced amino acid sequences of JcKASI-1 and JcKASI-2 share approximately 83.8% and 72.5% sequence identity with AtKASI, respectively, and both contain conserved Cys-His-Lys-His-Phe catalytic active sites. Phylogenetic analysis indicated that JcKASI-2 belongs to a clade with several KASI proteins from dicotyledonous plants. Both JcKASI genes were expressed in multiple tissues, most strongly in J. curcas seeds at the filling stage. Additionally, the JcKASI-1 and JcKASI-2 proteins were both localized to the plastids. Expressing JcKASI-1 in the Arabidopsis kasI mutant rescued the mutant's phenotype and restored the fatty acid composition and oil content of its seeds to wild-type levels, whereas expressing JcKASI-2 in the Arabidopsis kasI mutant resulted in only partial rescue. This implies that JcKASI-1 and JcKASI-2 exhibit partial functional redundancy and that KASI genes play a universal role in regulating fatty acid biosynthesis, growth, and development in plants. Copyright © 2017 Elsevier GmbH. All rights reserved.
Metagenome and Metatranscriptome Analyses Using Protein Family Profiles.

Directory of Open Access Journals (Sweden)

Cuncong Zhong

2016-07-01

Full Text Available Analyses of metagenome (MG) and metatranscriptome (MT) data are often challenged by a paucity of complete reference genome sequences and by the uneven/low sequencing depth of the constituent organisms in the microbial community, which limit the power of reference-based alignment and de novo sequence assembly, respectively. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hampers downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow accurate modeling of position-specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates a banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM
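HMM-GRASPx is described as using a banded Viterbi algorithm for profile HMM parsing. The record gives no further detail, so the sketch below illustrates only the general banding idea, and it does so on a simpler problem: global pairwise alignment (Needleman-Wunsch) with hypothetical linear scores, where dynamic-programming cells with |i - j| greater than a fixed band width are never filled. It is not a profile-HMM Viterbi and not HMM-GRASPx code.

```python
NEG_INF = float("-inf")


def banded_global_score(a: str, b: str, band: int,
                        match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Global alignment score restricted to DP cells with |i - j| <= band.

    Scores are hypothetical linear-gap values chosen for illustration.
    Returns -inf when no alignment fits inside the band.
    """
    if band < 0:
        raise ValueError("band must be non-negative")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return NEG_INF  # the end cell (n, m) lies outside the band
    # Full matrix kept for clarity; a memory-efficient version stores only the band.
    dp = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        # Only visit columns inside the diagonal band around row i.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, dp[i - 1][j - 1] + s)  # match/mismatch
            if i > 0:
                best = max(best, dp[i - 1][j] + gap)  # gap in b
            if j > 0:
                best = max(best, dp[i][j - 1] + gap)  # gap in a
            dp[i][j] = best  # out-of-band neighbours stay -inf
    return dp[n][m]


print(banded_global_score("ACGT", "AGT", band=1))  # 1.0
```

Banding reduces the work from O(nm) to roughly O(n x band), at the cost of missing optimal alignments that wander further from the diagonal than the band allows.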
Sub-grouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions

DEFF Research Database (Denmark)

Lavstsen, Thomas; Salanti, Ali; Jensen, Anja T R

2003-01-01

and organization of the 3D7 PfEMP1 repertoire was investigated on the basis of the complete genome sequence. METHODS: Using two tree-building methods we analysed the coding and non-coding sequences of 3D7 var and rif genes as well as var genes of other parasite strains. RESULTS: var genes can be sub...
Transcriptome sequencing and de novo assembly in arecanut, Areca catechu L elucidates the secondary metabolite pathway genes

Directory of Open Access Journals (Sweden)

Ramaswamy Manimekalai

2018-03-01

Full Text Available Areca catechu L. belongs to the Arecaceae family, which comprises many economically important palms. The palm is a source of alkaloids and carotenoids. The lack of ample genetic information in public databases has been a constraint on the genetic improvement of arecanut. To gain molecular insight into the palm, high-throughput RNA sequencing and de novo assembly of the arecanut leaf transcriptome were undertaken in the present study. A total of 56,321,907 paired-end reads of 101 bp length, comprising 11.343 Gb of nucleotides, were generated. De novo assembly resulted in 48,783 good-quality transcripts, of which 67% could be annotated against the NCBI non-redundant database. Gene Ontology (GO) analysis with the UniProt database identified 9222 biological process, 11268 molecular function and 7574 cellular component GO terms. Large-scale expression profiling through Fragments Per Kilobase of transcript per Million mapped reads (FPKM) revealed the major genes involved in different metabolic pathways of the plant. Metabolic pathway analysis of the assembled transcripts identified 124 plant-related pathways. The transcripts related to carotenoid and alkaloid biosynthetic pathways had higher read counts and FPKM values, suggesting higher expression of these genes. The arecanut transcript sequences generated in the study showed high similarity with coconut, oil palm and date palm sequences retrieved from public databases. We also identified 6853 genic SSR regions in arecanut. Candidate primers were designed for SSR detection, which should simplify future efforts in the genetic characterization of arecanut.
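FPKM normalizes a transcript's fragment count by both its length (in kilobases) and the sequencing depth (in millions of mapped fragments), so that expression can be compared across transcripts and samples. A minimal sketch of the standard formula follows; it assumes the raw transcript length is used, whereas some quantification tools substitute an "effective" length, and the input numbers are hypothetical.

```python
def fpkm(fragment_count: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments / (transcript length in kb * total mapped fragments in millions).

    Uses the raw transcript length (an assumption; some tools use an effective length).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    length_kb = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragment_count / (length_kb * millions_mapped)


# Hypothetical transcript: 1,000 fragments on a 2 kb transcript in a 10-million-fragment library.
print(fpkm(1_000, 2_000, 10_000_000))  # 50.0
```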
Digital gene expression analysis based on integrated de novo transcriptome assembly of sweet potato [Ipomoea batatas (L.) Lam.].

Directory of Open Access Journals (Sweden)

Xiang Tao

Full Text Available BACKGROUND: Sweet potato (Ipomoea batatas (L.) Lam.) ranks among the top six most important food crops in the world. It is widely grown throughout the world owing to its high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to a lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to obtain a comprehensive and integrated genomic resource and a better understanding of gene expression patterns in different tissues and at various developmental stages. METHODOLOGY/PRINCIPAL FINDINGS: Illumina paired-end (PE) RNA sequencing was performed, generating 48.7 million 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥ 100 bp), corresponding to 41.1 million base pairs, using a combined assembly strategy. Transcripts were annotated with Blast2GO; 51,763 transcripts had BLASTX hits, of which 39,677 have GO terms and 14,117 have ECs associated with 147 KEGG pathways. Furthermore, transcriptome differences among seven tissues were analyzed using Illumina digital gene expression (DGE) tag profiling, and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism, and potential stress tolerance and insect resistance were also identified. CONCLUSIONS/SIGNIFICANCE: The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of the sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes, including development of leaves and storage roots
The expanded octarepeat domain selectively binds prions and disrupts homomeric prion protein interactions

NARCIS (Netherlands)

Leliveld, S. R.; Dame, R.T.; Wuite, G.J.L.; Stitz, L.; Korth, C.

2006-01-01

Insertion of additional octarepeats into the prion protein gene has been genetically linked to familial Creutzfeldt Jakob disease and hence to de novo generation of infectious prions. The pivotal event during prion formation is the conversion of the normal prion protein (PrP

Co-Option and De Novo Gene Evolution Underlie Molluscan Shell Diversity

Science.gov (United States)

Aguilera, Felipe; McDougall, Carmel

2017-01-01

Abstract Molluscs fabricate shells of incredible diversity and complexity by localized secretions from the dorsal epithelium of the mantle. Although distantly related molluscs express remarkably different secreted gene products, it remains unclear if the evolution of shell structure and pattern is underpinned by the differential co-option of conserved genes or the integration of lineage-specific genes into the mantle regulatory program. To address this, we compare the mantle transcriptomes of 11 bivalves and gastropods of varying relatedness. We find that each species, including four Pinctada (pearl oyster) species that diverged within the last 20 Ma, expresses a unique mantle secretome. Lineage- or species-specific genes comprise a large proportion of each species’ mantle secretome. A majority of these secreted proteins have unique domain architectures that include repetitive, low complexity domains (RLCDs), which evolve rapidly, and have a proclivity to expand, contract and rearrange in the genome. There are also a large number of secretome genes expressed in the mantle that arose before the origin of gastropods and bivalves. Each species expresses a unique set of these more ancient genes consistent with their independent co-option into these mantle gene regulatory networks. From this analysis, we infer lineage-specific secretomes underlie shell diversity, and include both rapidly evolving RLCD-containing proteins, and the continual recruitment and loss of both ancient and recently evolved genes into the periphery of the regulatory network controlling gene expression in the mantle epithelium. PMID:28053006
Retrotransposons and non-protein coding RNAs

DEFF Research Database (Denmark)

Mourier, Tobias; Willerslev, Eske

2009-01-01

does not merely represent spurious transcription. We review examples of functional RNAs transcribed from retrotransposons, and address the collection of non-protein coding RNAs derived from transposable element sequences, including numerous human microRNAs and the neuronal BC RNAs. Finally, we review...
de novo computational enzyme design.

Science.gov (United States)

Zanghellini, Alexandre

2014-10-01

Recent advances in systems and synthetic biology as well as metabolic engineering are poised to transform industrial biotechnology by allowing us to design cell factories for the sustainable production of valuable fuels and chemicals. To deliver on their promise, such cell factories, as much as their brick-and-mortar counterparts, will require appropriate catalysts, especially for classes of reactions that are not known to be catalyzed by enzymes in natural organisms. A recently developed methodology, de novo computational enzyme design, can be used to create enzymes catalyzing novel reactions. Here we review the different classes of chemical reactions for which active protein catalysts have been designed, as well as the results of detailed biochemical and structural characterization studies. We also discuss how combining de novo computational enzyme design with more traditional protein engineering techniques can alleviate the shortcomings of state-of-the-art computational design techniques and create novel enzymes with catalytic proficiencies on par with natural enzymes. Copyright © 2014 Elsevier Ltd. All rights reserved.
A novel de novo mutation in ATP1A3 and childhood-onset schizophrenia

Science.gov (United States)

Smedemark-Margulies, Niklas; Brownstein, Catherine A.; Vargas, Sigella; Tembulkar, Sahil K.; Towne, Meghan C.; Shi, Jiahai; Gonzalez-Cuevas, Elisa; Liu, Kevin X.; Bilguvar, Kaya; Kleiman, Robin J.; Han, Min-Joon; Torres, Alcy; Berry, Gerard T.; Yu, Timothy W.; Beggs, Alan H.; Agrawal, Pankaj B.; Gonzalez-Heydrich, Joseph

2016-01-01

We describe a child with onset of command auditory hallucinations and behavioral regression at 6 yr of age in the context of longer standing selective mutism, aggression, and mild motor delays. His genetic evaluation included chromosomal microarray analysis and whole-exome sequencing. Sequencing revealed a previously unreported heterozygous de novo mutation c.385G>A in ATP1A3, predicted to result in a p.V129M amino acid change. This gene codes for a neuron-specific isoform of the catalytic α-subunit of the ATP-dependent transmembrane sodium–potassium pump. Heterozygous mutations in this gene have been reported as causing both sporadic and inherited forms of alternating hemiplegia of childhood and rapid-onset dystonia parkinsonism. We discuss the literature on phenotypes associated with known variants in ATP1A3, examine past functional studies of the role of ATP1A3 in neuronal function, and describe a novel clinical presentation associated with mutation of this gene. PMID:27626066
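The variant notation can be checked with simple arithmetic. For a position inside the coding sequence (HGVS c.N with N ≥ 1, counted from the A of the ATG start codon), the codon number is (N - 1) // 3 + 1 and the base within the codon is (N - 1) % 3 + 1, so c.385 falls on base 1 of codon 129. The record does not give the reference codon; the sketch below shows that, of the four valine codons, only GTG becomes a methionine codon (ATG) after a G>A change at base 1, so p.V129M implies a GTG reference codon. The helper names are hypothetical, and the sketch does not handle intronic (c.N+M) or UTR (c.-N, c.*N) positions.

```python
from typing import Tuple

# Standard-code translations for only the codons needed here (illustrative subset).
CODON_TO_AA = {
    "GTA": "Val", "GTC": "Val", "GTG": "Val", "GTT": "Val",
    "ATA": "Ile", "ATC": "Ile", "ATT": "Ile", "ATG": "Met",
}


def locate_cds_position(c_pos: int) -> Tuple[int, int]:
    """Map a coding-DNA position c.N (N >= 1) to (codon number, base 1-3 within codon)."""
    if c_pos < 1:
        raise ValueError("only coding positions c.1 onward are handled")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, base_in_codon: int, ref: str, alt: str) -> str:
    """Apply a single-base substitution after checking the reference base."""
    idx = base_in_codon - 1
    if codon[idx] != ref:
        raise ValueError(f"expected {ref} at codon base {base_in_codon}, found {codon[idx]}")
    return codon[:idx] + alt + codon[idx + 1:]


codon_no, base = locate_cds_position(385)
print(codon_no, base)  # 129 1

# Which valine codons become methionine under G>A at codon base 1?
for ref_codon in ("GTA", "GTC", "GTG", "GTT"):
    mutant = substitute(ref_codon, base, "G", "A")
    print(ref_codon, "->", mutant, CODON_TO_AA[mutant])
```

Only the GTG line prints Met; the other three valine codons would give isoleucine, which would be written p.V129I.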
Novel overlapping coding sequences in Chlamydia trachomatis

DEFF Research Database (Denmark)

Jensen, Klaus Thorleif; Petersen, Lise; Falk, Søren

2006-01-01

that are in agreement with the primary annotation. Forty two genes from the primary annotation are not predicted by EasyGene. The majority of these genes are listed as hypothetical in the primary annotation. The 15 novel predicted genes all overlap with genes on the complementary strand. We find homologues of several...... of the novel genes in C. trachomatis Serovar A and Chlamydia muridarum. Several of the genes have typical gene-like and protein-like features. Furthermore, we confirm transcriptional activity from 10 of the putative genes. The combined evidence suggests that at least seven of the 15 are protein coding genes...
Tombusviruses upregulate phospholipid biosynthesis via interaction between p33 replication protein and yeast lipid sensor proteins during virus replication in yeast

International Nuclear Information System (INIS)

Barajas, Daniel; Xu, Kai; Sharma, Monika; Wu, Cheng-Yu; Nagy, Peter D.

2014-01-01

Positive-stranded RNA viruses induce new membranous structures and promote membrane proliferation in infected cells to facilitate viral replication. In this paper, the authors show that a plant-infecting tombusvirus upregulates transcription of phospholipid biosynthesis genes, such as INO1, OPI3 and CHO1, and increases phospholipid levels in the yeast model host. This is accomplished by the viral p33 replication protein, which interacts with the FFAT domain protein Opi1p and the VAP protein Scs2p. Opi1p and Scs2p are phospholipid sensor proteins that repress the expression of phospholipid genes. Accordingly, deletion of the OPI1 transcriptional repressor in yeast stimulates TBSV RNA accumulation and enhances tombusvirus replicase activity in an in vitro assay. Altogether, the presented data convincingly demonstrate that de novo lipid biosynthesis is required for optimal TBSV replication. Overall, this work reveals that a (+)RNA virus reprograms the phospholipid biosynthesis pathway in a unique way to facilitate its replication in yeast cells. - Highlights: • Tombusvirus p33 replication protein interacts with FFAT-domain host protein. • Tombusvirus replication leads to upregulation of phospholipids. • Tombusvirus replication depends on de novo lipid synthesis. • Deletion of FFAT-domain host protein enhances TBSV replication. • TBSV rewires host phospholipid synthesis.
Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

Science.gov (United States)

Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

2018-01-01

We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental evidence supporting their translation. Despite the increasing accuracy of gene prediction tools, there are likely a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn to identify homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
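The feature Spurio finds most informative, stop codons within the presumed translation of homologous DNA, can be illustrated in isolation. The sketch below only locates in-frame stop codons (standard genetic code) in a DNA region read in a chosen frame; it is not Spurio's implementation, which works from tblastn matches and a Gaussian process model, and the region shown is hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_stops(dna: str, frame: int = 0) -> List[int]:
    """Return 0-based offsets of stop codons when reading dna in frame 0, 1 or 2.

    Incomplete trailing codons are ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = dna.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


# Hypothetical homologous region read as ATG AAA TAG CCC TGA GGG.
region = "ATGAAATAGCCCTGAGGG"
print(in_frame_stops(region))     # [6, 12]
print(in_frame_stops(region, 1))  # [1]
```

If a predicted protein is real, DNA from its homologs should translate without internal stops in the aligned frame; repeated stops across many homologs suggest the "protein" is a translation of non-coding sequence.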
De Novo Transcriptomic Analysis of an Oleaginous Microalga: Pathway Description and Gene Discovery for Production of Next-Generation Biofuels

Science.gov (United States)

Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu

2012-01-01

Background Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes, with high biomass and considerable production of triacylglycerols (TAGs) for biofuels; it is thus referred to as an oleaginous microalga. The paucity of microalgal genome sequences, however, limits the development of gene-based studies for biofuel feedstock optimization. Here we describe the sequencing and de novo transcriptome assembly of a non-model microalgal species, E. cf. polyphem, and identify pathways and genes of importance for biofuel production. Results We performed de novo assembly of the E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb of total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of the carbohydrate, fatty acid, TAG and carotenoid biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions Our data allow the reconstruction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrates, fatty acids, TAG and carotenoids in E. cf. polyphem and provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
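The N50 reported for the unigenes is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of the usual definition follows; the toy lengths are hypothetical.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Smallest length L such that sequences >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest first
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical assembly: total 1,500 bp; 500 + 400 = 900 bp is the first sum >= 750 bp.
print(n50([100, 200, 300, 400, 500]))  # 400
```

Unlike the mean length (503 bp in the record), N50 weights each sequence by its length, so a few long unigenes raise it more than many short ones lower it.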
Investigation of de novo unique differentially expressed genes related to evolution in exercise response during domestication in Thoroughbred race horses.

Directory of Open Access Journals (Sweden)

Woncheoul Park

Full Text Available Previous studies of horse RNA-seq were performed by mapping sequence reads to the reference genome during transcriptome analysis. However, in this study, we focused on two main ideas. First, differentially expressed genes (DEGs) were identified by de novo-based analysis (DBA) in RNA-seq data from six Thoroughbreds before and after exercise, hereafter referred to as "de novo unique differentially expressed genes" (DUDEG). Second, by integrating conventional DEGs with genes identified as being selected for during domestication of the Thoroughbred and the Jeju pony from whole-genome re-sequencing (WGS) data, we propose an extended definition of DEG. We identified 1,034 and 567 DUDEGs in skeletal muscle and blood, respectively. DUDEGs in skeletal muscle were significantly related to exercise-induced stress biological process gene ontology (BP-GO) terms ('immune system process', 'response to stimulus', and 'death') and KEGG pathways ('JAK-STAT signaling pathway', 'MAPK signaling pathway', 'regulation of actin cytoskeleton', and 'p53 signaling pathway'). In addition, we found TIMELESS, EIF4A3 and ZNF592 in blood, and CHMP4C and FOXO3 in skeletal muscle, to be shared between DUDEGs and selected genes identified by evolutionary statistics such as FST and Cross Population Extended Haplotype Homozygosity (XP-EHH). Moreover, in Thoroughbreds, three of the five genes (CHMP4C, EIF4A3 and FOXO3) related to exercise response showed relatively low nucleotide diversity compared with the Jeju pony. DUDEGs are not only conceptually new DEGs that cannot be obtained from reference-based analysis (RBA) but also support previous RBA results related to exercise in Thoroughbreds. In summary, three exercise-related genes that were selected for during domestication in the evolutionary history of the Thoroughbred were identified as conceptually new DEGs in this study.
Investigation of de novo unique differentially expressed genes related to evolution in exercise response during domestication in Thoroughbred race horses.

Science.gov (United States)

Park, Woncheoul; Kim, Jaemin; Kim, Hyeon Jeong; Choi, JaeYoung; Park, Jeong-Woong; Cho, Hyun-Woo; Kim, Byeong-Woo; Park, Myung Hum; Shin, Teak-Soon; Cho, Seong-Keun; Park, Jun-Kyu; Kim, Heebal; Hwang, Jae Yeon; Lee, Chang-Kyu; Lee, Hak-Kyo; Cho, Seoae; Cho, Byung-Wook

2014-01-01

Previous studies of horse RNA-seq were performed by mapping sequence reads to the reference genome during transcriptome analysis. However, in this study, we focused on two main ideas. First, differentially expressed genes (DEGs) were identified by de novo-based analysis (DBA) in RNA-seq data from six Thoroughbreds before and after exercise, hereafter referred to as "de novo unique differentially expressed genes" (DUDEG). Second, by integrating conventional DEGs with genes identified as being selected for during domestication of the Thoroughbred and the Jeju pony from whole-genome re-sequencing (WGS) data, we propose an extended definition of DEG. We identified 1,034 and 567 DUDEGs in skeletal muscle and blood, respectively. DUDEGs in skeletal muscle were significantly related to exercise-induced stress biological process gene ontology (BP-GO) terms ('immune system process', 'response to stimulus', and 'death') and KEGG pathways ('JAK-STAT signaling pathway', 'MAPK signaling pathway', 'regulation of actin cytoskeleton', and 'p53 signaling pathway'). In addition, we found TIMELESS, EIF4A3 and ZNF592 in blood, and CHMP4C and FOXO3 in skeletal muscle, to be shared between DUDEGs and selected genes identified by evolutionary statistics such as FST and Cross Population Extended Haplotype Homozygosity (XP-EHH). Moreover, in Thoroughbreds, three of the five genes (CHMP4C, EIF4A3 and FOXO3) related to exercise response showed relatively low nucleotide diversity compared with the Jeju pony. DUDEGs are not only conceptually new DEGs that cannot be obtained from reference-based analysis (RBA) but also support previous RBA results related to exercise in Thoroughbreds. In summary, three exercise-related genes that were selected for during domestication in the evolutionary history of the Thoroughbred were identified as conceptually new DEGs in this study.
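FST, one of the evolutionary statistics named in this record, measures how much of the total genetic diversity lies between populations rather than within them. The record does not say which estimator was used. As a minimal sketch, the code below uses the heterozygosity-based form FST = (HT - HS) / HT for a single biallelic locus with two equally weighted populations; sequencing studies often use sample-size-corrected estimators such as Weir and Cockerham's instead, and the allele frequencies here are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Heterozygosity-based FST = (HT - HS) / HT for one biallelic locus.

    p1, p2: frequency of the same allele in populations 1 and 2 (equal weights assumed).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if ht == 0:
        return 0.0  # both populations fixed for the same allele: no differentiation
    return (ht - hs) / ht


# Hypothetical locus: allele at 90% in one breed and 10% in the other.
print(round(fst_two_populations(0.9, 0.1), 4))  # 0.64
```

Values near 0 mean the two populations share allele frequencies; values near 1 mean they are fixed for different alleles, which is the kind of signal scanned for when looking for genes selected during domestication.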
Hidden Structural Codes in Protein Intrinsic Disorder.

Science.gov (United States)

Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

2017-10-17

Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet it consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence-to-structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these residues led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
De novo transcriptome sequencing and digital gene expression analysis predict biosynthetic pathway of rhynchophylline and isorhynchophylline from Uncaria rhynchophylla, a non-model plant with potent anti-Alzheimer's properties.

Science.gov (United States)

Guo, Qianqian; Ma, Xiaojun; Wei, Shugen; Qiu, Deyou; Wilson, Iain W; Wu, Peng; Tang, Qi; Liu, Lijun; Dong, Shoukun; Zu, Wei

2014-08-12

The major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in Chinese) capsules are rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can inhibit the formation of amyloid β protein fibrils (a pathological marker of Alzheimer's disease) and destabilize preformed ones, and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. The biosynthetic pathways of RIN and IRN are largely unknown. In this study, RNA sequencing was performed on pooled Uncaria capsule RNA samples taken at three developmental stages that accumulate different amounts of RIN and IRN. More than 50 million high-quality reads from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in TIA synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase genes were identified as potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene expression (DGE) profile analysis was performed on the three capsule developmental stages, and, based on genes whose expression profiles were consistent with RIN and IRN levels, four CYP450s, three methyltransferases and three isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary metabolites in a non-model plant. The transcriptome data from this study provide an important resource for understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provide information that may aid metabolic engineering to increase yields of these important alkaloids.
SINEUPs are modular antisense long-non coding RNAs that increase synthesis of target proteins in cells

Directory of Open Access Journals (Sweden)

Silvia Zucchelli

2015-05-01

Full Text Available Despite recent efforts in discovering novel long non-coding RNAs (lncRNAs) and unveiling their functions in a wide range of biological processes, their applications as biotechnological or therapeutic tools are still in their infancy. We have recently shown that AS Uchl1, a natural lncRNA antisense to the Parkinson’s disease-associated gene Ubiquitin carboxyl-terminal esterase L1 (Uchl1), is able to increase UchL1 protein synthesis at the post-transcriptional level. Its activity requires two RNA elements: an embedded inverted SINEB2 sequence to increase translation and the overlapping region to target its sense mRNA. This functional organization is shared with several mouse lncRNAs antisense to protein-coding genes. The potential use of AS Uchl1-derived lncRNAs as enhancers of target mRNA translation remains unexplored. Here we define AS Uchl1 as the representative member of a new functional class of natural and synthetic antisense lncRNAs that activate translation. We named this class of RNAs SINEUPs for their requirement of the inverted SINEB2 sequence to UP-regulate translation in a gene-specific manner. The overlapping region is designated the Binding Domain (BD), while the embedded inverted SINEB2 element is the Effector Domain (ED). By swapping the BD, synthetic SINEUPs can be designed to target mRNAs of interest. SINEUPs function in an array of cell lines and can be efficiently directed towards N-terminally tagged proteins. Their biological activity is retained in a miniaturized version within the length range of small RNAs. Their modular structure was exploited to successfully design synthetic SINEUPs targeting the endogenous Parkinson’s disease-associated DJ-1, which proved to be active in different neuronal cell lines. In summary, SINEUPs represent the first scalable tool to increase the synthesis of proteins of interest. We propose SINEUPs as reagents for molecular biology experiments, in protein manufacturing, as well as in therapy of haploinsufficiencies.
Function and Application Areas in Medicine of Non-Coding RNA

Directory of Open Access Journals (Sweden)

Figen Guzelgul

2009-06-01

Full Text Available RNA is the genetic material that converts the genetic code it receives from DNA into protein. Whereas less than 2% of RNA is translated into protein, more than 98% is not, and this fraction is termed non-coding RNA. Introns make up 70% of non-coding RNAs, and the remainder consists of exons. Non-coding RNAs are classified according to their size and their function: by size, they are divided into long non-coding and small non-coding RNAs; by function, into housekeeping non-coding RNAs and regulatory non-coding RNAs. For many years, these non-coding RNAs were considered non-functional. Today, however, it has been shown that they play roles in gene regulation and in the structural, functional and catalytic roles of RNAs that are translated into protein. Owing to their role in gene silencing mechanisms, non-coding RNAs have led to significant developments, particularly in medicine. RNAi technology, which is used to design drugs for the treatment of various diseases, is a ray of hope for the medical world. [Archives Medical Review Journal 2009; 18(3): 141-155]
Sequencing of sporadic Attention-Deficit Hyperactivity Disorder (ADHD) identifies novel and potentially pathogenic de novo variants and excludes overlap with genes associated with autism spectrum disorder.

Science.gov (United States)

Kim, Daniel Seung; Burt, Amber A; Ranchalis, Jane E; Wilmot, Beth; Smith, Joshua D; Patterson, Karynne E; Coe, Bradley P; Li, Yatong K; Bamshad, Michael J; Nikolas, Molly; Eichler, Evan E; Swanson, James M; Nigg, Joel T; Nickerson, Deborah A; Jarvik, Gail P

2017-06-01

Attention-Deficit Hyperactivity Disorder (ADHD) has high heritability; however, studies of common variation account for only a small portion of ADHD variance. Using data from affected participants without a family history of ADHD, we sought to identify de novo variants that could account for sporadic ADHD. Considering a total of 128 families, two analyses were conducted in parallel. First, we completed exome sequencing in 11 unaffected parent/affected proband trios (or quads with the addition of an unaffected sibling). Six de novo missense variants at highly conserved bases were identified and validated in four of the 11 families, in the brain-expressed genes TBC1D9, DAGLA, QARS, CSMD2, TRPM2, and WDR83. Separately, in 117 unrelated probands with sporadic ADHD, we sequenced a panel of 26 genes implicated in intellectual disability (ID) and autism spectrum disorder (ASD) to evaluate whether variation in ASD/ID-associated genes was also present in participants with ADHD. Only one putative deleterious variant (Gln600STOP), in CHD1L, was identified, in a single proband. Notably, no other nonsense, splice, frameshift, or highly conserved missense variants in the 26-gene panel were identified and validated. These data suggest that de novo variant analysis in families with an independently adjudicated sporadic ADHD diagnosis can identify novel genes implicated in ADHD pathogenesis. Moreover, the finding that only one of the 128 cases (0.8%; 11 exome-sequenced and 117 MIP-sequenced participants) had putative deleterious variants in the 26 genes related to ID and ASD suggests significant independence in the genetic pathogenesis of ADHD compared with the ASD and ID phenotypes. © 2017 Wiley Periodicals, Inc.
Selective Constraints on Coding Sequences of Nervous System Genes Are a Major Determinant of Duplicate Gene Retention in Vertebrates.

Science.gov (United States)

Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc

2017-11-01

The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
De novo structural modeling and computational sequence analysis ...

African Journals Online (AJOL)

Different bioinformatics tools and machine learning techniques were used for protein structural classification. De novo protein modeling was performed using the I-TASSER server. The final model was assessed with PROCHECK and DFIRE2, which confirmed that it is reliable. Until complete biochemical ...
A central role for ubiquitination within a circadian clock protein modification code

Directory of Open Access Journals (Sweden)

Katarina Stojkovic

2014-08-01

Full Text Available Circadian rhythms, endogenous cycles of about 24 h in physiology, are generated by a master clock located in the suprachiasmatic nucleus of the hypothalamus and other clocks located in the brain and peripheral tissues. Circadian disruption is known to increase the incidence of various illnesses, such as mental disorders, metabolic syndrome and cancer. At the molecular level, periodicity is established by a set of clock genes via autoregulatory translation-transcription feedback loops. This clock mechanism is regulated by post-translational modifications such as phosphorylation and ubiquitination, which set the pace of the clock. Ubiquitination in particular has been found to regulate the stability of core clock components, but also other clock protein functions. Mutation of genes encoding ubiquitin ligases can cause either elongation or shortening of the endogenous circadian period. Recent research has also started to uncover roles for deubiquitination in the molecular clockwork. Here we review the role of the ubiquitin pathway in regulating the circadian clock and we propose that ubiquitination is a key element in a clock protein modification code that orchestrates clock mechanisms and circadian behavior over the daily cycle.
Molecular cloning of a Candida albicans gene (SSB1) coding for a protein related to the Hsp70 family.

Science.gov (United States)

Maneu, V; Cervera, A M; Martinez, J P; Gozalbo, D

1997-06-15

We have cloned and sequenced a Candida albicans gene (SSB1) encoding a potential member of the heat-shock protein seventy (hsp70) family. The protein encoded by this gene contains 613 amino acids and shows a high degree (85%) of sequence identity to the ssb subfamily (ssb1 and ssb2) of the Saccharomyces cerevisiae hsp70 family. The transcribed mRNA (2.1 kb) is present in similar amounts both in yeast and germ tube cells of C. albicans.
Endostatin gene variation and protein levels in breast cancer susceptibility and severity

International Nuclear Information System (INIS)

Balasubramanian, Sabapathy P; Cross, Simon S; Globe, Jenny; Cox, Angela; Brown, Nicola J; Reed, Malcolm W

2007-01-01

Endostatin is a potent endogenous anti-angiogenic agent which inhibits tumour growth. A non-synonymous coding polymorphism in the Endostatin gene is thought to affect Endostatin activity. We aimed to determine the role of this Endostatin polymorphism in breast cancer pathogenesis and any influence on serum Endostatin levels in healthy volunteers. Endostatin protein expression on a breast cancer microarray was also studied to determine any relationship to genotype and to breast cancer prognosis. The 4349G > A (coding non-synonymous) polymorphism in exon 42 of the Endostatin gene was genotyped in approximately 846 breast cancer cases and 707 appropriate controls. In a separate healthy cohort of 57 individuals, in addition to genotyping, serum Endostatin levels were measured using enzyme-linked immunosorbent assay (ELISA). A semi-quantitative assessment of Endostatin protein expression was performed on immunostained tissue microarrays (TMA) constructed from breast cancer samples of patients with genotype data. The rare allele (A) was significantly associated with invasive breast cancers compared to non-invasive tumours (p = 0.03), but there was no association with tumour grade, nodal status, vascular invasion or overall survival. There was no association with breast cancer susceptibility. Serum Endostatin levels and Endostatin protein expression on the tissue microarray were not associated with genotype. The Endostatin 4349A allele is associated with invasive breast cancer. However, the Endostatin 4349G > A polymorphism does not appear to be associated with breast cancer susceptibility or severity in invasive disease. By studying circulating levels and tumour Endostatin protein expression, we have shown that any influence of this polymorphism is unlikely to be through an effect on the levels of protein produced.

Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

Science.gov (United States)

Meena, Mukesh Kumar; Ghawana, Sanjay; Sardar, Atish; Dwivedi, Vikas; Khandal, Hitaishi; Roy, Riti; Chattopadhyay, Debasis

2015-01-01

Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals including stresses and, therefore, is important for plants' response process. Calcineurin B-like proteins (CBLs) are one of the plant calcium sensors, which sense and convey the changes in cytosolic Ca2+-concentration for response process. A search in four leguminous plant (soybean, Medicago truncatula, common bean and chickpea) genomes identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and some of them were multiplied during whole genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of CBL family in plant kingdom indicated its diverse origin but showed a remarkable conservation in overall protein structure with appearance of complex gene structure in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatment were studied. Most of the CaCBL genes exhibited high expression in flowers. Expression profile of CaCBL genes in response to different abiotic stresses and hormones related to development and stresses (ABA, auxin, cytokinin, SA and JA) at different time intervals suggests their diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.
The chaperonin-60 universal target is a barcode for bacteria that enables de novo assembly of metagenomic sequence data.

Science.gov (United States)

Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E

2012-01-01

Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found from cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
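The "barcode gap" criterion used above (median inter-specific distance minus median intra-specific distance) can be illustrated with a small Python sketch. For simplicity it assumes uncorrected p-distances on pre-aligned sequences; the abstract does not state the distance model, and barcoding studies often use model-corrected distances instead. The species labels and sequences are hypothetical toy data, not cpn60 or 16S rRNA sequences.

```python
from itertools import combinations
from statistics import median


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites in an alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """records: (species, aligned_sequence) pairs.

    Returns median inter-specific distance minus median intra-specific
    distance. Needs at least one same-species and one cross-species pair.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return median(inter) - median(intra)


# Toy alignment: two hypothetical species, two sequences each.
records = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "CCCCCAAAAA"),
    ("sp_B", "CCCCCAAAAT"),
]
print(round(barcode_gap(records), 3))  # 0.45
```

A larger positive gap means within-species variation is well separated from between-species variation, which is the property the study used to compare candidate barcodes.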
Molecular characterization of a phloem-specific gene encoding the filament protein, phloem protein 1 (PP1), from Cucurbita maxima.

Science.gov (United States)

Clark, A M; Jacobsen, K R; Bostwick, D E; Dannenhoffer, J M; Skaggs, M I; Thompson, G A

1997-07-01

Sieve elements in the phloem of most angiosperms contain proteinaceous filaments and aggregates called P-protein. In the genus Cucurbita, these filaments are composed of two major proteins: PP1, the phloem filament protein, and PP2, the phloem lectin. The gene encoding the phloem filament protein in pumpkin (Cucurbita maxima Duch.) has been isolated and characterized. Nucleotide sequence analysis of the reconstructed gene gPP1 revealed a continuous 2430 bp protein-coding sequence, with no introns, encoding an 809 amino acid polypeptide. The deduced polypeptide had characteristics of PP1 and contained a 15 amino acid sequence determined by N-terminal peptide sequence analysis of PP1. The sequence of PP1 was highly repetitive, with four 200 amino acid sequence domains containing structural motifs in common with cysteine proteinase inhibitors. Expression of the PP1 gene was detected in roots, hypocotyls, cotyledons, stems, and leaves of pumpkin plants. PP1 and its mRNA accumulated in pumpkin hypocotyls during the period of rapid hypocotyl elongation, after which mRNA levels declined while protein levels remained elevated. PP1 was immunolocalized in slime plugs and P-protein bodies in sieve elements of the phloem. Occasionally, PP1 was detected in companion cells. PP1 mRNA was localized by in situ hybridization in companion cells at early stages of vascular differentiation. The developmental accumulation and localization of PP1 and its mRNA paralleled those of the phloem lectin, further suggesting an interaction between these phloem-specific proteins.
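As a quick consistency check on the figures above: 2430 bp / 3 = 810 codons, which matches an 809-residue polypeptide if the reported length includes the stop codon. That inclusion is an assumption; the abstract does not say so explicitly. The helper name below is hypothetical.

```python
def residues_encoded(cds_length_bp: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by an intronless, in-frame coding sequence."""
    if cds_length_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_length_bp // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons


print(residues_encoded(2430))  # 809
```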
Unusual Presentation of Pelizaeus-Merzbacher Disease: Female Patient with Deletion of the Proteolipid Protein 1 Gene

Directory of Open Access Journals (Sweden)

Teva Brender

2015-01-01

Full Text Available Pelizaeus-Merzbacher disease (PMD) is a neurodegenerative leukodystrophy caused by dysfunction of the proteolipid protein 1 (PLP1) gene on Xq22, which codes for an essential myelin protein. As an X-linked condition, PMD primarily affects males; however, a small number of affected females with a variety of different mutations in this gene have been reported in the medical literature. No affected female reported to date has had a deletion like our patient's. In addition, our patient has skewed X chromosome inactivation, which contributes to her presentation, since her unaffected mother also carries the mutation.
UniNovo: a universal tool for de novo peptide sequencing.

Science.gov (United States)

Jeong, Kyowon; Kim, Sangtae; Pevzner, Pavel A

2013-08-15

Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD) or Higher-energy Collisional Dissociation (HCD) spectra, or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectrum, we developed a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra and even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function. The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that UniNovo is superior to other tools for ETD spectra and superior or comparable to them for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). UniNovo is implemented in Java and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual.
Primate-specific spliced PMCHL RNAs are non-protein coding in human and macaque tissues

Directory of Open Access Journals (Sweden)

Delerue-Audegond Audrey

2008-12-01

Full Text Available Abstract Background Brain-expressed genes that were created in the primate lineage represent obvious candidates for investigating the molecular mechanisms that contributed to neural reorganization and the emergence of new behavioural functions in Homo sapiens. PMCHL1 arose from retroposition of a pro-melanin-concentrating hormone (PMCH) antisense mRNA on the ancestral human chromosome 5p14 when platyrrhines and catarrhines diverged. Mutations before the divergence of Hylobatidae led to the creation of new exons, and finally PMCHL1 duplicated in an ancestor of hominids to generate PMCHL2 at human chromosome 5q13. A complex pattern of spliced and unspliced PMCHL RNAs was found in human brain and testis. Results Several novel spliced PMCHL transcripts have been characterized in human testis and fetal brain, identifying an additional exon and novel splice sites. Sequencing of PMCHL genes in several non-human primates allowed us to carry out phylogenetic analyses, which revealed that the initial retroposition event took place within an intron of the brain cadherin (CDH12) gene soon after the platyrrhine/catarrhine divergence, i.e. 30–35 Mya, and was concomitant with the insertion of an AluSg element. Sequence analysis of the spliced PMCHL transcripts identified only short ORFs of less than 300 bp, with low (VMCH-p8 and protein variants) or no evolutionary conservation. Western blot analyses of human and macaque tissues expressing PMCHL RNA failed to reveal any protein corresponding to VMCH-p8 and the protein variants encoded by spliced transcripts. Conclusion Our present results improve our knowledge of the gene structure and the evolutionary history of the primate-specific chimeric PMCHL genes. These genes produce multiple spliced transcripts bearing short, non-conserved and apparently non-translated ORFs that may function as mRNA-like non-coding RNAs.
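The ORF screen mentioned above (short ORFs under 300 bp) can be illustrated with a deliberately simplified Python scanner. This is not the authors' tool: it looks only at the three forward frames, takes the first ATG before each in-frame stop codon, and ignores the reverse strand and nested start codons. The function names and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str):
    """Forward-strand ORFs as (start, end, frame); end is exclusive and
    includes the stop codon. Only the first ATG before each stop is used."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, frame))
                start = None
    return orfs


def short_orfs(seq: str, max_len_bp: int = 300):
    """ORFs shorter than max_len_bp (the 300 bp threshold in the abstract)."""
    return [orf for orf in find_orfs(seq) if orf[1] - orf[0] < max_len_bp]


print(short_orfs("CCATGAAATAGCC"))  # [(2, 11, 2)]
```

Finding only short ORFs is, as the abstract notes, not evidence of non-coding status on its own; that conclusion rested on the lack of conservation and the negative Western blots.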
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

Science.gov (United States)

Štambuk, Nikola

The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
The Heat Shock Protein 26 Gene is Required for Ethanol Tolerance in Drosophila

Directory of Open Access Journals (Sweden)

Awoyemi A. Awofala

2011-01-01

Full Text Available Stress plays an important role in drug- and addiction-related behaviours. However, the mechanisms underlying these behavioural responses are still poorly understood. In light of recent reports showing consistent regulation of many genes encoding stress proteins, including heat shock proteins, following ethanol exposure in Drosophila, it was hypothesised that the transition to alcohol dependence may involve dysregulation of the circuits that mediate behavioural responses to stressors. Thus, behavioural genetic methodologies were used to investigate the role of the Drosophila hsp26 gene, a small heat shock protein-coding gene that is induced in response to various stresses, in the development of rapid tolerance to ethanol sedation. Rapid tolerance was quantified as the percentage difference in the mean sedation times between the second and first ethanol exposures. Two independently isolated P-element mutations near the hsp26 gene eliminated the capacity for tolerance. In addition, RNAi-mediated functional knockdown of hsp26 expression in the glial cells and in the whole nervous system also caused a defect in tolerance development. The rapid tolerance phenotype of the hsp26 mutants was rescued by expression of the wild-type hsp26 gene in the nervous system. None of these manipulations of the hsp26 gene changed the rate of ethanol absorption. Hsp26 genes are evolutionarily conserved; thus the role of hsp26 in ethanol tolerance may present a new direction for research into alcohol dependency.
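The rapid-tolerance measure described above can be written out as a small calculation. The abstract defines it only as the percentage difference in mean sedation times between the second and first exposures. The sketch below assumes that difference is normalised to the first exposure, and the sedation times are hypothetical, not data from the study.

```python
from statistics import mean


def rapid_tolerance_percent(first_exposure, second_exposure) -> float:
    """Percent change in mean sedation time from the first to the second
    ethanol exposure (assumed here to be normalised to the first exposure)."""
    m1, m2 = mean(first_exposure), mean(second_exposure)
    if m1 == 0:
        raise ValueError("mean first-exposure sedation time must be non-zero")
    return (m2 - m1) * 100 / m1


# Hypothetical sedation times (minutes) for one genotype; not study data.
first = [10.0, 12.0, 14.0]
second = [15.0, 18.0, 21.0]
print(rapid_tolerance_percent(first, second))  # 50.0
```

Under this measure, a genotype that has lost the capacity for tolerance, like the hsp26 mutants, would score near 0%.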
A Gene Family Coding for Salivary Proteins (SHOT) of the Polyphagous Spider Mite Tetranychus urticae Exhibits Fast Host-Dependent Transcriptional Plasticity.

Science.gov (United States)

Jonckheere, Wim; Dermauw, Wannes; Khalighi, Mousaalreza; Pavlidi, Nena; Reubens, Wim; Baggerman, Geert; Tirry, Luc; Menschaert, Gerben; Kant, Merijn R; Vanholme, Bartel; Van Leeuwen, Thomas

2018-01-01

The salivary protein repertoire released by the herbivorous pest Tetranychus urticae is assumed to hold keys to its success on diverse crops. We report on a spider mite-specific protein family that is expanded in T. urticae. The encoding genes have an expression pattern restricted to the anterior podocephalic glands, while peptide fragments were found in the T. urticae secretome, supporting the salivary nature of these proteins. As peptide fragments were identified in a host-dependent manner, we designated this family the SHOT (secreted host-responsive protein of Tetranychidae) family. The proteins were divided into three groups based on sequence similarity. Unlike TuSHOT3 genes, TuSHOT1 and TuSHOT2 genes were highly expressed when mites fed on a subset of the family Fabaceae, while expression was depleted on other hosts. TuSHOT1 and TuSHOT2 expression was induced within 24 h after certain host transfers, pointing toward transcriptional plasticity rather than selection as the cause. Transfer from an 'inducer' to a 'noninducer' plant was associated with slow yet strong downregulation of TuSHOT1 and TuSHOT2, occurring over generations rather than hours. This asymmetric on and off regulation points toward host-specific effects of SHOT proteins, which is further supported by the diversity of SHOT genes identified in Tetranychidae with a distinct host repertoire.
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases

Directory of Open Access Journals (Sweden)

Ma'ayan Avi

2007-10-01

Full Text Available Abstract Background In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from the literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as disease susceptibility genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Results Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic, linkable, three-color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Conclusion Genes2Networks is powerful web-based software that can help experimental biologists interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and to predict additional genes or proteins that may play key roles in common pathways or protein complexes.
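The idea of connecting a seed list through intermediate nodes can be sketched with a simplified one-hop heuristic in Python: keep any non-seed node that directly interacts with at least two seeds, then return the edges among seeds and those intermediates. This is an illustration under that assumption only, not the Genes2Networks algorithm or its statistical test. Node names and edges are hypothetical placeholders, not real interaction data.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (a, b) interaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def intermediates(graph, seeds, min_seed_links=2):
    """Non-seed nodes linked directly to at least min_seed_links seeds."""
    seeds = set(seeds)
    counts = {}
    for node, neighbours in graph.items():
        if node in seeds:
            continue
        k = len(neighbours & seeds)
        if k >= min_seed_links:
            counts[node] = k
    return counts


def subnetwork(graph, seeds, min_seed_links=2):
    """Edges among seeds plus qualifying intermediates (each edge once)."""
    keep = set(seeds) | set(intermediates(graph, seeds, min_seed_links))
    return {(a, b) for a in keep for b in graph.get(a, ()) if b in keep and a < b}


# Hypothetical toy network: S* are seeds, I* are candidate intermediates.
g = build_graph([("S1", "I1"), ("S2", "I1"), ("S3", "I2"), ("S1", "S2")])
print(intermediates(g, ["S1", "S2", "S3"]))  # {'I1': 2}
```

A real tool would also ask whether an intermediate links to more seeds than expected given its overall degree; highly connected hub proteins pass a raw count threshold too easily.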
Methylation of miRNA genes and oncogenesis.

Science.gov (United States)

Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A

2015-02-01

Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes, and, conversely, a number of different miRNAs usually target a structural gene. Epigenetic gene inactivation associated with methylation of promoter CpG-islands is common to both protein-coding genes and miRNA genes. Here, data on functions of miRNAs in development of tumor-cell phenotype are reviewed. Genomic organization of promoter CpG-islands of the miRNA genes located in inter- and intragenic areas is discussed. The literature and our own results on frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data regarding a link between such modification and changed activity of miRNA genes and, consequently, protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenetic processes as well as affected signaling pathways is discussed.
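The promoter CpG islands discussed above are commonly identified with the Gardiner-Garden and Frommer criteria: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG × length) / (number of C × number of G). The review does not say which definition it used, so treat these thresholds as an assumption. The function names below are hypothetical, and the sketch only screens sequence composition; it does not detect methylation.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len=200, min_gc=0.5, min_obs_exp=0.6) -> bool:
    """Assumed Gardiner-Garden & Frommer-style thresholds (adjustable)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 150))  # False
```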
Potential hot spot for de novo mutations in PTCH1 gene in Gorlin syndrome patients: a case report of twins from Croatia.

Science.gov (United States)

Musani, Vesna; Ozretić, Petar; Trnski, Diana; Sabol, Maja; Poduje, Sanja; Tošić, Mateja; Šitum, Mirna; Levanat, Sonja

2018-02-28

We describe a case of twins with sporadic Gorlin syndrome. Both twins had common Gorlin syndrome features, including calcification of the falx cerebri, multiple jaw keratocysts, and multiple basal cell carcinomas, but with different expressivity. One brother also had a benign testicular mesothelioma. We propose this tumor type as a possible new feature of Gorlin syndrome. Gorlin syndrome is a rare autosomal dominant disorder characterized by both developmental abnormalities and cancer predisposition, with variable expression of various developmental abnormalities and different types of tumors. The syndrome is primarily caused by mutations in the Patched 1 (PTCH1) gene, although rare mutations of the Patched 2 (PTCH2) or Suppressor of Fused (SUFU) genes have also been found. Neither founder mutations nor hot spot locations have been described for PTCH1 in Gorlin syndrome patients. Although de novo mutations of the PTCH1 gene occur in almost 50% of Gorlin syndrome cases, there are only a few recurrent mutations. Our twin patients were carriers of a de novo mutation in the PTCH1 gene, c.3364_3365delAT (p.Met1122ValfsX22). To our knowledge, this is the first Gorlin syndrome-causing mutation to have been reported four independent times in distant geographical locations. We therefore propose the location of the described mutation as a potential hot spot for mutations in PTCH1.
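As an arithmetic check on the variant notation above: coding positions c.3364 and c.3365 are the first two bases of codon 1122, which matches the protein-level description p.Met1122ValfsX22. Methionine is encoded only by ATG, so the deleted "AT" is consistent with that codon. A 2-bp deletion is not a multiple of 3, so it shifts the reading frame. The helper names are hypothetical, and the sketch checks positions only; it does not validate HGVS syntax.

```python
def codon_of(c_pos: int):
    """Map a 1-based coding DNA (c.) position to (codon number, base in codon)."""
    if c_pos < 1:
        raise ValueError("c. positions in the coding sequence start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def is_frameshift(deleted_bases: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deleted_bases % 3 != 0


# c.3364_3365delAT removes two bases.
print(codon_of(3364), codon_of(3365))  # (1122, 1) (1122, 2)
print(is_frameshift(3365 - 3364 + 1))  # True
```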
Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

Science.gov (United States)

Richardson, Dale N.; Wiehe, Thomas

Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns and more complex signaling pathways, and has provided organisms with a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs, and this conservation was highly correlated with the microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences revealed the contribution of micro-chromosomal rearrangements to expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was associated with gene pairs showing diverged URSs but conserved CDSs.
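The abstract reports that URS conservation correlated with expression correlation and describes a rank-based analysis, but it does not name the statistic. As an illustration only, a Spearman rank correlation (the Pearson correlation of average ranks) is one standard way to relate two such per-pair scores. The function names and toy values are hypothetical, not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-pair scores: URS similarity vs. expression correlation.
urs_similarity = [0.1, 0.4, 0.35, 0.8]
expression_corr = [0.05, 0.3, 0.5, 0.9]
print(spearman(urs_similarity, expression_corr))  # 0.8
```

Rank-based statistics are a natural fit for similarity and divergence scores whose scales differ between methods, since only the ordering of gene pairs matters.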
Translational regulation of gene expression by an anaerobically induced small non-coding RNA in Escherichia coli

DEFF Research Database (Denmark)

Boysen, Anders; Møller-Jensen, Jakob; Kallipolitis, Birgitte H.

2010-01-01

Small non-coding RNAs (sRNA) have emerged as important elements of gene regulatory circuits. In enterobacteria such as Escherichia coli and Salmonella, many of these sRNAs interact with the Hfq protein, an RNA chaperone similar to mammalian Sm-like proteins, and act in the post… […] of at least one sRNA regulator. Here, we extend this view by the identification and characterization of a highly conserved, anaerobically induced small sRNA in E. coli, whose expression is strictly dependent on the anaerobic transcriptional fumarate and nitrate reductase regulator (FNR). The sRNA, named Fnr… […] that adaptation to anaerobic growth involves the action of a small regulatory RNA.
De novo synthesis and functional analysis of the phosphatase-encoding gene acI-B of uncultured Actinobacteria from Lake Stechlin (NE Germany).

Science.gov (United States)

Srivastava, Abhishek; McMahon, Katherine D; Stepanauskas, Ramunas; Grossart, Hans-Peter

2015-12-01

The National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov/guide/taxonomy/] database lists more than 15,500 bacterial species, but these also include a plethora of uncultured bacterial representatives. Through their metabolism, bacteria directly influence biogeochemical cycles, which underscores their important status on our planet. To study the function of a gene from an uncultured bacterium, we undertook a de novo gene synthesis approach. Actinobacteria of the acI-B subcluster are important but as yet uncultured members of the bacterioplankton in temperate lakes of the northern hemisphere, such as oligotrophic Lake Stechlin (NE Germany). This lake is relatively poor in phosphate (P) and harbors on average ~1.3 × 10^6 bacterial cells/ml, of which Actinobacteria of the acI lineage can contribute almost half of the entire bacterial community, depending on seasonal variability. Single-cell genome analysis of Actinobacterium SCGC AB141-P03, a member of the acI-B tribe in Lake Stechlin, revealed several phosphate-metabolizing genes. The genome of acI-B Actinobacteria indicates a potential to degrade polyphosphate compounds. To test for this genetic potential, we targeted the exoP-annotated gene potentially encoding a polyphosphatase and synthesized it artificially to examine its biochemical role. Heterologous overexpression of the gene in Escherichia coli and protein purification revealed phosphatase activity. Comparative genome analysis suggested that homologs of this gene should also be present in other Actinobacteria of the acI lineages. This strategic retention of specialized genes in their genome provides a metabolic advantage over other members of the aquatic food web in a P-limited ecosystem. [Int Microbiol 2016; 19(1):39-47]. Copyright© by the Spanish Society for Microbiology and Institute for Catalan Studies.
Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

Directory of Open Access Journals (Sweden)

Mukesh Kumar Meena

Full Text Available Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals, including stresses, and is therefore important for plants' response processes. Calcineurin B-like proteins (CBLs) are one class of plant calcium sensors, which sense and convey changes in cytosolic Ca2+ concentration for the response process. A search in four leguminous plant genomes (soybean, Medicago truncatula, common bean and chickpea) identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and that some of them were multiplied during whole-genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of the CBL family in the plant kingdom indicated its diverse origin but showed remarkable conservation in overall protein structure, with complex gene structures appearing in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatments was studied. Most of the CaCBL genes exhibited high expression in flowers. The expression profiles of CaCBL genes in response to different abiotic stresses and to hormones related to development and stress (ABA, auxin, cytokinin, SA and JA) at different time intervals suggest diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of the chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.
Sequencing and Characterization of Novel PII Signaling Protein Gene in Microalga Haematococcus pluvialis

Directory of Open Access Journals (Sweden)

Ruijuan Ma

2017-10-01

Full Text Available The PII signaling protein is a key protein for controlling nitrogen assimilatory reactions in most organisms, but little information has been reported on the PII proteins of the green microalga Haematococcus pluvialis. Since H. pluvialis cells can produce a large amount of astaxanthin upon nitrogen starvation, its PII protein may be an important factor in elevated astaxanthin production in Haematococcus. This study identified and isolated the coding gene (HpGLB1) from this microalga. The full-length HpGLB1 sequence was 1222 bp, including a 621 bp coding sequence (CDS), a 103 bp 5′ untranslated region (5′ UTR), and a 498 bp 3′ untranslated region (3′ UTR). The CDS could encode a protein of 206 amino acids (HpPII). Its calculated molecular weight (Mw) was 22.4 kDa and its theoretical isoelectric point was 9.53. When H. pluvialis cells were exposed to nitrogen starvation, HpGLB1 expression increased 2.46-fold within 48 h, concomitant with the rise in astaxanthin content. This study also used phylogenetic analysis to show that HpPII was homologous to the PII proteins of other green microalgae. The results form a fundamental basis for future studies on HpPII and its potential physiological function in Haematococcus astaxanthin biosynthesis.
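The reported lengths can be cross-checked with simple arithmetic. The three segments should sum to the full length. A CDS of n nucleotides spans n/3 codons, and if the stop codon is counted in the CDS it adds no residue. A minimal sketch, assuming the 621 bp CDS includes its stop codon; the function names are hypothetical.

```python
def full_length(utr5_bp: int, cds_bp: int, utr3_bp: int) -> int:
    """Full-length sequence = 5' UTR + CDS + 3' UTR."""
    return utr5_bp + cds_bp + utr3_bp


def encoded_residues(cds_bp: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by a CDS; a stop codon contributes no residue."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


# Values taken from the HpGLB1 abstract
print(full_length(103, 621, 498))  # 1222
print(encoded_residues(621))       # 206
```

Both results agree with the reported 1222 bp and 206 amino acids.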
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.

Science.gov (United States)

Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas

2014-01-01

For this report, we analyzed protein secondary structures in relation to the statistics of the three nucleotide codon positions. The purpose of this investigation was to find which properties, at the ribosome, tRNA or protein level, could explain the purine bias (Rrr) observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on the ribosomal machinery. The physicochemical constraints on proteins come mainly from the hydropathy and molecular weight (MW) of secondary structures, as well as from the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which favors a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional
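The Rrr pattern is a statistic computed over the three codon positions. The sketch below shows how per-position purine frequencies might be tallied for one in-frame coding sequence. The function name and toy sequence are hypothetical, and the study's correlation analyses with secondary structure are not reproduced.

```python
PURINES = frozenset("AG")


def purine_fraction_by_position(cds: str) -> list[float]:
    """Fraction of purines (A or G) at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty in-frame sequence (length multiple of 3)")
    n_codons = len(cds) // 3
    counts = [0, 0, 0]
    for start in range(0, len(cds), 3):
        # enumerate() yields codon position 0, 1, 2 for the three bases of each codon
        for position, base in enumerate(cds[start:start + 3]):
            if base in PURINES:
                counts[position] += 1
    return [count / n_codons for count in counts]


print(purine_fraction_by_position("ATGGCTGAAGGT"))  # [1.0, 0.5, 0.5]
```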
Cloning and expression of three thaumatin-like protein genes from Polyporus umbellatus

Directory of Open Access Journals (Sweden)

Mengmeng Liu

2017-05-01

Full Text Available Genes encoding thaumatin-like proteins (TLPs) are frequently found in fungal genomes. However, information on TLP genes in Polyporus umbellatus is still limited. In this study, three TLP genes were cloned from P. umbellatus. The full-length coding sequences of PuTLP1, PuTLP2 and PuTLP3 were 768, 759 and 561 bp long, respectively, encoding 256, 253 and 187 amino acids. Phylogenetic trees showed that PuTLP1, PuTLP2 and PuTLP3 clustered with sequences from Gloeophyllum trabeum, Trametes versicolor and Stereum hirsutum, respectively. Expression levels of the three TLP genes were higher in P. umbellatus infected with Armillaria mellea than in sclerotia without A. mellea. Furthermore, the three PuTLPs were over-expressed in the Escherichia coli BL21 (DE3) strain, and high-quality proteins suitable for preparing specific antibodies were obtained using Ni-NTA resin. These results suggest that PuTLP1, PuTLP2 and PuTLP3 in P. umbellatus may be involved in the defense response to A. mellea infections.
Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

Science.gov (United States)

Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

2009-04-21

To improve efficiency in high-throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon-engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent-accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon-engineered nucleic acid gene sequence according to a selected codon usage table with a minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans all of the overlapping oligonucleotides and mutagenic primers required to synthesize the desired gene constructs by PCR and to physically clone them into selected vectors by the most popular subcloning strategies. We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mismatch-specific endonuclease
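A greedy back-translation illustrates the protein-to-gene step. This is a minimal sketch, not Gene Composer's algorithm, and it rests on several assumptions:

- The codon fractions below are hypothetical placeholders covering only a few amino acids.
- `min_fraction` stands in for the minimal codon usage threshold.
- The G:C% target and sequence-feature constraints are not optimized. G:C% is only reported after the codons have been chosen.

```python
# Hypothetical, abbreviated codon-usage fractions (placeholder values, not measured data).
# A real table covers every amino acid and comes from the chosen expression host.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13, "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "E": {"GAA": 0.68, "GAG": 0.32},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def back_translate(protein: str, usage: dict = CODON_USAGE, min_fraction: float = 0.10) -> str:
    """Greedy back-translation: for each residue, pick the most frequent codon
    whose usage fraction is at or above min_fraction."""
    codons = []
    for residue in protein.upper():
        allowed = {codon: f for codon, f in usage[residue].items() if f >= min_fraction}
        if not allowed:
            raise ValueError(f"no codon for {residue!r} passes the threshold")
        codons.append(max(allowed, key=allowed.get))
    return "".join(codons)


def gc_percent(seq: str) -> float:
    """G+C content of a nucleotide sequence, in percent."""
    return 100.0 * sum(base in "GC" for base in seq.upper()) / len(seq)


gene = back_translate("MKLE*")
print(gene)                        # ATGAAACTGGAATAA
print(round(gc_percent(gene), 1))  # 26.7
```

A fuller design would swap in lower-ranked synonymous codons where needed to reach a target G:C% or to remove unwanted sequence features.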

Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

Directory of Open Access Journals (Sweden)

Mixon Mark

2009-04-01

Full Text Available Abstract Background To improve efficiency in high-throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon-engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent-accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon-engineered nucleic acid gene sequence according to a selected codon usage table with a minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans all of the overlapping oligonucleotides and mutagenic primers required to synthesize the desired gene constructs by PCR and to physically clone them into selected vectors by the most popular subcloning strategies. Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene
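The gene-to-oligo idea can be pictured as tiling a gene with overlapping oligonucleotides. The oligos alternate between strands so that neighbours anneal through their shared overlap during PCR assembly. This is a minimal, generic sketch, not Gene Composer's algorithm:

- The 40 nt oligo length and 20 nt overlap are arbitrary assumptions.
- Melting-temperature balancing and mispriming checks, which practical designs usually consider, are ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def plan_overlapping_oligos(gene: str, oligo_len: int = 40, overlap: int = 20) -> list[str]:
    """Tile `gene` with oligos of oligo_len nt that overlap their neighbours by `overlap` nt.
    Even-numbered oligos are sense strand, odd-numbered ones antisense, so adjacent
    oligos can anneal through their overlap. The last oligo may be shorter."""
    if not 0 < overlap < oligo_len:
        raise ValueError("require 0 < overlap < oligo_len")
    gene = gene.upper()
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        fragment = gene[start:start + oligo_len]
        oligos.append(fragment if len(oligos) % 2 == 0 else reverse_complement(fragment))
        if start + oligo_len >= len(gene):
            break
        start += step
    return oligos


gene = "ATGAAACTGGAA" * 5  # 60 nt toy sequence
oligos = plan_overlapping_oligos(gene)
print(len(oligos))  # 2
```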
Complete genome sequencing of the luminescent bacterium, Vibrio qinghaiensis sp. Q67 using PacBio technology

Science.gov (United States)

Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming

2018-01-01

Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were obtained from three PacBio sequencing runs, and reads >500 bp with a quality value of >0.75 were merged into a single dataset. The resulting highly contiguous de novo assembly has no gaps and comprises two chromosomes carrying substantial genetic information, including protein-coding genes, non-coding RNAs, transposons and genomic islands. Our dataset can be useful as a comparative genome for evolution and speciation studies. It can also support the analysis of protein-coding gene families, of the pathogenicity of different Vibrio species in fish, of the evolution of non-coding RNAs and transposons, and of the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
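The read-selection step can be sketched as a filter over several runs. The sketch makes these assumptions:

- Reads are represented by a hypothetical `Read` record rather than parsed PacBio output files.
- The quality value is taken to lie on a 0 to 1 scale.
- The thresholds quoted in the abstract (>500 bp, quality >0.75) are applied as strict inequalities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record."""
    name: str
    length: int     # read length in bp
    quality: float  # read quality value, assumed here to lie on a 0-1 scale


def merge_filtered_reads(runs: list[list[Read]], min_length: int = 500,
                         min_quality: float = 0.75) -> list[Read]:
    """Pool reads from several sequencing runs, keeping those strictly longer than
    min_length and with quality strictly above min_quality."""
    return [read for run in runs for read in run
            if read.length > min_length and read.quality > min_quality]


run1 = [Read("r1", 800, 0.80), Read("r2", 450, 0.90)]
run2 = [Read("r3", 1200, 0.70), Read("r4", 501, 0.76)]
kept = merge_filtered_reads([run1, run2])
print([r.name for r in kept])  # ['r1', 'r4']
```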
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

Energy Technology Data Exchange (ETDEWEB)

Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.; Porwollik, Steffen; Jones, Marcus B.; Yoon, Hyunjin; Payne, Samuel H.; Martin, Jessica L.; Burnet, Meagan C.; Monroe, Matthew E.; Venepally, Pratap; Smith, Richard D.; Peterson, Scott; Heffron, Fred; Mcclelland, Michael; Adkins, Joshua N.

2011-08-25

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems biology-oriented genome-scale modeling efforts benefit greatly from accurate annotation of protein-coding genes to develop properly functioning models. However, protein-coding genes in most new genomes are determined almost entirely by inference, using computational predictions with significant documented error rates (>15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Mass spectrometry-based proteomics can directly measure peptides arising from expressed proteins. It can therefore be used to augment and verify the coding regions of a genomic sequence and, importantly, to detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provide protein-level experimental confirmation for 44% of predicted protein-coding genes, suggest revisions to 48 genes assigned incorrect translational start sites, and uncover 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regard to the annotation of mature protein products.
Genome-Wide Analysis of Secondary Metabolite Gene Clusters in Ophiostoma ulmi and Ophiostoma novo-ulmi Reveals a Fujikurin-Like Gene Cluster with a Putative Role in Infection

Directory of Open Access Journals (Sweden)

Nicolau Sbaraini

2017-06-01

Full Text Available The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and the pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi of the genus Ophiostoma, has caused a significant decline in elm tree populations (Ulmus spp.) in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi), along with closely related species with different lifestyles, allows unique comparisons to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs) have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8) was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs of this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the genus Ophiostoma. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on the lifestyle of the genus Ophiostoma.
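Finding a cluster that is unique to the Dutch elm pathogens amounts to a presence/absence comparison across genomes. This minimal sketch assumes that BGC detection and orthology assignment have already been done. The table below is hypothetical, and all names other than OpPKS8 are placeholders.

```python
def clusters_restricted_to(presence: dict[str, set[str]], group: set[str]) -> list[str]:
    """Return BGCs detected in at least one genome and only in genomes belonging to `group`."""
    return sorted(bgc for bgc, hosts in presence.items() if hosts and hosts <= group)


dutch_elm_pathogens = {"O. ulmi", "O. novo-ulmi"}
# Hypothetical presence/absence table; only the OpPKS8 name comes from the abstract.
presence = {
    "OpPKS8": {"O. ulmi", "O. novo-ulmi"},
    "BGC_A": {"O. ulmi", "O. novo-ulmi", "species_X", "species_Y"},
    "BGC_B": {"species_X"},
}
print(clusters_restricted_to(presence, dutch_elm_pathogens))  # ['OpPKS8']
```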
The fusion protein signal-peptide-coding region of canine distemper virus: a useful tool for phylogenetic reconstruction and lineage identification.

Directory of Open Access Journals (Sweden)

Nicolás Sarute

Full Text Available Canine distemper virus (CDV; Paramyxoviridae, Morbillivirus) is the etiologic agent of a multisystemic infectious disease affecting all terrestrial carnivore families, with high incidence and mortality in domestic dogs. Sequence analysis of the hemagglutinin (H) gene has been widely employed to characterize field strains, permitting the identification of nine CDV lineages worldwide. Recently, it has been established that the sequences of the fusion protein signal-peptide (Fsp) coding region are extremely variable, suggesting that analysis of its sequence might be useful for strain characterization studies. However, the divergence of Fsp sequences among worldwide strains and its phylogenetic resolution have not yet been evaluated. We constructed datasets containing the Fsp-coding region and H gene sequences of the same strains belonging to eight CDV lineages. Both datasets were used to evaluate their phylogenetic resolution. The phylogenetic analysis revealed that both datasets clustered the same strains into eight different branches, corresponding to CDV lineages. The inter-lineage amino acid divergence was fourfold greater for the Fsp peptide than for the H protein. The likelihood mapping revealed that both datasets display strong phylogenetic signals in the region of well-resolved topologies. These features indicate that Fsp-coding region sequence analysis is suitable for evolutionary studies, as it allows straightforward identification of CDV lineages.
The fusion protein signal-peptide-coding region of canine distemper virus: a useful tool for phylogenetic reconstruction and lineage identification.

Science.gov (United States)

Sarute, Nicolás; Calderón, Marina Gallo; Pérez, Ruben; La Torre, José; Hernández, Martín; Francia, Lourdes; Panzera, Yanina

2013-01-01

Canine distemper virus (CDV; Paramyxoviridae, Morbillivirus) is the etiologic agent of a multisystemic infectious disease affecting all terrestrial carnivore families, with high incidence and mortality in domestic dogs. Sequence analysis of the hemagglutinin (H) gene has been widely employed to characterize field strains, permitting the identification of nine CDV lineages worldwide. Recently, it has been established that the sequences of the fusion protein signal-peptide (Fsp) coding region are extremely variable, suggesting that analysis of its sequence might be useful for strain characterization studies. However, the divergence of Fsp sequences among worldwide strains and its phylogenetic resolution have not yet been evaluated. We constructed datasets containing the Fsp-coding region and H gene sequences of the same strains belonging to eight CDV lineages. Both datasets were used to evaluate their phylogenetic resolution. The phylogenetic analysis revealed that both datasets clustered the same strains into eight different branches, corresponding to CDV lineages. The inter-lineage amino acid divergence was fourfold greater for the Fsp peptide than for the H protein. The likelihood mapping revealed that both datasets display strong phylogenetic signals in the region of well-resolved topologies. These features indicate that Fsp-coding region sequence analysis is suitable for evolutionary studies, as it allows straightforward identification of CDV lineages.
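Inter-lineage divergence is often summarized as the mean pairwise distance between the aligned sequences of two lineages. This minimal sketch uses the uncorrected p-distance. The abstract does not state which distance measure was used, so that choice, like the toy peptide sequences, is an assumption. Computing the same quantity for the Fsp and H alignments and dividing one by the other would give a fold difference like the one reported.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences,
    skipping positions where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def mean_between_lineages(lineage1: list[str], lineage2: list[str]) -> float:
    """Mean pairwise p-distance between all sequences of two lineages."""
    distances = [p_distance(a, b) for a in lineage1 for b in lineage2]
    return sum(distances) / len(distances)


print(p_distance("MKLV", "MKIV"))                        # 0.25
print(mean_between_lineages(["MKLV", "MKLI"], ["MKIV"]))  # 0.375
```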
De novo transcriptome characterization and gene expression profiling of the desiccation tolerant moss Bryum argenteum following rehydration.

Science.gov (United States)

Gao, Bei; Zhang, Daoyuan; Li, Xiaoshuang; Yang, Honglan; Zhang, Yuanming; Wood, Andrew J

2015-05-28

The desiccation-tolerant moss Bryum argenteum is an important component of the biological soil crusts (BSCs) found in the Gurbantunggut Desert. Desiccation tolerance is defined as the ability to revive from the air-dried state. To elucidate the molecular mechanisms related to desiccation tolerance, we employed RNA-Seq and digital gene expression (DGE) technologies to study the genome-wide expression profiles of the dehydration and rehydration processes in this important desert plant. We applied a two-step approach to investigate the gene expression profile upon rehydration in B. argenteum using the Illumina HiSeq 2000 sequencing platform. First, a total of 57,247 transcript assembly contigs (TACs) were obtained from 54.79 million reads by de novo assembly, with an average length of 863 bp and an N50 of 1,372 bp. Among the reconstructed TACs, 36,916 (64.5%) showed similarity to existing protein sequences in public databases, and 23,509 and 21,607 TACs were assigned GO and KEGG annotations, respectively. Second, samples were taken at three hydration stages: desiccated (Dry), rehydrated for 2 h (R2) and rehydrated for 24 h (R24). DGE libraries were constructed from these samples to discover differentially expressed genes (DEGs). Compared with Dry, 4,081 and 6,709 DEGs were identified in R2 and R24, respectively. Genes up-regulated after two hours of rehydration, relative to the desiccated sample, are primarily related to stress responses. GO functional enrichment networks, KEGG metabolic pathways and MapMan analysis support the idea of rapid recovery of photosynthesis after 24 h of rehydration. We identified 770 transcription factors (TFs), which were classified into 50 TF families; 142 TF transcripts, including 23 members of the ERF family, were up-regulated upon rehydration. In this study, we constructed a pioneering, high-quality reference transcriptome for B. argenteum and generated three DGE libraries to elucidate changes in gene expression upon rehydration. Expression
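Contig N50, reported above alongside the mean contig length, can be computed by sorting contigs from longest to shortest and accumulating their lengths until half of the assembled bases are covered. A minimal sketch with made-up contig lengths:

```python
def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length, summed from the
    longest contig downwards, first reaches half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            break
    return length


contigs = [100, 200, 300, 400, 500]  # made-up contig lengths in bp
print(n50(contigs))                  # 400
print(sum(contigs) / len(contigs))   # 300.0 (mean length, for comparison)
```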
IGF-1 modulates gene expression of proteins involved in inflammation, cytoskeleton, and liver architecture.

Science.gov (United States)

Lara-Diaz, V J; Castilla-Cortazar, I; Martín-Estal, I; García-Magariño, M; Aguirre, G A; Puche, J E; de la Garza, R G; Morales, L A; Muñoz, U

2017-05-01

Even though the liver synthesizes most circulating IGF-1, it lacks the IGF-1 receptor under physiological conditions. However, according to previous studies, a damaged liver expresses the receptor. For this reason, we examine here hepatic histology and the expression of genes encoding cytoskeletal, extracellular matrix, cell-cell junction and inflammation-related proteins. A murine model of partial IGF-1 deficiency was used to investigate IGF-1's effects on the liver. Wild-type controls, heterozygous igf1+/- mice, and heterozygous mice treated with IGF-1 for 10 days were compared. Histology, microarray analysis of mRNA expression, RT-qPCR, and lipid peroxidation were assessed. Microarray analyses revealed significant underexpression of igf1 in heterozygous mice compared with controls. Treatment restored normal hepatic igf1 expression, which in turn normalized circulating IGF-1 levels. IGF-1 receptor mRNA was overexpressed in the livers of heterozygous (Hz) mice, whereas treated mice displayed expression similar to that of controls. Heterozygous mice showed overexpression of several genes encoding inflammatory and acute-phase proteins, and underexpression or overexpression of genes encoding extracellular matrix, cytoskeleton, and cell junction components. Histology revealed an altered hepatic architecture. In addition, hepatic oxidative damage was increased in the heterozygous group. Partial IGF-1 deficiency alone is associated with relevant alterations of the hepatic architecture and of the expression of genes involved in the cytoskeleton, hepatocyte polarity, cell junctions, and extracellular matrix proteins. Moreover, it induces hepatic expression of the IGF-1 receptor and elevated acute-phase and inflammation mediators, all of which resulted in liver oxidative damage.
Saccharomyces cerevisiae ribosomal protein L37 is encoded by duplicate genes that are differentially expressed.

Science.gov (United States)

Tornow, J; Santangelo, G M

1994-06-01

A duplicate copy of the RPL37A gene (encoding ribosomal protein L37) was cloned and sequenced. The coding region of RPL37B is very similar to that of RPL37A, with only one conservative amino-acid difference. However, the intron and flanking sequences of the two genes are extremely dissimilar. Disruption experiments indicate that the two loci are not functionally equivalent: disruption of RPL37B had no significant effect, but disruption of RPL37A severely impaired the growth rate of the cell. When both RPL37 loci are disrupted, the cell is unable to grow at all, indicating that rpL37 is an essential protein. The functional disparity between the two RPL37 loci could be explained by differential gene expression. The results of two experiments support this idea. First, fusing RPL37A to a reporter gene yielded six-fold higher mRNA levels than fusing the same reporter gene to RPL37B. Second, a modest increase in gene dosage of RPL37B overcame the lack of a functional RPL37A gene.
Arsenic trioxide (AT) is a novel human neutrophil pro-apoptotic agent: effects of catalase on AT-induced apoptosis, degradation of cytoskeletal proteins and de novo protein synthesis.

Science.gov (United States)

Binet, François; Cavalli, Hélène; Moisan, Eliane; Girard, Denis

2006-02-01

The anti-cancer drug arsenic trioxide (AT) induces apoptosis in a variety of transformed or proliferating cells. However, little is known regarding its ability to induce apoptosis in terminally differentiated cells, such as neutrophils. Because neutropenia has been reported in some cancer patients after AT treatment, we hypothesised that AT could induce neutrophil apoptosis, an issue that had never been investigated. Herein, we found that AT induced neutrophil apoptosis and gelsolin degradation via caspases. AT did not increase neutrophil superoxide production and did not induce mitochondrial generation of reactive oxygen species. AT induced apoptosis with the same potency in PLB-985 cells and in X-linked chronic granulomatous disease (CGD) cells (PLB-985 cells deficient in gp91(phox), mimicking CGD). Addition of catalase, which degrades H2O2, reversed AT-induced apoptosis and the degradation of the cytoskeletal proteins gelsolin, alpha-tubulin and lamin B1. Unexpectedly, AT induced de novo protein synthesis, which was reversed by catalase. Cycloheximide partially reversed AT-induced apoptosis. We conclude that AT induces neutrophil apoptosis by a caspase-dependent mechanism and via de novo protein synthesis. H2O2 is of major importance in AT-induced neutrophil apoptosis, but its production does not originate from nicotinamide adenine dinucleotide phosphate dehydrogenase activation or from mitochondria. Cytoskeletal structures other than microtubules can now be considered novel targets of AT.
De novo assembly of a haplotype-resolved human genome.

Science.gov (United States)

Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

2015-06-01

The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
De novo transcriptome assembly and analysis of differential gene expression in response to drought in European beech.

Directory of Open Access Journals (Sweden)

Markus Müller

Full Text Available Despite the ecological and economic importance of European beech (Fagus sylvatica L.), genomic resources for this species are still limited. This hampers an understanding of the molecular basis of adaptation to stress. Since beech will most likely be threatened by the consequences of climate change, understanding adaptive processes in response to climate change-related drought stress is of major importance. Here, we used RNA-seq to provide the first drought stress-related transcriptome of beech. In a drought stress trial with beech saplings, 50 samples were taken for RNA extraction at five time points during a soil desiccation experiment. De novo transcriptome assembly yielded 44,335 contigs, and analysis of differential gene expression revealed 662 differentially expressed genes between the stress group and the normally watered control group. Gene expression was specific to the different time points, and only five genes were significantly differentially expressed between the stress and control groups on all five sampling days. GO term enrichment showed that upregulated genes were mostly involved in lipid- and homeostasis-related processes, whereas genes involved in the oxidative stress response were downregulated in the stressed seedlings. This study gives first insights into the genomic drought stress response of European beech, and provides new genetic resources for adaptation research in this species.
Evolution of hepatic glucose metabolism: liver-specific glucokinase deficiency explained by parallel loss of the gene for glucokinase regulatory protein (GCKR).

Directory of Open Access Journals (Sweden)

Zhao Yang Wang

Full Text Available Glucokinase (GCK) plays an important role in the regulation of carbohydrate metabolism. In the liver, phosphorylation of glucose to glucose-6-phosphate by GCK is the first step for both glycolysis and glycogen synthesis. However, some vertebrate species are deficient in hepatic GCK activity, despite having GCK genes in their genomes that appear compatible with function. Glucokinase regulatory protein (GCKR) is the most important post-transcriptional regulator of GCK in the liver; it modulates GCK activity and location in response to changes in glucose levels. In experimental models, loss of GCKR has been shown to be associated with reduced hepatic GCK protein levels and activity. GCKR genes and GCKR-like sequences were identified in the genomes of all vertebrate species with available genome sequences. The coding sequences of GCKR and GCKR-like genes were identified and aligned; base changes likely to disrupt coding potential or splicing were also identified. GCKR genes could not be found in the genomes of 9 vertebrate species, including all birds. In addition, GCKR-like gene sequences could be identified in multiple mammalian genomes, but they did not predict a functional protein. Vertebrate species previously reported to be deficient in hepatic GCK activity were found to have deleted (birds and lizard) or mutated (mammals) GCKR genes. Our results suggest that mutation of the GCKR gene leads to hepatic GCK deficiency due to the loss of the stabilizing effect of GCKR.
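One simple way to flag base changes that disrupt coding potential is to look for two signs. The first is an in-frame stop codon before the end of the reading frame. The second is a coding length that is no longer a multiple of three. This minimal sketch works under those assumptions and assumes the sequence starts in frame. It does not model splice-site disruption, which the study also considered.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def premature_stop_codons(cds: str) -> list[int]:
    """0-based codon indices of in-frame stop codons occurring before the last codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [index for index, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def has_frameshift_length(cds: str) -> bool:
    """True if the CDS length is not a multiple of 3 (consistent with a frameshifting indel)."""
    return len(cds) % 3 != 0


print(premature_stop_codons("ATGAAATAGGCCTAA"))  # [2]
print(premature_stop_codons("ATGAAAGCCTAA"))     # []
print(has_frameshift_length("ATGAAAGCCTA"))      # True
```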
Combining random gene fission and rational gene fusion to discover near-infrared fluorescent protein fragments that report on protein-protein interactions.

Science.gov (United States)

Pandey, Naresh; Nobles, Christopher L; Zechiedrich, Lynn; Maresso, Anthony W; Silberg, Jonathan J

2015-05-15

Gene fission can convert monomeric proteins into two-piece catalysts, reporters, and transcription factors for systems and synthetic biology. However, some proteins can be challenging to fragment without disrupting function, such as near-infrared fluorescent protein (IFP). We describe a directed evolution strategy that can overcome this challenge by randomly fragmenting proteins and concomitantly fusing the protein fragments to pairs of proteins or peptides that associate. We used this method to create libraries that express fragmented IFP as fusions to a pair of associating peptides (IAAL-E3 and IAAL-K3) and proteins (CheA and CheY) and screened for fragmented IFP with detectable near-infrared fluorescence. Thirteen novel fragmented IFPs were identified, all of which arose from backbone fission proximal to the interdomain linker. Either the IAAL-E3 and IAAL-K3 peptides or CheA and CheY proteins could assist with IFP fragment complementation, although the IAAL-E3 and IAAL-K3 peptides consistently yielded higher fluorescence. These results demonstrate how random gene fission can be coupled to rational gene fusion to create libraries enriched in fragmented proteins with AND gate logic that is dependent upon a protein-protein interaction, and they suggest that these near-infrared fluorescent protein fragments will be suitable as reporters for pairs of promoters and protein-protein interactions within whole animals.
FunGene: the functional gene pipeline and repository.

Science.gov (United States)

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
FunGene: the Functional Gene Pipeline and Repository

Directory of Open Access Journals (Sweden)

Jordan A. Fish

2013-10-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
Disruption of the Acyl-CoA binding protein gene delays hepatic adaptation to metabolic changes at weaning

DEFF Research Database (Denmark)

Neess, Ditte; Marcher, Ann-Britt; Bloksgaard, Maria

The acyl-CoA binding protein/diazepam binding inhibitor (ACBP/DBI) is an evolutionarily conserved intracellular protein that binds C14-C22 acyl-CoA esters with very high affinity. ACBP is thought to act as an acyl-CoA transporter, and in vitro analyses have indicated that ACBP can transport acyl......-CoA esters between different enzymatic systems. However, little is known about the in vivo function in mammalian cells. We have generated mice with targeted disruption of ACBP (ACBP-/-). These mice are viable and fertile and develop normally. However, around weaning the ACBP-/- mice show decreased growth......) family, around the weaning period. As a result, the hepatic de novo cholesterogenesis is significantly decreased at weaning. The delayed induction of SREBP target genes around weaning is caused by a compromised processing and decreased expression of SREBP precursors leading to reduced binding of SREBP...
Characterization of the ovine ribosomal protein SA gene and its pseudogenes

Directory of Open Access Journals (Sweden)

Van Zeveren Alex

2010-03-01

Background: The ribosomal protein SA (RPSA), previously named 37-kDa laminin receptor precursor/67-kDa laminin receptor (LRP/LR), is a multifunctional protein that plays a role in a number of pathological processes, such as cancer and prion diseases. In all investigated species, RPSA is a member of a multicopy gene family consisting of one full-length functional gene and several pseudogenes. Therefore, for studies on RPSA-related pathways and pathologies, it is important to characterize the whole family and to address the possible function of the other RPSA family members. The present work aims at deciphering the RPSA family in sheep. Results: In addition to the full-length functional ovine RPSA gene, 11 other members of this multicopy gene family, all processed pseudogenes, were identified. Comparison between the RPSA transcript and these pseudogenes shows a wide range of sequence identities, from 99% to 74%. Only one of the 11 pseudogenes, RPSAP7, shares the same open reading frame (ORF) of 295 amino acids with the RPSA gene, differing in only one amino acid. All members of the RPSA family were annotated by comparative mapping and fluorescence in situ hybridization (FISH) localization. Transcription was investigated in the cerebrum, cerebellum, spleen, muscle, lymph node, duodenum and blood, and transcripts were detected for 6 of the 11 pseudogenes in some of these tissues. Conclusions: In the present work we have characterized the ovine RPSA family. Our results reveal the existence of 11 ovine RPSA pseudogenes and provide new data on their structure and sequence. Such information will facilitate molecular studies of the functional RPSA gene, taking the existence of these pseudogenes into account in the design of experiments. It remains to be investigated whether the transcribed members are functional as regulatory non-coding RNAs or as proteins.
PROTEOLYTIC REMOVAL OF THE CARBOXYL TERMINUS OF THE T4 GENE 32 HELIX-DESTABILIZING PROTEIN ALTERS THE T4 IN VITRO REPLICATION COMPLEX

Energy Technology Data Exchange (ETDEWEB)

Burke, R.L.; Alberts, B.M.; Hosoda, J.

1980-07-01

The proteolytic removal of about 60 amino acids from the COOH terminus of the bacteriophage T4 helix-destabilizing protein (gene 32 protein) produces 32*I, a 27,000-dalton fragment which still binds tightly and cooperatively to single-stranded DNA. The substitution of 32*I protein for intact 32 protein in the seven-protein T4 replication complex results in dramatic changes in some of the reactions catalyzed by this in vitro DNA replication system, while leaving others largely unperturbed. (1) Like intact 32 protein, the 32*I protein promotes DNA synthesis by the DNA polymerase when the T4 polymerase accessory proteins (gene 44/62 and 45 proteins) are also present. The host helix-destabilizing protein (Escherichia coli ssb protein) cannot replace the 32*I protein for this synthesis. (2) Unlike intact 32 protein, 32*I protein strongly inhibits DNA synthesis catalyzed by the T4 DNA polymerase alone on a primed single-stranded DNA template. (3) Unlike intact 32 protein, the 32*I protein strongly inhibits RNA primer synthesis catalyzed by the T4 gene 41 and 61 proteins and also reduces the efficiency of RNA primer utilization. As a result, de novo DNA chain starts are blocked completely in the complete T4 replication system, and no lagging strand DNA synthesis occurs. (4) The 32*I protein does not bind to either the T4 DNA polymerase or to the T4 gene 61 protein in the absence of DNA; these associations (detected with intact 32 protein) would therefore appear to be essential for the normal control of 32 protein activity, and to account at least in part for observations 2 and 3, above. We propose that the COOH-terminal domain of intact 32 protein functions to guide its interactions with the T4 DNA polymerase and the T4 gene 61 RNA-priming protein. When this domain is removed, as in 32*I protein, the helix destabilization induced by the protein is controlled inadequately, so that polymerizing enzymes tend to be displaced from the growing 3′-OH end of a
De novo assembly of leaf transcriptome in the medicinal plant Andrographis paniculata

Directory of Open Access Journals (Sweden)

Neeraja Cherukupalli

2016-08-01

Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information for this plant is available in public databases. This study therefore deals mainly with the sequencing of RNA from A. paniculata leaf using the Illumina HiSeq™ 2000 platform, followed by de novo transcriptome assembly. A total of 189.22 million high-quality paired reads were generated and 170,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on similarity searches against the plant non-redundant protein database, Gene Ontology and eukaryotic orthologous groups, 49,363 transcripts were annotated, constituting 58.91% of the identified unigenes. Annotation of transcripts using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database revealed 5,606 transcripts plausibly involved in 140 pathways, including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6,767 unique transcripts belonging to 97 different transcription factor families. A total of 124 CYP450 transcripts belonging to seven divergent clans were identified. The transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids, of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. The assembled transcriptome sequences of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analyses besides identification of key enzymes involved in the various pathways of secondary metabolism.

De novo Assembly of Leaf Transcriptome in the Medicinal Plant Andrographis paniculata

Science.gov (United States)

Cherukupalli, Neeraja; Divate, Mayur; Mittapelli, Suresh R.; Khareedu, Venkateswara R.; Vudem, Dashavantha R.

2016-01-01

Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information for this plant is available in public databases. This study therefore deals mainly with the sequencing of RNA from A. paniculata leaf using the Illumina HiSeq™ 2000 platform, followed by de novo transcriptome assembly. A total of 189.22 million high-quality paired reads were generated and 170,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on similarity searches against the plant non-redundant protein database, Gene Ontology, and eukaryotic orthologous groups, 49,363 transcripts were annotated, constituting 58.91% of the identified unigenes. Annotation of transcripts using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database revealed 5606 transcripts plausibly involved in 140 pathways, including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6767 unique transcripts belonging to 97 different transcription factor families. A total of 124 CYP450 transcripts belonging to seven divergent clans were identified. The transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids, of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. The assembled transcriptome sequences of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analysis besides identification of key enzymes involved in the various pathways of secondary metabolism. PMID:27582746
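The annotation share quoted above can be checked with one division. This is a minimal sketch that assumes the "identified unigenes" are the 83,800 clustered transcripts from the secondary assembly (the abstract does not state this explicitly); under that assumption the reported 58.91% is reproduced exactly.

```python
def percent_annotated(annotated: int, total: int) -> float:
    """Return annotated/total expressed as a percentage."""
    return 100.0 * annotated / total

# Counts as reported in the abstract. Treating the 83,800 clustered
# transcripts as the "identified unigenes" is an assumption.
share = percent_annotated(49_363, 83_800)
print(f"{share:.2f}%")  # prints 58.91%
```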
Expression profile of genes coding for carotenoid biosynthetic ...

Indian Academy of Sciences (India)

Expression profile of genes coding for carotenoid biosynthetic pathway during ripening and their association with accumulation of lycopene in tomato fruits. Shuchi Smita, Ravi Rajwanshi, Sangram Keshari Lenka, Amit Katiyar, Viswanathan Chinnusamy and Kailash Chander Bansal. J. Genet. 92, 363–368.
Characterization of the beta amyloid precursor protein-like gene in the central nervous system of the crab Chasmagnathus. Expression during memory consolidation.

Science.gov (United States)

Fustiñana, Maria Sol; Ariel, Pablo; Federman, Noel; Freudenthal, Ramiro; Romano, Arturo

2010-09-01

Human β-amyloid, the main component of the neuritic plaques found in patients with Alzheimer's disease, is generated by cleavage of the β-amyloid precursor protein. Beyond their role in pathology, members of this protein family are synaptic proteins and have been associated with synaptogenesis, neuronal plasticity and memory, in both vertebrates and invertebrates. Consolidation is necessary to convert a labile short-term memory into a stable long-term form. During consolidation, gene expression and de novo protein synthesis are regulated in order to produce key proteins for the maintenance of plastic changes produced during the acquisition of new information. Here we partially cloned and sequenced the beta-amyloid precursor protein-like gene homologue in the crab Chasmagnathus (cappl), which shows 37% identity with the fruit fly Drosophila melanogaster homologue and 23% with Homo sapiens, but a much higher degree of sequence similarity in certain regions. We observed a wide distribution of cappl mRNA in the nervous system as well as in muscle and gills. The protein was localized in all tissues analyzed with the exception of muscle. Immunofluorescence revealed localization of cAPPL in associative and sensory brain areas. We studied gene and protein expression during long-term memory consolidation using a well-characterized memory model: the context-signal associative memory in this crab species. mRNA levels varied at different time points during long-term memory consolidation and correlated with cAPPL protein levels. cAPPL mRNA and protein are widely distributed in the central nervous system of the crab, and the time course of expression suggests a role for cAPPL during long-term memory formation.
Automatically identifying gene/protein terms in MEDLINE abstracts.

Science.gov (United States)

Yu, Hong; Hatzivassiloglou, Vasileios; Rzhetsky, Andrey; Wilbur, W John

2002-01-01

Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, the identification of terms corresponding to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As part of the markup process, we also automatically generated a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database; therefore, our methods may also be used for automatic lexicon generation. GPmarkup has 73% recall and 93% precision in identifying and marking up gene/protein terms in MEDLINE abstracts. A random sample of gene/protein symbols and full names and a sample set of marked-up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/. Contact: hy52@columbia.edu. Voice: 212-939-7028; fax: 212-666-0140.
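The abstract reports recall and precision separately. For readers who want one summary figure, the sketch below combines the two reported values into the balanced F-measure (F1, the harmonic mean of precision and recall). It assumes the usual definitions, precision = TP/(TP+FP) and recall = TP/(TP+FN). The resulting value is derived here and is not reported by the authors.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for GPmarkup. The F1 value is a
# derived illustration, not a figure stated in the abstract.
f1 = f_measure(0.93, 0.73)
print(round(f1, 3))  # prints 0.818
```

The harmonic mean sits closer to the weaker of the two scores, so the derived value reflects that recall is the limiting factor here.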
De novo mutations in ATP1A3 cause alternating hemiplegia of childhood

Science.gov (United States)

Heinzen, Erin L.; Swoboda, Kathryn J.; Hitomi, Yuki; Gurrieri, Fiorella; Nicole, Sophie; de Vries, Boukje; Tiziano, F. Danilo; Fontaine, Bertrand; Walley, Nicole M.; Heavin, Sinéad; Panagiotakaki, Eleni; Fiori, Stefania; Abiusi, Emanuela; Di Pietro, Lorena; Sweney, Matthew T.; Newcomb, Tara M.; Viollet, Louis; Huff, Chad; Jorde, Lynn B.; Reyna, Sandra P.; Murphy, Kelley J.; Shianna, Kevin V.; Gumbs, Curtis E.; Little, Latasha; Silver, Kenneth; Ptáček, Louis J.; Haan, Joost; Ferrari, Michel D.; Bye, Ann M.; Herkes, Geoffrey K.; Whitelaw, Charlotte M.; Webb, David; Lynch, Bryan J.; Uldall, Peter; King, Mary D.; Scheffer, Ingrid E.; Neri, Giovanni; Arzimanoglou, Alexis; van den Maagdenberg, Arn M.J.M.; Sisodiya, Sanjay M.; Mikati, Mohamad A.; Goldstein, David B.; Nicole, Sophie; Gurrieri, Fiorella; Neri, Giovanni; de Vries, Boukje; Koelewijn, Stephany; Kamphorst, Jessica; Geilenkirchen, Marije; Pelzer, Nadine; Laan, Laura; Haan, Joost; Ferrari, Michel; van den Maagdenberg, Arn; Zucca, Claudio; Bassi, Maria Teresa; Franchini, Filippo; Vavassori, Rosaria; Giannotta, Melania; Gobbi, Giuseppe; Granata, Tiziana; Nardocci, Nardo; De Grandis, Elisa; Veneselli, Edvige; Stagnaro, Michela; Gurrieri, Fiorella; Neri, Giovanni; Vigevano, Federico; Panagiotakaki, Eleni; Oechsler, Claudia; Arzimanoglou, Alexis; Nicole, Sophie; Giannotta, Melania; Gobbi, Giuseppe; Ninan, Miriam; Neville, Brian; Ebinger, Friedrich; Fons, Carmen; Campistol, Jaume; Kemlink, David; Nevsimalova, Sona; Laan, Laura; Peeters-Scholte, Cacha; van den Maagdenberg, Arn; Casaer, Paul; Casari, Giorgio; Sange, Guenter; Spiel, Georg; Boneschi, Filippo Martinelli; Zucca, Claudio; Bassi, Maria Teresa; Schyns, Tsveta; Crawley, Francis; Poncelin, Dominique; Vavassori, Rosaria

2012-01-01

Alternating hemiplegia of childhood (AHC) is a rare, severe neurodevelopmental syndrome characterized by recurrent hemiplegic episodes and distinct neurologic manifestations. AHC is usually a sporadic disorder with unknown etiology. Using exome sequencing of seven patients with AHC and their unaffected parents, we identified de novo nonsynonymous mutations in ATP1A3 in all seven AHC patients. Subsequent sequence analysis of ATP1A3 in 98 additional patients revealed that 78% of AHC cases have a likely causal ATP1A3 mutation, including one inherited mutation in a familial case of AHC. Remarkably, six ATP1A3 mutations account for the majority of patients, including one observed in 36 patients. Unlike ATP1A3 mutations that cause rapid-onset dystonia-parkinsonism, AHC-causing mutations consistently reduced ATPase activity without affecting protein expression. This work identifies de novo ATP1A3 mutations as the primary cause of AHC, and offers insight into disease pathophysiology by expanding the spectrum of phenotypes associated with mutations in this gene. PMID:22842232
The VirA protein of Agrobacterium tumefaciens is autophosphorylated and is essential for vir gene regulation

International Nuclear Information System (INIS)

Jin, S.; Roitsch, T.; Ankenbauer, R.G.; Gordon, M.P.; Nester, E.W.

1990-01-01

The virA and virG gene products are required for the regulation of the vir regulon on the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens. VirA is a membrane-associated protein which is homologous to the sensor molecules of other two-component regulatory systems. The authors overproduced truncated VirA proteins in Escherichia coli by deleting different lengths of the 5'-coding region of the virA gene and placing these genes under lacZ control. These proteins were purified from polyacrylamide gels and renatured. The renatured proteins became radiolabeled when they were incubated with [γ-³²P]ATP but not with [γ-³²P]GTP or [α-³²P]ATP, which suggests an ATP γ-phosphate-specific autophosphorylation. The smallest VirA protein, which retained only the C-terminal half of the protein, gave the strongest autophosphorylation signal, which demonstrates that the C-terminal domain has the autophosphorylation site. The phosphorylated amino acid was identified as phosphohistidine, and a highly conserved histidine was found in all of the VirA homologs. When this histidine was changed to glutamine, which cannot be phosphorylated, the resulting VirA protein lost both its ability to autophosphorylate and its biological function as a vir gene regulator. Results of this study indicate that VirA autophosphorylation is required for the induction of the vir regulon and subsequent tumor induction on plants by A. tumefaciens.
XX male sex reversal with genital abnormalities associated with a de novo SOX3 gene duplication.

Science.gov (United States)

Moalem, Sharon; Babul-Hirji, Riyana; Stavropolous, Dmitri J; Wherrett, Diane; Bägli, Darius J; Thomas, Paul; Chitayat, David

2012-07-01

Differentiation of the bipotential gonad into testis is initiated by the Y chromosome-linked gene SRY (Sex-determining Region Y) through upregulation of its autosomal direct target gene SOX9 (Sry-related HMG box-containing gene 9). Sequence and chromosome homology studies have shown that SRY most probably evolved from SOX3, which in humans is located at Xq27.1. Mutations causing SOX3 loss of function do not affect sex determination in mice or humans. However, transgenic mouse studies have shown that ectopic expression of Sox3 in the bipotential gonad results in upregulation of Sox9, leading to testicular induction and XX male sex reversal. Rearrangements of the SOX3 locus were recently identified in three cases of human XX male sex reversal. However, the mechanism by which these rearrangements cause sex reversal, and the frequency with which they are associated with disorders of sex development, remain unclear. We report on a case of XX male sex reversal associated with a novel de novo duplication of the SOX3 gene. These data provide additional evidence that SOX3 gain of function in the XX bipotential gonad causes XX male sex reversal and further support the hypothesis that SOX3 is the evolutionary antecedent of SRY. Copyright © 2012 Wiley Periodicals, Inc.
Estradiol-Induced Transcriptional Regulation of Long Non-Coding RNA, HOTAIR.

Science.gov (United States)

Bhan, Arunoday; Mandal, Subhrangsu S

2016-01-01

HOTAIR (HOX antisense intergenic RNA) is a 2.2 kb long non-coding RNA (lncRNA) transcribed from the antisense strand of the homeobox C (HOXC) gene locus on chromosome 12. HOTAIR acts as a scaffolding lncRNA: it interacts with and guides various chromatin-modifying complexes, such as PRC2 (polycomb-repressive complex 2) and LSD1 (lysine-specific demethylase 1), to target gene promoters, leading to silencing of those genes. Various studies have demonstrated that HOTAIR overexpression is associated with breast cancer. Recent studies from our laboratory demonstrate that HOTAIR is required for viability of breast cancer cells and is transcriptionally regulated by estradiol (E2) in vitro and in vivo. This chapter describes protocols for analysis of the HOTAIR promoter, cloning, transfection and dual luciferase assays, knockdown of protein synthesis by antisense oligonucleotides, and chromatin immunoprecipitation (ChIP) assay. These protocols are useful for studying the estrogen-mediated transcriptional regulation of lncRNA HOTAIR, as well as of other protein-coding genes and non-coding RNAs.
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Science.gov (United States)

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

2018-01-04

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
WebScipio: An online tool for the determination of gene structures using protein sequences

Directory of Open Access Journals (Sweden)

Waack Stephan

2008-09-01

Background: Obtaining the gene structure for a given protein-encoding gene is an important step in many analyses. Software suited for this task should be readily accessible, accurate and easy to handle, and should provide the user with a coherent representation of the most probable gene structure. It should be rigorous enough to optimise features at the level of single bases and at the same time flexible enough to allow for cross-species searches. Results: WebScipio, a web interface to the Scipio software, allows a user to obtain the coding sequence structure corresponding to a given query protein sequence that belongs to an already assembled eukaryotic genome. The resulting gene structure is presented in various human-readable formats, such as a schematic representation and a detailed alignment of the query and the target sequence highlighting any discrepancies. WebScipio can also be used to identify and characterise the gene structures of homologs in related organisms. In addition, it offers a web service for integration with other programs. Conclusion: WebScipio is a tool that allows users to get a high-quality gene structure prediction from a protein query. It offers more than 250 eukaryotic genomes that can be searched and produces predictions that are close to what can be achieved by manual annotation, for in-species and cross-species searches alike. WebScipio is freely accessible at http://www.webscipio.org.
On the role of the second coding exon of the HIV-1 Tat protein in virus replication and MHC class I downregulation

NARCIS (Netherlands)

Verhoef, K.; Bauer, M.; Meyerhans, A.; Berkhout, B.

1998-01-01

Tat is an essential protein of human immunodeficiency virus type 1 (HIV-1) and activates transcription from the viral long terminal repeat (LTR) promoter. The tat gene is composed of two coding exons of which the first, corresponding to the N-terminal 72 amino acid residues, has been reported to be
Discovering disease-associated genes in weighted protein-protein interaction networks

Science.gov (United States)

Cui, Ying; Cai, Meng; Stanley, H. Eugene

2018-04-01

Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken into consideration edge weight, which quantifies the relative strength of an interaction. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
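The abstract does not list which topological properties were used. As an illustrative assumption, the sketch below contrasts two of the simplest: node degree (unweighted) and node strength (the sum of incident edge weights). The toy network and its weights are hypothetical. In it, two proteins have the same degree but very different strength once weights are counted. The sketch is not the authors' feature set or classifier.

```python
from collections import defaultdict

def degree_and_strength(edges):
    """Return per-node degree (unweighted) and strength (weighted degree).

    edges: iterable of (u, v, w) tuples for an undirected network, where w
    is the interaction weight. Self-loops and duplicate edges are assumed
    to be absent.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        degree[u] += 1
        degree[v] += 1
        strength[u] += w
        strength[v] += w
    return dict(degree), dict(strength)

# Hypothetical toy PPI network: protein names and weights are illustrative.
edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.1),
    ("A", "D", 0.1),
    ("B", "C", 0.8),
    ("B", "D", 0.7),
]
deg, stren = degree_and_strength(edges)
print(deg["A"], deg["B"])                        # prints 3 3
print(round(stren["A"], 2), round(stren["B"], 2))  # prints 1.1 2.4
```

An unweighted feature cannot tell A and B apart, but the weighted one can. This is the kind of extra information the abstract argues weighted networks provide.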
Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer's disease.

Science.gov (United States)

Sims, Rebecca; van der Lee, Sven J; Naj, Adam C; Bellenguez, Céline; Badarinarayan, Nandini; Jakobsdottir, Johanna; Kunkle, Brian W; Boland, Anne; Raybould, Rachel; Bis, Joshua C; Martin, Eden R; Grenier-Boley, Benjamin; Heilmann-Heimbach, Stefanie; Chouraki, Vincent; Kuzma, Amanda B; Sleegers, Kristel; Vronskaya, Maria; Ruiz, Agustin; Graham, Robert R; Olaso, Robert; Hoffmann, Per; Grove, Megan L; Vardarajan, Badri N; Hiltunen, Mikko; Nöthen, Markus M; White, Charles C; Hamilton-Nelson, Kara L; Epelbaum, Jacques; Maier, Wolfgang; Choi, Seung-Hoan; Beecham, Gary W; Dulary, Cécile; Herms, Stefan; Smith, Albert V; Funk, Cory C; Derbois, Céline; Forstner, Andreas J; Ahmad, Shahzad; Li, Hongdong; Bacq, Delphine; Harold, Denise; Satizabal, Claudia L; Valladares, Otto; Squassina, Alessio; Thomas, Rhodri; Brody, Jennifer A; Qu, Liming; Sánchez-Juan, Pascual; Morgan, Taniesha; Wolters, Frank J; Zhao, Yi; Garcia, Florentino Sanchez; Denning, Nicola; Fornage, Myriam; Malamon, John; Naranjo, Maria Candida Deniz; Majounie, Elisa; Mosley, Thomas H; Dombroski, Beth; Wallon, David; Lupton, Michelle K; Dupuis, Josée; Whitehead, Patrice; Fratiglioni, Laura; Medway, Christopher; Jian, Xueqiu; Mukherjee, Shubhabrata; Keller, Lina; Brown, Kristelle; Lin, Honghuang; Cantwell, Laura B; Panza, Francesco; McGuinness, Bernadette; Moreno-Grau, Sonia; Burgess, Jeremy D; Solfrizzi, Vincenzo; Proitsi, Petra; Adams, Hieab H; Allen, Mariet; Seripa, Davide; Pastor, Pau; Cupples, L Adrienne; Price, Nathan D; Hannequin, Didier; Frank-García, Ana; Levy, Daniel; Chakrabarty, Paramita; Caffarra, Paolo; Giegling, Ina; Beiser, Alexa S; Giedraitis, Vilmantas; Hampel, Harald; Garcia, Melissa E; Wang, Xue; Lannfelt, Lars; Mecocci, Patrizia; Eiriksdottir, Gudny; Crane, Paul K; Pasquier, Florence; Boccardi, Virginia; Henández, Isabel; Barber, Robert C; Scherer, Martin; Tarraga, Lluis; Adams, Perrie M; Leber, Markus; Chen, Yuning; Albert, Marilyn S; Riedel-Heller, Steffi; Emilsson, Valur; Beekly, Duane; 
Braae, Anne; Schmidt, Reinhold; Blacker, Deborah; Masullo, Carlo; Schmidt, Helena; Doody, Rachelle S; Spalletta, Gianfranco; Longstreth, W T; Fairchild, Thomas J; Bossù, Paola; Lopez, Oscar L; Frosch, Matthew P; Sacchinelli, Eleonora; Ghetti, Bernardino; Yang, Qiong; Huebinger, Ryan M; Jessen, Frank; Li, Shuo; Kamboh, M Ilyas; Morris, John; Sotolongo-Grau, Oscar; Katz, Mindy J; Corcoran, Chris; Dunstan, Melanie; Braddel, Amy; Thomas, Charlene; Meggy, Alun; Marshall, Rachel; Gerrish, Amy; Chapman, Jade; Aguilar, Miquel; Taylor, Sarah; Hill, Matt; Fairén, Mònica Díez; Hodges, Angela; Vellas, Bruno; Soininen, Hilkka; Kloszewska, Iwona; Daniilidou, Makrina; Uphill, James; Patel, Yogen; Hughes, Joseph T; Lord, Jenny; Turton, James; Hartmann, Annette M; Cecchetti, Roberta; Fenoglio, Chiara; Serpente, Maria; Arcaro, Marina; Caltagirone, Carlo; Orfei, Maria Donata; Ciaramella, Antonio; Pichler, Sabrina; Mayhaus, Manuel; Gu, Wei; Lleó, Alberto; Fortea, Juan; Blesa, Rafael; Barber, Imelda S; Brookes, Keeley; Cupidi, Chiara; Maletta, Raffaele Giovanni; Carrell, David; Sorbi, Sandro; Moebus, Susanne; Urbano, Maria; Pilotto, Alberto; Kornhuber, Johannes; Bosco, Paolo; Todd, Stephen; Craig, David; Johnston, Janet; Gill, Michael; Lawlor, Brian; Lynch, Aoibhinn; Fox, Nick C; Hardy, John; Albin, Roger L; Apostolova, Liana G; Arnold, Steven E; Asthana, Sanjay; Atwood, Craig S; Baldwin, Clinton T; Barnes, Lisa L; Barral, Sandra; Beach, Thomas G; Becker, James T; Bigio, Eileen H; Bird, Thomas D; Boeve, Bradley F; Bowen, James D; Boxer, Adam; Burke, James R; Burns, Jeffrey M; Buxbaum, Joseph D; Cairns, Nigel J; Cao, Chuanhai; Carlson, Chris S; Carlsson, Cynthia M; Carney, Regina M; Carrasquillo, Minerva M; Carroll, Steven L; Diaz, Carolina Ceballos; Chui, Helena C; Clark, David G; Cribbs, David H; Crocco, Elizabeth A; DeCarli, Charles; Dick, Malcolm; Duara, Ranjan; Evans, Denis A; Faber, Kelley M; Fallon, Kenneth B; Fardo, David W; Farlow, Martin R; Ferris, Steven; Foroud, Tatiana M; 
Galasko, Douglas R; Gearing, Marla; Geschwind, Daniel H; Gilbert, John R; Graff-Radford, Neill R; Green, Robert C; Growdon, John H; Hamilton, Ronald L; Harrell, Lindy E; Honig, Lawrence S; Huentelman, Matthew J; Hulette, Christine M; Hyman, Bradley T; Jarvik, Gail P; Abner, Erin; Jin, Lee-Way; Jun, Gyungah; Karydas, Anna; Kaye, Jeffrey A; Kim, Ronald; Kowall, Neil W; Kramer, Joel H; LaFerla, Frank M; Lah, James J; Leverenz, James B; Levey, Allan I; Li, Ge; Lieberman, Andrew P; Lunetta, Kathryn L; Lyketsos, Constantine G; Marson, Daniel C; Martiniuk, Frank; Mash, Deborah C; Masliah, Eliezer; McCormick, Wayne C; McCurry, Susan M; McDavid, Andrew N; McKee, Ann C; Mesulam, Marsel; Miller, Bruce L; Miller, Carol A; Miller, Joshua W; Morris, John C; Murrell, Jill R; Myers, Amanda J; O'Bryant, Sid; Olichney, John M; Pankratz, Vernon S; Parisi, Joseph E; Paulson, Henry L; Perry, William; Peskind, Elaine; Pierce, Aimee; Poon, Wayne W; Potter, Huntington; Quinn, Joseph F; Raj, Ashok; Raskind, Murray; Reisberg, Barry; Reitz, Christiane; Ringman, John M; Roberson, Erik D; Rogaeva, Ekaterina; Rosen, Howard J; Rosenberg, Roger N; Sager, Mark A; Saykin, Andrew J; Schneider, Julie A; Schneider, Lon S; Seeley, William W; Smith, Amanda G; Sonnen, Joshua A; Spina, Salvatore; Stern, Robert A; Swerdlow, Russell H; Tanzi, Rudolph E; Thornton-Wells, Tricia A; Trojanowski, John Q; Troncoso, Juan C; Van Deerlin, Vivianna M; Van Eldik, Linda J; Vinters, Harry V; Vonsattel, Jean Paul; Weintraub, Sandra; Welsh-Bohmer, Kathleen A; Wilhelmsen, Kirk C; Williamson, Jennifer; Wingo, Thomas S; Woltjer, Randall L; Wright, Clinton B; Yu, Chang-En; Yu, Lei; Garzia, Fabienne; Golamaully, Feroze; Septier, Gislain; Engelborghs, Sebastien; Vandenberghe, Rik; De Deyn, Peter P; Fernadez, Carmen Muñoz; Benito, Yoland Aladro; Thonberg, Hakan; Forsell, Charlotte; Lilius, Lena; Kinhult-Stählbom, Anne; Kilander, Lena; Brundin, RoseMarie; Concari, Letizia; Helisalmi, Seppo; Koivisto, Anne Maria; Haapasalo, 
Annakaisa; Dermecourt, Vincent; Fievet, Nathalie; Hanon, Olivier; Dufouil, Carole; Brice, Alexis; Ritchie, Karen; Dubois, Bruno; Himali, Jayanadra J; Keene, C Dirk; Tschanz, JoAnn; Fitzpatrick, Annette L; Kukull, Walter A; Norton, Maria; Aspelund, Thor; Larson, Eric B; Munger, Ron; Rotter, Jerome I; Lipton, Richard B; Bullido, María J; Hofman, Albert; Montine, Thomas J; Coto, Eliecer; Boerwinkle, Eric; Petersen, Ronald C; Alvarez, Victoria; Rivadeneira, Fernando; Reiman, Eric M; Gallo, Maura; O'Donnell, Christopher J; Reisch, Joan S; Bruni, Amalia Cecilia; Royall, Donald R; Dichgans, Martin; Sano, Mary; Galimberti, Daniela; St George-Hyslop, Peter; Scarpini, Elio; Tsuang, Debby W; Mancuso, Michelangelo; Bonuccelli, Ubaldo; Winslow, Ashley R; Daniele, Antonio; Wu, Chuang-Kuo; Peters, Oliver; Nacmias, Benedetta; Riemenschneider, Matthias; Heun, Reinhard; Brayne, Carol; Rubinsztein, David C; Bras, Jose; Guerreiro, Rita; Al-Chalabi, Ammar; Shaw, Christopher E; Collinge, John; Mann, David; Tsolaki, Magda; Clarimón, Jordi; Sussams, Rebecca; Lovestone, Simon; O'Donovan, Michael C; Owen, Michael J; Behrens, Timothy W; Mead, Simon; Goate, Alison M; Uitterlinden, Andre G; Holmes, Clive; Cruchaga, Carlos; Ingelsson, Martin; Bennett, David A; Powell, John; Golde, Todd E; Graff, Caroline; De Jager, Philip L; Morgan, Kevin; Ertekin-Taner, Nilufer; Combarros, Onofre; Psaty, Bruce M; Passmore, Peter; Younkin, Steven G; Berr, Claudine; Gudnason, Vilmundur; Rujescu, Dan; Dickson, Dennis W; Dartigues, Jean-François; DeStefano, Anita L; Ortega-Cubero, Sara; Hakonarson, Hakon; Campion, Dominique; Boada, Merce; Kauwe, John Keoni; Farrer, Lindsay A; Van Broeckhoven, Christine; Ikram, M Arfan; Jones, Lesley; Haines, Jonathan L; Tzourio, Christophe; Launer, Lenore J; Escott-Price, Valentina; Mayeux, Richard; Deleuze, Jean-François; Amin, Najaf; Holmans, Peter A; Pericak-Vance, Margaret A; Amouyel, Philippe; van Duijn, Cornelia M; Ramirez, Alfredo; Wang, Li-San; Lambert, Jean-Charles; 
Seshadri, Sudha; Williams, Julie; Schellenberg, Gerard D

2017-09-01

We identified rare coding variants associated with Alzheimer's disease in a three-stage case-control study of 85,133 subjects. In stage 1, we genotyped 34,174 samples using a whole-exome microarray. In stage 2, we tested associated variants (P < 1 × 10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, we used an additional 14,997 samples to test the most significant stage 2 associations (P < 5 × 10⁻⁸) using imputed genotypes. We observed three new genome-wide significant nonsynonymous variants associated with Alzheimer's disease: a protective variant in PLCG2 (rs72824905: p.Pro522Arg, P = 5.38 × 10⁻¹⁰, odds ratio (OR) = 0.68, minor allele frequency (MAF) cases = 0.0059, MAF controls = 0.0093), a risk variant in ABI3 (rs616338: p.Ser209Phe, P = 4.56 × 10⁻¹⁰, OR = 1.43, MAF cases = 0.011, MAF controls = 0.008), and a new genome-wide significant variant in TREM2 (rs143332484: p.Arg62His, P = 1.55 × 10⁻¹⁴, OR = 1.67, MAF cases = 0.0143, MAF controls = 0.0089), a known susceptibility gene for Alzheimer's disease. These protein-altering changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified risk genes in Alzheimer's disease. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to the development of Alzheimer's disease.
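To show how the reported allele frequencies relate to the direction of effect, the sketch below computes a crude allelic odds ratio, OR = [MAF_cases / (1 − MAF_cases)] / [MAF_controls / (1 − MAF_controls)], from the MAFs quoted above. This is an illustration only: the reported ORs come from the study's own association models, so the crude values are not expected to match them exactly.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio from minor allele frequencies (no covariates)."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# MAFs as quoted in the abstract (cases, controls).
variants = {
    "PLCG2 p.Pro522Arg": (0.0059, 0.0093),
    "ABI3 p.Ser209Phe": (0.011, 0.008),
    "TREM2 p.Arg62His": (0.0143, 0.0089),
}
for name, (case_maf, control_maf) in variants.items():
    print(f"{name}: crude OR = {allelic_odds_ratio(case_maf, control_maf):.2f}")
```

The crude values (about 0.63, 1.38 and 1.62) point in the same direction as the reported 0.68, 1.43 and 1.67. An OR below 1 (PLCG2) marks the minor allele as protective.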
Defining the maize transcriptome de novo using deep RNA-Seq

Energy Technology Data Exchange (ETDEWEB)

Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

2011-06-01

De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.
On the total number of genes and their length distribution in complete microbial genomes

DEFF Research Database (Denmark)

Skovgaard, M; Jensen, L J; Brunak, S

2001-01-01

In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.
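The reason chance ORFs are mostly short can be shown with a simple probability model. This is not the authors' method, which compared length distributions against matches to known proteins. The model assumes uniform, independent base composition, so 3 of the 64 codons are stops and a run of n codons contains no stop with probability (61/64)ⁿ. Real genomes, especially GC-rich ones, have different stop-codon frequencies.

```python
import math

# Assumption: uniform, independent bases, so TAA, TAG and TGA make up 3/64 of codons.
STOP_FRACTION = 3 / 64


def prob_no_stop(n_codons):
    """Probability that n consecutive random codons contain no stop codon."""
    return (1 - STOP_FRACTION) ** n_codons


def min_codons_for(alpha):
    """Smallest ORF length (in codons) whose chance probability falls below alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - STOP_FRACTION))


for n in (30, 60, 100, 200):
    print(n, f"{prob_no_stop(n):.4f}")
print("codons needed for p < 0.01:", min_codons_for(0.01))
```

Under this model, stop-free stretches of a few dozen codons are common by chance, while stretches of several hundred codons are very unlikely. This is why an excess of short annotated genes suggests spurious ORFs.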
A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins

Directory of Open Access Journals (Sweden)

Teresa Milano

2016-01-01

Full Text Available The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP)-dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. Here we report a bioinformatics analysis of the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in the wHTH domain suggested the basis of the specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for the rational design of experiments and for bioinformatics analysis of other MocR subgroups.
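The abstract does not say how SDPs were detected, and published SDP methods use more careful statistical scoring than shown here. The toy heuristic below (hypothetical, for intuition only) flags an alignment column as a candidate SDP when it is invariant and ungapped within every subgroup, but carries different residues in at least two subgroups.

```python
def toy_sdp_columns(groups):
    """Return alignment column indices that are invariant within each subgroup
    but differ between at least two subgroups (a toy SDP heuristic).

    groups: dict mapping subgroup name -> list of equal-length aligned sequences.
    """
    seqs = [s for members in groups.values() for s in members]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        group_residues = []
        for members in groups.values():
            residues = {s[col] for s in members}
            if len(residues) != 1 or "-" in residues:
                break  # column not conserved (or gapped) within this subgroup
            group_residues.append(residues.pop())
        else:  # runs only if every subgroup was conserved at this column
            if len(set(group_residues)) > 1:
                hits.append(col)
    return hits


# Hypothetical aligned fragments for two subgroups.
toy = {"A": ["MKLV", "MKLI"], "B": ["MRLV", "MRLV"]}
print(toy_sdp_columns(toy))  # column 1 (K vs R) is the only candidate
```

Column 0 is conserved across both subgroups, so it is not specificity-determining. Column 3 varies within subgroup A, so it is skipped.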
Transposable element dynamics and PIWI regulation impacts lncRNA and gene expression diversity in Drosophila ovarian cell cultures.

Science.gov (United States)

Sytnikova, Yuliya A; Rahman, Reazur; Chirn, Gung-Wei; Clark, Josef P; Lau, Nelson C

2014-12-01

Piwi proteins and Piwi-interacting RNAs (piRNAs) repress transposable elements (TEs) from mobilizing in gonadal cells. To determine the spectrum of piRNA-regulated targets that may extend beyond TEs, we conducted a genome-wide survey for transcripts associated with PIWI and for transcripts affected by PIWI knockdown in Drosophila ovarian somatic sheet (OSS) cells, a follicle cell line expressing the Piwi pathway. Despite the immense sequence diversity among OSS cell piRNAs, our analysis indicates that TE transcripts are the major transcripts associated with and directly regulated by PIWI. However, several coding genes were indirectly regulated by PIWI via an adjacent de novo TE insertion that generated a nascent TE transcript. Interestingly, we noticed that PIWI-regulated genes in OSS cells greatly differed from genes affected in a related follicle cell culture, ovarian somatic cells (OSCs). Therefore, we characterized the distinct genomic TE insertions across four OSS and OSC lines and discovered dynamic TE landscapes in gonadal cultures that were defined by a subset of active TEs. Particular de novo TEs appeared to stimulate the expression of novel candidate long noncoding RNAs (lncRNAs) in a cell lineage-specific manner, and some of these TE-associated lncRNAs were associated with PIWI and overlapped PIWI-regulated genes. Our analyses of OSCs and OSS cells demonstrate that despite having a Piwi pathway to suppress endogenous mobile elements, gonadal cell TE landscapes can still dramatically change and create transcriptome diversity. © 2014 Sytnikova et al.; Published by Cold Spring Harbor Laboratory Press.
Interleukin 6 signaling regulates promyelocytic leukemia protein gene expression in human normal and cancer cells

Czech Academy of Sciences Publication Activity Database

Hubáčková, Soňa; Krejčíková, Kateřina; Bartek, Jiří; Hodný, Zdeněk

2012-01-01

Vol. 287, no. 32 (2012), pp. 26702-26714. ISSN 0021-9258. R&D projects: GA ČR GA204/08/1418. Other grants: Novo Nordisk (DK) R153-A12997; EK (XE) 223575. Institutional support: RVO:68378050. Keywords: cancer tumor promoter; DNA-binding protein; protein phosphorylation; tyrosine protein kinase; interleukin-6. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 4.651, year: 2012
Developmental programming of long non-coding RNAs during postnatal liver maturation in mice.

Directory of Open Access Journals (Sweden)

Lai Peng

Full Text Available The liver is a vital organ with critical functions in metabolism, protein synthesis, and immune defense. Most liver functions are not mature at birth, and many changes happen during postnatal liver development. However, it is unclear what changes occur in the liver after birth, at what developmental stages they occur, and how the developmental processes are regulated. Long non-coding RNAs (lncRNAs) are involved in organ development and cell differentiation. Here, we analyzed the lncRNA transcriptome in mouse liver from perinatal (day -2) to adult (day 60) stages by RNA-Sequencing, in an attempt to understand the role of lncRNAs in liver maturation. We found around 15,000 genes expressed, including about 2,000 lncRNAs. Most lncRNAs were expressed at a lower level than coding RNAs. Both coding RNAs and lncRNAs displayed three major ontogenic patterns: enriched at neonatal, adolescent, or adult stages. Neighboring coding and non-coding RNAs tended to exhibit highly correlated ontogenic expression patterns. Gene ontology (GO) analysis revealed that some lncRNAs enriched at neonatal ages have neighboring protein-coding genes that are also enriched at neonatal ages and associated with cell proliferation, immune activation-related processes, tissue organization pathways, and hematopoiesis; other lncRNAs enriched at adolescent ages have neighboring protein-coding genes associated with different metabolic processes. These data reveal a significant functional transition during postnatal liver development and imply the potential importance of lncRNAs in liver maturation.
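The abstract reports that neighboring coding and non-coding RNAs have highly correlated ontogenic profiles but does not name the correlation measure. The sketch assumes Pearson correlation over expression values at successive ages; Spearman rank correlation would be a common alternative. The expression values are hypothetical, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values over successive ages (not data from the study):
# a neonatal-enriched lncRNA and its neighboring protein-coding gene.
lncrna = [8.0, 6.0, 3.0, 1.0, 0.5]
neighbor_gene = [40.0, 31.0, 15.0, 6.0, 2.0]
print(round(pearson(lncrna, neighbor_gene), 3))
```

Both profiles decline with age, so the correlation is close to 1. This is the kind of shared "neonatal-enriched" pattern the abstract describes for neighboring pairs.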

Voltammetry and In Situ Scanning Tunnelling Microscopy of De Novo Designed Heme Protein Monolayers on Au(111)-Electrode Surfaces

DEFF Research Database (Denmark)

Albrecht, Tim; Li, Wu; Haehnel, Wolfgang

2006-01-01

In the present work, we report the electrochemical characterization and in situ scanning tunnelling microscopy (STM) studies of monolayers of an artificial de novo designed heme protein MOP-C, covalently immobilized on modified Au(111) surfaces. The protein forms closely packed monolayers, which ... to the tunnelling current, apparently due to slow electron transfer kinetics. As a consequence, STM images of heme-containing and heme-free MOP-C did not reveal any notable differences in apparent height or physical extension. The apparent height of heme-containing MOP-C did not show any dependence on the substrate potential being varied around the redox potential of the protein. The mere presence of an accessible molecular energy level is not sufficient to result in detectable tunnelling current modulation. (c) 2006 Elsevier B.V. All rights reserved.
Assignment of the murine protein kinase gene DLK to chromosome 15 in the vicinity of the bt/Koa locus by genetic linkage analysis

Energy Technology Data Exchange (ETDEWEB)

Watanabe, Toshio; Yanagisawa, Masahiro; Matsubara, Nobumichi [Tokyo Univ. (Japan)] [and others]

1997-03-01

We have cloned protein kinase genes from murine primordial germ cell-derived EG cells by a PCR-based strategy using degenerate primers corresponding to conserved sequences in the catalytic domain of protein kinases. One of these clones, designated Gek2 (germ cell kinase 2), was used as a probe to screen a mouse brain cDNA library, and the clones obtained contained the entire coding sequence. Comparison of the Gek2 sequence with those in databases revealed that it was identical to a previously reported protein kinase gene, DLK. 8 refs., 1 fig.
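A degenerate primer is written with IUPAC ambiguity codes, and its degeneracy is the product of the number of bases each position allows. It equals the number of distinct oligonucleotides in the primer mix. The sketch uses a hypothetical primer, not one from this study.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


primer = "GAYYTNAARCC"  # hypothetical primer for illustration only
print(degeneracy(primer), expand(primer)[:3])
```

High degeneracy lowers the effective concentration of each exact-match oligo. This is one reason degenerate primers are usually designed over short, highly conserved motifs.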
Evolutionary mechanisms driving the evolution of a large polydnavirus gene family coding for protein tyrosine phosphatases

Directory of Open Access Journals (Sweden)

Serbielle Céline

2012-12-01

Full Text Available Abstract Background Gene duplications have been proposed to be the main mechanism involved in genome evolution and in the acquisition of new functions. Polydnaviruses (PDVs), symbiotic viruses associated with parasitoid wasps, are ideal model systems to study mechanisms of gene duplication, given that PDV genomes consist of virulence genes organized into multigene families. In these systems the viral genome is integrated in a wasp chromosome as a provirus, and virus particles containing circular double-stranded DNA are injected into the parasitoids’ hosts and are essential for parasitism success. The viral virulence factors, organized in gene families, are required collectively to induce host immune suppression and developmental arrest. The gene family that encodes protein tyrosine phosphatases (PTPs) has undergone spectacular expansion in several PDV genomes, with up to 42 genes. Results Here, we present strong indications that PTP gene family expansion occurred via classical mechanisms: by duplication of large segments of the chromosomally integrated form of the virus sequences (segmental duplication), by tandem duplications within this form, and by dispersed duplications. We also propose a novel duplication mechanism specific to PDVs that involves viral circle reintegration into the wasp genome. The PTP copies produced were shown to undergo conservative evolution along with episodes of adaptive evolution. In particular, recently produced copies have undergone positive selection in sites most likely involved in defining substrate selectivity. Conclusion The results provide evidence about the dynamic nature of polydnavirus proviral genomes. Classical and PDV-specific duplication mechanisms have been involved in the production of new gene copies. Selection pressures associated with antagonistic interactions with parasitized hosts have shaped these genes used to manipulate lepidopteran physiology with evidence for positive selection involved in
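Positive selection is usually framed in terms of the ratio ω = dN/dS, where ω > 1 suggests adaptive change. The sketch shows only the final step of a Nei-Gojobori-style calculation. It takes pre-computed counts of nonsynonymous and synonymous sites and differences, applies the Jukes-Cantor correction d = −(3/4) ln(1 − 4p/3), and returns ω. The counts are hypothetical. Site-level tests like those implied above generally rely on codon-substitution models, which this sketch does not implement.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """omega = dN/dS from pre-computed site and difference counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences; omega is undefined")
    return dn / ds


# Hypothetical counts for a pair of duplicated gene copies.
omega = dn_ds(nonsyn_diffs=30, nonsyn_sites=600, syn_diffs=6, syn_sites=200)
print(f"omega = {omega:.2f}")
```

A gene-wide ω averages over all codons. A few positively selected sites, such as those in a substrate-binding region, can be masked by purifying selection elsewhere, which is why site-specific models are preferred for such claims.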
Comprehensive search for intra- and inter-specific sequence polymorphisms among coding envelope genes of retroviral origin found in the human genome: genes and pseudogenes

Directory of Open Access Journals (Sweden)

Vasilescu Alexandre

2005-09-01

Full Text Available Abstract Background The human genome carries a high load of proviral-like sequences, called Human Endogenous Retroviruses (HERVs), which are the genomic traces of ancient infections by active retroviruses. These elements are in most cases defective, but open reading frames can still be found for the retroviral envelope gene, with sixteen such genes identified so far. Several of them are conserved during primate evolution, having possibly been co-opted by their host for a physiological role. Results To further characterize their status, we sequenced 12 of these genes from a panel of 91 Caucasian individuals. Genomic analyses reveal strong sequence conservation (only two nonsynonymous Single Nucleotide Polymorphisms [SNPs]) for the two HERV-W and HERV-FRD envelope genes, i.e. for the two genes specifically expressed in the placenta and possibly involved in syncytiotrophoblast formation. We further show, using an ex vivo fusion assay for each allelic form, that none of these SNPs impairs the fusogenic function. The other envelope proteins display variable levels of polymorphism, with the occurrence of a stop codon and/or frameshift for most, but not all, of them. Moreover, the sequence conservation analysis of the orthologous genes that can be found in primates shows that three env genes have been maintained in a fully coding state throughout evolution, including envW and envFRD. Conclusion Altogether, the present study strongly suggests that some, but not all, envelope-encoding sequences are bona fide genes. It also provides new tools to elucidate the possible role of endogenous envelope proteins as susceptibility factors in a number of pathologies where HERVs have been suspected to be involved.
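Whether an allele keeps a "fully coding" envelope ORF can be checked roughly by looking for an ATG start, a length divisible by three, no in-frame stop before the end, and a terminal stop. The sketch below is a simplified illustration with hypothetical sequences. A length not divisible by three is only a crude frameshift proxy; a real analysis would align each allele to a reference ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_status(seq):
    """Classify a putative env coding sequence as intact or disrupted (simplified)."""
    seq = seq.upper()
    if len(seq) < 6:
        return "too short"
    if len(seq) % 3 != 0:
        return "length not a multiple of 3 (possible frameshift)"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG":
        return "no ATG start codon"
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"premature stop at codon {index + 1}"
    if codons[-1] not in STOP_CODONS:
        return "no terminal stop codon"
    return "intact ORF"


# Hypothetical toy alleles.
for allele in ("ATGAAATAA", "ATGTAAAAATAA", "ATGAAAATAA"):
    print(allele, "->", coding_status(allele))
```

Applied to each allele in a population sample, a check like this separates envelope sequences that remain open in every individual from those that carry disrupting variants.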
De novo and inherited private variants in MAP1B in periventricular nodular heterotopia.

Science.gov (United States)

Heinzen, Erin L; O'Neill, Adam C; Zhu, Xiaolin; Allen, Andrew S; Bahlo, Melanie; Chelly, Jamel; Dobyns, William B; Freytag, Saskia; Guerrini, Renzo; Leventer, Richard J; Poduri, Annapurna; Robertson, Stephen P; Walsh, Christopher A; Zhang, Mengqi

2018-05-08

Periventricular nodular heterotopia (PVNH) is a malformation of cortical development commonly associated with epilepsy. We exome sequenced 202 individuals with sporadic PVNH to identify novel genetic risk loci. We first performed a trio-based analysis and identified 219 de novo variants. Although no novel genes were implicated in this initial analysis, PVNH cases were found overall to have a significant excess of nonsynonymous de novo variants in intolerant genes (p = 3.27 × 10⁻⁷), suggesting a role for rare new alleles in genes yet to be associated with the condition. Using a gene-level collapsing analysis comparing cases and controls, we identified a genome-wide significant signal driven by four ultra-rare loss-of-function heterozygous variants in MAP1B, including one de novo variant. In at least one instance, the MAP1B variant was inherited from a parent with previously undiagnosed PVNH. The PVNH was frontally predominant and associated with perisylvian polymicrogyria. These results implicate MAP1B in PVNH. More broadly, our findings suggest that detrimental mutations likely arising in immediately preceding generations with incomplete penetrance may also be responsible for some apparently sporadic diseases.
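In a gene-level collapsing analysis, each individual is scored as carrying or not carrying at least one qualifying variant in a gene. Carrier counts in cases and controls are then compared. The abstract does not name the test statistic. The sketch assumes Fisher's exact test, a common choice for such 2×2 tables, and uses hypothetical counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases / controls; columns: carriers / non-carriers of a qualifying variant.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among cases, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum all tables at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical: 4 carriers among 200 cases vs 1 carrier among 5000 controls.
print(fisher_exact_two_sided(4, 196, 1, 4999))
```

In a genome-wide scan this test is repeated for every gene, so the resulting p-values must be judged against a multiple-testing threshold before a signal such as the MAP1B one is called significant.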
De novo nonsense mutations in ASXL1 cause Bohring-Opitz syndrome

DEFF Research Database (Denmark)

Hoischen, Alexander; van Bon, Bregje W M; Rodríguez-Santiago, Benjamín

2011-01-01

Bohring-Opitz syndrome is characterized by severe intellectual disability, distinctive facial features and multiple congenital malformations. We sequenced the exomes of three individuals with Bohring-Opitz syndrome and in each identified heterozygous de novo nonsense mutations in ASXL1, which is required for maintenance of both activation and silencing of Hox genes. In total, 7 out of 13 subjects with a Bohring-Opitz phenotype had de novo ASXL1 mutations, suggesting that the syndrome is genetically heterogeneous.
Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models.

Science.gov (United States)

Mahony, Shaun; McInerney, James O; Smith, Terry J; Golden, Aaron

2004-03-05

Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. This work explores a new approach to gene prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. While its raw accuracy rate can be lower than that of other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.
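Relative synonymous codon usage (RSCU) compares each codon's count X_ij with the count expected if all n_i synonymous codons for amino acid i were used equally: RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij). A value of 1 means no bias. The sketch computes RSCU for one in-frame sequence under the standard genetic code (an assumption). It does not reproduce RescueNet or the Self-Organizing Map clustering.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(coding_seq):
    """Relative synonymous codon usage for an in-frame coding sequence.

    Amino acids absent from the sequence are omitted (RSCU is undefined there).
    """
    seq = coding_seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in synonyms.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)  # count per codon under uniform usage
        for c in family:
            values[c] = counts[c] / expected
    return values


values = rscu("ATGCTGCTGCTGCTG")  # Met followed by four CTG (Leu) codons
print(values["ATG"], values["CTG"], values["TTA"])
```

All four leucines use CTG out of six possible codons, so CTG gets the maximum RSCU of 6 and the other leucine codons get 0. A vector of such values per gene is the kind of compositional signal a SOM can cluster into multiple gene models.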
Characterization of the beta amyloid precursor protein-like gene in the central nervous system of the crab Chasmagnathus. Expression during memory consolidation

Directory of Open Access Journals (Sweden)

Fustiñana Maria

2010-09-01

Full Text Available Abstract Background Human β-amyloid, the main component of the neuritic plaques found in patients with Alzheimer's disease, is generated by cleavage of the β-amyloid precursor protein. Beyond its role in pathology, members of this protein family are synaptic proteins and have been associated with synaptogenesis, neuronal plasticity and memory, in both vertebrates and invertebrates. Consolidation is necessary to convert a labile short-term memory into a stable long-term form. During consolidation, gene expression and de novo protein synthesis are regulated in order to produce key proteins for the maintenance of plastic changes produced during the acquisition of new information. Results Here we partially cloned and sequenced the beta-amyloid precursor protein-like gene homologue in the crab Chasmagnathus (cappl), which shows 37% identity with the fruit fly Drosophila melanogaster homologue and 23% with Homo sapiens, but a much higher degree of sequence similarity in certain regions. We observed a wide distribution of cappl mRNA in the nervous system as well as in muscle and gills. The protein was localized in all tissues analyzed with the exception of muscle. Immunofluorescence revealed localization of cAPPL in associative and sensory brain areas. We studied gene and protein expression during long-term memory consolidation using a well-characterized memory model: the context-signal associative memory in this crab species. mRNA levels varied at different time points during long-term memory consolidation and correlated with cAPPL protein levels. Conclusions cAPPL mRNA and protein are widely distributed in the central nervous system of the crab, and the time course of expression suggests a role for cAPPL during long-term memory formation.
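Percent identity figures such as the 37% and 23% above depend on how the denominator is defined. Common choices are ungapped aligned columns, full alignment length, or the length of the shorter sequence, and the abstract does not say which was used. The sketch assumes the first convention and uses hypothetical aligned fragments. Similarity, by contrast, also credits conservative substitutions via a scoring matrix and is not computed here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (not the cappl sequences).
print(percent_identity("MKV-LA", "MRVQLA"))  # 4 identical of 5 ungapped columns
```

Counting gapped columns in the denominator would lower the value for the same alignment. Identity percentages are therefore only comparable when computed the same way.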
De novo mutations in the genome organizer CTCF cause intellectual disability

DEFF Research Database (Denmark)

Gregor, Anne; Oti, Martin; Kouwenhoven, Evelyn N

2013-01-01

An increasing number of genes involved in chromatin structure and epigenetic regulation has been implicated in a variety of developmental disorders, often including intellectual disability. By trio exome sequencing and subsequent mutational screening, we identified two de novo frameshift mutations and one de novo missense mutation in CTCF in individuals with intellectual disability, microcephaly, and growth retardation. Furthermore, an individual with a larger deletion including CTCF was identified. CTCF (CCCTC-binding factor) is one of the most important chromatin organizers in vertebrates and is involved in various chromatin regulation processes such as higher-order chromatin organization, enhancer function, and maintenance of three-dimensional chromatin structure. Transcriptome analyses in all three individuals with point mutations revealed deregulation of genes involved in signal transduction...
Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

Directory of Open Access Journals (Sweden)

Cannon Charles H

2011-07-01

Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well characterized, but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and discovering informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium, respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx, and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against a public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of A. auriculiformis and A. mangium
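The assembly summary above (contig count, total length, mean length, largest contig) can be computed from a list of contig lengths. The sketch also adds N50, the contig length at which contigs sorted from longest to shortest first cover half the total assembly. N50 is a widely used metric but is not reported in this abstract. The contig lengths are hypothetical.

```python
def assembly_stats(contig_lengths):
    """Summary statistics for a de novo assembly from its contig lengths (bp)."""
    if not contig_lengths:
        raise ValueError("no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first point covering at least half the assembly
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "max_bp": lengths[0],
        "n50_bp": n50,
    }


print(assembly_stats([400, 100, 300, 200]))  # hypothetical contig lengths
```

Mean contig length is strongly pulled down by the many short contigs typical of transcriptome assemblies. This is why length-weighted metrics such as N50 are often reported alongside it.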
Regulation of protein homeostasis in neurodegenerative diseases: the role of coding and non-coding genes

NARCIS (Netherlands)

Alvarenga Fernandes Sin, Olga; Nollen, Ellen A. A.

Protein homeostasis is fundamental for cell function and survival, because proteins are involved in all aspects of cellular function, ranging from cell metabolism and cell division to the cell's response to environmental challenges. Protein homeostasis is tightly regulated by the synthesis, folding,
Gene Duplication and Gene Expression Changes Play a Role in the Evolution of Candidate Pollen Feeding Genes in Heliconius Butterflies.

Science.gov (United States)

Smith, Gilbert; Macias-Muñoz, Aide; Briscoe, Adriana D

2016-09-02

Heliconius possess a unique ability among butterflies to feed on pollen. Pollen feeding significantly extends their lifespan, and is thought to have been important to the diversification of the genus. We used RNA sequencing to examine feeding-related gene expression in the mouthparts of four species of Heliconius and one nonpollen feeding species, Eueides isabella. We hypothesized that genes involved in morphology and protein metabolism might be upregulated in Heliconius because they have longer proboscides than Eueides, and because pollen contains more protein than nectar. Using de novo transcriptome assemblies, we tested these hypotheses by comparing gene expression in mouthparts against antennae and legs. We first looked for genes upregulated in mouthparts across all five species and discovered several hundred genes, many of which had functional annotations involving metabolism of proteins (cocoonase), lipids, and carbohydrates. We then looked specifically within Heliconius, where we found eleven common upregulated genes with roles in morphology (CPR cuticle proteins), behavior (takeout-like), and metabolism (luciferase-like). Closer examination of these candidates revealed that cocoonase underwent several duplications along the lineage leading to heliconiine butterflies, including two Heliconius-specific duplications. Luciferase-like genes also underwent duplication within lepidopterans, and upregulation in Heliconius mouthparts. Reverse-transcription PCR confirmed that three cocoonases, a peptidase, and one luciferase-like gene are expressed in the proboscis with little to no expression in labial palps and salivary glands. Our results suggest that pollen feeding, like other dietary specializations, was likely facilitated by adaptive expansions of preexisting genes, and that the butterfly proboscis is involved in digestive enzyme production. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
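The two-step search described above has two parts. The first calls mouthpart-upregulated genes per species, and the second keeps the genes shared by all species. It can be sketched as a fold-change filter followed by a set intersection. This is a simplification under stated assumptions. The 2-fold threshold (log2 of 1), the pseudocount, the use of the mean of antennae and legs as background, and all expression values are hypothetical. A real analysis would typically use a statistical test such as edgeR or DESeq2 rather than a bare fold change.

```python
import math

def upregulated(expr, min_log2fc=1.0, pseudocount=1.0):
    """Genes whose mouthpart expression exceeds the mean of the other tissues.

    expr maps gene -> (mouthparts, antennae, legs) normalized expression values.
    The threshold and pseudocount are illustrative assumptions.
    """
    hits = set()
    for gene, (mouth, antenna, leg) in expr.items():
        background = (antenna + leg) / 2
        log2fc = math.log2((mouth + pseudocount) / (background + pseudocount))
        if log2fc >= min_log2fc:
            hits.add(gene)
    return hits

def shared_upregulated(expr_by_species, **kwargs):
    """Intersect the per-species upregulated sets to find genes shared by all species."""
    sets = [upregulated(expr, **kwargs) for expr in expr_by_species.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical expression values for two species
species_a = {"cocoonase": (63.0, 3.0, 5.0), "actin": (100.0, 95.0, 105.0)}
species_b = {"cocoonase": (40.0, 1.0, 1.0), "actin": (50.0, 50.0, 50.0),
             "takeout-like": (30.0, 5.0, 5.0)}
shared = shared_upregulated({"A": species_a, "B": species_b})
```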
De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

Science.gov (United States)

Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

2016-01-01

In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear, and their molecular basis is not fully characterized. To determine compatibility relationships, we used next-generation sequencing and a de novo transcriptome assembly strategy. With these, we show that pollen tubes from different olive plants, grown in vitro in a medium containing their own pistil or in pollen/pistil combinations from self-sterile and self-fertile cultivars, have distinct gene expression profiles. Many of the sequences differentially expressed between the samples fall within gene families involved in pollen tube development, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, various genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members of the actin, actin depolymerization factor and fibrin gene families. It also identified members of the Ca(2+)-binding gene family related to the development and polarization of the pollen apical tip. The whole-transcriptome analysis, through identification of the set of differentially expressed transcripts and an extended functional annotation, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.
The putative protein methyltransferase LAE1 controls cellulase gene expression in Trichoderma reesei

Science.gov (United States)

Seiboth, Bernhard; Karimi, Razieh Aghcheh; Phatale, Pallavi A; Linke, Rita; Hartl, Lukas; Sauer, Dominik G; Smith, Kristina M; Baker, Scott E; Freitag, Michael; Kubicek, Christian P

2012-01-01

Summary Trichoderma reesei is an industrial producer of enzymes that degrade lignocellulosic polysaccharides to soluble monomers, which can be fermented to biofuels. Here we show that the expression of genes for lignocellulose degradation is controlled by the orthologous T. reesei protein methyltransferase LAE1. In a lae1 deletion mutant we observed a complete loss of expression of all seven cellulases. Auxiliary factors for cellulose degradation, β-glucosidases and xylanases were also no longer expressed. Conversely, enhanced expression of lae1 resulted in significantly increased cellulase gene transcription. LAE1-modulated cellulase gene expression was dependent on the function of the general cellulase regulator XYR1, but xyr1 expression was itself LAE1-dependent. LAE1 was also essential for conidiation of T. reesei. Chromatin immunoprecipitation followed by high-throughput sequencing (‘ChIP-seq’) showed that lae1 expression was not obviously correlated with H3K4 di- or trimethylation (indicative of active transcription) or H3K9 trimethylation (typical for heterochromatin regions) in CAZyme coding regions. This suggests that LAE1 does not affect CAZyme gene expression by directly modulating H3K4 or H3K9 methylation. Our data demonstrate that the putative protein methyltransferase LAE1 is essential for cellulase gene expression in T. reesei through mechanisms that remain to be identified. PMID:22554051
Neurodevelopmental disease-associated de novo mutations and rare sequence variants affect TRIO GDP/GTP exchange factor activity.

Science.gov (United States)

Katrancha, Sara M; Wu, Yi; Zhu, Minsheng; Eipper, Betty A; Koleske, Anthony J; Mains, Richard E

2017-12-01

Bipolar disorder, schizophrenia, autism and intellectual disability are complex neurodevelopmental disorders, debilitating millions of people. Therapeutic progress is limited by poor understanding of underlying molecular pathways. Using a targeted search, we identified an enrichment of de novo mutations in the gene encoding the 330-kDa triple functional domain (TRIO) protein associated with neurodevelopmental disorders. By generating multiple TRIO antibodies, we show that the smaller TRIO9 isoform is the major brain protein product, and its levels decrease after birth. TRIO9 contains two guanine nucleotide exchange factor (GEF) domains with distinct specificities: GEF1 activates both Rac1 and RhoG; GEF2 activates RhoA. To understand the impact of disease-associated de novo mutations and other rare sequence variants on TRIO function, we utilized two FRET-based biosensors: a Rac1 biosensor to study mutations in TRIO (T)GEF1, and a RhoA biosensor to study mutations in TGEF2. We discovered that one autism-associated de novo mutation in TGEF1 (K1431M), at the TGEF1/Rac1 interface, markedly decreased its overall activity toward Rac1. A schizophrenia-associated rare sequence variant in TGEF1 (F1538Intron) was substantially less active when normalized to protein level, and was expressed poorly. Overall, mutations in TGEF1 decreased GEF1 activity toward Rac1. One bipolar disorder-associated rare variant (M2145T) in TGEF2 impaired inhibition by the TGEF2 pleckstrin-homology domain, resulting in dramatically increased TGEF2 activity. Overall, genetic damage to both TGEF domains altered TRIO catalytic activity, decreasing TGEF1 activity and increasing TGEF2 activity. Importantly, both GEF changes are expected to decrease neurite outgrowth, perhaps consistent with their association with neurodevelopmental disorders. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

Directory of Open Access Journals (Sweden)

Maggi Giorgio P

2008-06-01

Full Text Available Abstract Background The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately, it is limited by the computational requirements of genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based on (and thus dependent on) the availability of annotated proteins. Results In this paper we present the results of a comprehensive comparison of human and mouse genomes. The comparison was performed with a new high-throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences, the system is also suitable for accurately identifying potential gene loci. Following this analysis we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly, we were able to detect several potential gene loci supported by EST sequences but not corresponding to any yet-annotated genes. Conclusion Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and to identify potential gene loci. Our system does not require the availability of any annotated sequence and thus is suitable for the analysis of new or poorly annotated genomes.
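The idea of grouping coding conserved sequence tags into candidate gene loci can be illustrated with a simple interval-merging pass. The sketch below does not reproduce the authors' grid-based system. The `(chrom, start, end)` tuples, the 0-based half-open coordinates, the 10 kb merge distance and the example coordinates are all hypothetical.

```python
def cluster_tags(tags, max_gap=10_000):
    """Group coding conserved sequence tags into candidate loci.

    tags: iterable of (chrom, start, end) in 0-based half-open coordinates.
    Tags on the same chromosome separated by <= max_gap bp (an assumed
    threshold) are merged. Returns a list of (chrom, start, end, n_tags).
    """
    loci = []
    for chrom, start, end in sorted(tags):
        if loci and loci[-1][0] == chrom and start - loci[-1][2] <= max_gap:
            c, s, e, n = loci[-1]
            loci[-1] = (c, s, max(e, end), n + 1)  # extend the current locus
        else:
            loci.append((chrom, start, end, 1))    # start a new locus
    return loci

# Hypothetical tag coordinates
tags = [("chr1", 5000, 5100), ("chr1", 100, 200), ("chr2", 10, 90), ("chr1", 50000, 50200)]
loci = cluster_tags(tags)
```

In practice, loci supported by a single tag might be discarded, with a minimum tag count as an additional hypothetical parameter.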
De novo peptide design and experimental validation of histone methyltransferase inhibitors.

Directory of Open Access Journals (Sweden)

James Smadbeck

Full Text Available Histones are small proteins critical to the efficient packaging of DNA in the nucleus. DNA–protein complexes, known as nucleosomes, are formed when the DNA winds itself around the surface of the histones. The methylation of histone residues by enhancer of zeste homolog 2 (EZH2) maintains gene repression over successive cell generations. Overexpression of EZH2 can silence important tumor suppressor genes leading to increased invasiveness of many types of cancers. This makes the inhibition of EZH2 an important target in the development of cancer therapeutics. We employed a three-stage computational de novo peptide design method to design inhibitory peptides of EZH2. The method consists of a sequence selection stage and two validation stages for fold specificity and approximate binding affinity. The sequence selection stage consists of an integer linear optimization model that was solved to produce a rank-ordered list of amino acid sequences with increased stability in the bound peptide-EZH2 structure. These sequences were validated through the calculation of the fold specificity and approximate binding affinity of the designed peptides. Here we report the discovery of novel EZH2 inhibitory peptides using the de novo peptide design method. The computationally discovered peptides were experimentally validated in vitro using dose titrations and mechanism of action enzymatic assays. The peptide with the highest in vitro response, SQ037, was validated in nucleo using quantitative mass spectrometry-based proteomics. This peptide had an IC50 of 13.5 μM, demonstrated greater potency as an inhibitor when compared to the native and K27A mutant control peptides, and demonstrated competitive inhibition versus the peptide substrate. Additionally, this peptide demonstrated high specificity to the EZH2 target in comparison to other histone methyltransferases. The validated peptides are the first computationally designed peptides that directly inhibit EZH2
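An IC50 is read from a dose titration as the inhibitor concentration at which remaining activity falls to half of the uninhibited control. The sketch below uses log-linear interpolation between the two titration points that bracket 50% activity. This is a simplified stand-in for the four-parameter logistic (Hill) fit usually applied to such data, and it is not necessarily the method the authors used. The concentrations and activity fractions are hypothetical.

```python
import math

def ic50_log_interpolate(concs, activity):
    """Estimate IC50 from a dose titration by log-linear interpolation.

    concs: increasing inhibitor concentrations (same unit, all > 0).
    activity: remaining enzyme activity as a fraction of the uninhibited control.
    Returns the concentration at which activity crosses 0.5, or None if it never does.
    """
    points = list(zip(concs, activity))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 0.5 >= a2 and a1 != a2:
            # interpolate linearly in log10(concentration)
            frac = (a1 - 0.5) / (a1 - a2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None

# Hypothetical titration: concentrations (e.g. in μM) and fractional activity
estimate = ic50_log_interpolate([1, 100], [0.75, 0.25])
```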
What does a worm want with 20,000 genes?

OpenAIRE

Hodgkin, Jonathan

2001-01-01

The number of genes predicted for the Caenorhabditis elegans genome is remarkably high: approximately 20,000, if both protein-coding and RNA-coding genes are counted. This article discusses possible explanations for such a high value.
Integrative Analyses of De Novo Mutations Provide Deeper Biological Insights into Autism Spectrum Disorder

Directory of Open Access Journals (Sweden)

Atsushi Takata

2018-01-01

Full Text Available Recent studies have established important roles of de novo mutations (DNMs) in autism spectrum disorders (ASDs). Here, we analyze DNMs in 262 ASD probands of Japanese origin and confirm the “de novo paradigm” of ASDs across ethnicities. Based on this consistency, we combine the lists of damaging DNMs in our and published ASD cohorts (total number of trios, 4,244) and perform integrative bioinformatics analyses. Besides replicating the findings of previous studies, our analyses highlight ATP-binding genes and fetal cerebellar/striatal circuits. Analysis of individual genes identified 61 genes enriched for damaging DNMs, including ten genes for which our dataset now contributes to statistical significance. Screening of compounds altering the expression of genes hit by damaging DNMs reveals a global downregulating effect of valproic acid, a known risk factor for ASDs, whereas cardiac glycosides upregulate these genes. Collectively, our integrative approach provides deeper biological and potential medical insights into ASDs.
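A common way to ask whether a single gene carries more damaging DNMs than expected is a Poisson test. The expected count is 2 × (number of trios) × (per-haploid-genome mutation rate of the gene). The abstract does not state which gene-level test was used, so the sketch below is a simplified assumption, not the study's method. The trio count of 4,244 comes from the abstract, but the per-gene mutation rate is hypothetical.

```python
import math

def poisson_upper_tail(k, lam, max_terms=10_000):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while term > 0 and i < k + max_terms:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term < total * 1e-16:
            break
    return total

def gene_dnm_pvalue(observed, mu_per_gene, n_trios):
    """Upper-tail Poisson p-value for damaging DNMs observed in one gene.

    mu_per_gene: hypothetical per-generation, per-haploid-genome rate of damaging
    mutations in the gene; each proband contributes two haploid genomes.
    """
    expected = 2 * n_trios * mu_per_gene
    return poisson_upper_tail(observed, expected)

p = gene_dnm_pvalue(3, 1e-5, 4244)  # 3 damaging DNMs observed; rate is illustrative
```

Summing the upper tail directly avoids the loss of precision from computing 1 − CDF when p-values are very small.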

Protein Annotation from Protein Interaction Networks and Gene Ontology

OpenAIRE

Nguyen, Cao D.; Gardiner, Katheleen J.; Cios, Krzysztof J.

2011-01-01

We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precis...
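Comparisons such as "higher recall with no loss of precision" depend on how predicted function labels are scored against known annotations. The sketch below assumes micro-averaging over all (protein, function) pairs, because the abstract does not say which averaging was used. The protein IDs and labels are hypothetical.

```python
def precision_recall(predicted, actual):
    """Micro-averaged precision and recall of function predictions.

    predicted, actual: dicts mapping protein ID -> set of function labels.
    """
    tp = fp = fn = 0
    for protein in set(predicted) | set(actual):
        pred = predicted.get(protein, set())
        true = actual.get(protein, set())
        tp += len(pred & true)  # correctly predicted functions
        fp += len(pred - true)  # predicted but not annotated
        fn += len(true - pred)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions and reference annotations
predicted = {"P1": {"kinase", "binding"}, "P2": {"transport"}}
actual = {"P1": {"kinase"}, "P2": {"transport", "binding"}, "P3": {"kinase"}}
prec, rec = precision_recall(predicted, actual)
```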
A maternal high-fat, high-sucrose diet alters insulin sensitivity and expression of insulin signalling and lipid metabolism genes and proteins in male rat offspring: effect of folic acid supplementation.

Science.gov (United States)

Cuthbert, Candace E; Foster, Jerome E; Ramdath, D Dan

2017-10-01

A maternal high-fat, high-sucrose (HFS) diet alters offspring glucose and lipid homoeostasis through unknown mechanisms and may be modulated by folic acid. We investigated the effect of a maternal HFS diet on glucose homoeostasis, expression of genes and proteins associated with insulin signalling and lipid metabolism and the effect of prenatal folic acid supplementation (HFS/F) in male rat offspring. Pregnant Sprague-Dawley rats were randomly fed control (CON), HFS or HFS/F diets. Offspring were weaned on CON; at postnatal day 70, fasting plasma insulin and glucose and liver and skeletal muscle gene and protein expression were measured. Treatment effects were assessed by one-way ANOVA. Maternal HFS diet induced higher fasting glucose in offspring v. HFS/F (P=0·027) and down-regulation (Pinsulin resistance v. CON (P=0·030) and HFS/F was associated with higher insulin (P=0·016) and lower glucose (P=0·025). Maternal HFS diet alters offspring insulin sensitivity and de novo hepatic lipogenesis via altered gene and protein expression, which appears to be potentiated by folate supplementation.
Critical importance of the de novo pyrimidine biosynthesis pathway for Trypanosoma cruzi growth in the mammalian host cell cytoplasm

International Nuclear Information System (INIS)

Hashimoto, Muneaki; Morales, Jorge; Fukai, Yoshihisa; Suzuki, Shigeo; Takamiya, Shinzaburo; Tsubouchi, Akiko; Inoue, Syou; Inoue, Masayuki; Kita, Kiyoshi; Harada, Shigeharu; Tanaka, Akiko; Aoki, Takashi; Nara, Takeshi

2012-01-01

Highlights: ► We established Trypanosoma cruzi lacking the gene for carbamoyl phosphate synthetase II. ► Disruption of the cpsII gene significantly reduced the growth of epimastigotes. ► In particular, the CPSII-null mutant showed severely retarded intracellular growth. ► The de novo pyrimidine pathway is critical for parasite growth in the host cell. -- Abstract: The intracellular parasitic protist Trypanosoma cruzi is the causative agent of Chagas disease in Latin America. In general, pyrimidine nucleotides are supplied by both de novo biosynthesis and salvage pathways. While epimastigotes (an insect form) possess both activities, amastigotes (an intracellular replicating form of T. cruzi) are unable to mediate the uptake of pyrimidine. However, the requirement of de novo pyrimidine biosynthesis for parasite growth and survival has not yet been elucidated. Carbamoyl-phosphate synthetase II (CPSII) is the first and rate-limiting enzyme of the de novo biosynthetic pathway, and increased CPSII activity is associated with the rapid proliferation of tumor cells. In the present study, we showed that disruption of the T. cruzi cpsII gene significantly reduced parasite growth. In particular, the growth of amastigotes lacking the cpsII gene was severely suppressed. Thus, the de novo pyrimidine pathway is important for proliferation of T. cruzi in the host cell cytoplasm and represents a promising target for chemotherapy against Chagas disease.
Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources

KAUST Repository

Arkasosy, Basil

2013-01-01

be ambiguous, referring in some cases to more than one gene or one protein, or in others, to both genes and proteins at the same time. Public biological databases provide very useful insight into gene and protein information, including their names
Autism genes keep turning up chromatin.

Science.gov (United States)

Lasalle, Janine M

2013-06-19

Autism-spectrum disorders (ASD) are complex genetic disorders collectively characterized by impaired social interactions and language as well as repetitive and restrictive behaviors. Of the hundreds of genes implicated in ASD, those encoding proteins acting at neuronal synapses have been most characterized by candidate gene studies. However, recent unbiased genome-wide analyses have turned up a multitude of novel candidate genes encoding nuclear factors implicated in chromatin remodeling, histone demethylation, histone variants, and the recognition of DNA methylation. Furthermore, the chromatin landscape of the human genome has been shown to influence the location of de novo mutations observed in ASD as well as the landscape of DNA methylation underlying neurodevelopmental and synaptic processes. Understanding the interactions of nuclear chromatin proteins and DNA with signal transduction pathways and environmental influences in the developing brain will be critical to understanding the relevance of these ASD candidate genes and continued uncovering of the "roots" of autism etiology.
De novo assembly of the perennial ryegrass transcriptome using an RNA-Seq strategy.

Directory of Open Access Journals (Sweden)

Jacqueline D Farrell

Full Text Available Perennial ryegrass is a highly heterozygous outbreeding grass species used for turf and forage production. Heterozygosity can affect de Bruijn graph assembly, making de novo transcriptome assembly of species such as perennial ryegrass challenging. Creating a reference transcriptome from a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype for SNP calling. De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome, 78,560 predicted open reading frames were found, of which 24,434 were predicted as complete. Functional annotation found 50,890 transcripts with a BLASTp hit from the Swiss-Prot non-redundant database, 58,941 transcripts with a Pfam protein domain and 1,151 transcripts encoding putative secreted peptides. To evaluate the reference transcriptome, we targeted the high-affinity K+ transporter gene family and found multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty-one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity. Our study has developed an annotated, comprehensive transcriptome reference for perennial ryegrass that can aid in determining genetic variation, expression analysis, genome annotation, and gene mapping.
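Predicted ORFs are "complete" when they run from a start codon to an in-frame stop codon. They are partial when the transcript ends before a stop is reached. The sketch below shows that distinction on the three forward frames only. Dedicated tools (for example TransDecoder) also scan the reverse strand, handle 5'-partial ORFs and score coding potential, so this is an illustration and not the study's pipeline. The minimum length and the test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Scan the three forward frames of a transcript for ORFs that start with ATG.

    Returns (frame, start, end, complete) tuples in 0-based half-open coordinates.
    Complete ORFs include their stop codon; complete is False when no in-frame
    stop follows the ATG (a 3'-partial ORF running off the transcript end).
    min_codons is an assumed length cutoff.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, True))
                start = None
        if start is not None:
            end = start + (len(seq) - start) // 3 * 3  # last full codon
            if (end - start) // 3 >= min_codons:
                orfs.append((frame, start, end, False))
    return orfs
```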
Sequence of the intron/exon junctions of the coding region of the human androgen receptor gene and identification of a point mutation in a family with complete androgen insensitivity

International Nuclear Information System (INIS)

Lubahn, D.B.; Simental, J.A.; Higgs, H.N.; Wilson, E.M.; French, F.S.; Brown, T.R.; Migeon, C.J.

1989-01-01

Androgens act through a receptor protein (AR) to mediate sex differentiation and development of the male phenotype. The authors have isolated the eight exons in the amino acid coding region of the AR gene from a human X chromosome library. Nucleotide sequences of the AR gene intron/exon boundaries were determined for use in designing synthetic oligonucleotide primers to bracket coding exons for amplification by the polymerase chain reaction. Genomic DNA was amplified from 46, XY phenotypic female siblings with complete androgen insensitivity syndrome. AR binding affinity for dihydrotestosterone in the affected siblings was lower than in normal males, but the binding capacity was normal. Sequence analysis of amplified exons demonstrated within the AR steroid-binding domain (exon G) a single guanine to adenine mutation, resulting in replacement of valine with methionine at amino acid residue 866. As expected, the carrier mother had both normal and mutant AR genes. Thus, a single point mutation in the steroid-binding domain of the AR gene correlated with the expression of an AR protein ineffective in stimulating male sexual development
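The abstract reports a single guanine-to-adenine change that replaces valine with methionine, but it does not name the codon involved. The standard genetic code constrains which codon it can be, and the small enumeration below checks this. The table contains only the codons reachable from valine by a single G→A change (all four Val codons, Met and the Ile codons), so it is a deliberately partial subset of the genetic code. The result is an inference from the code table, not a statement from the paper.

```python
# Partial standard genetic code: every codon reachable from Val by one G->A change.
CODON_TABLE = {
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "ATG": "Met",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
}

def substitute(codon, position, new_base):
    """Return the codon with one base replaced (position is 0-based)."""
    return codon[:position] + new_base + codon[position + 1:]

def single_base_changes(ref_aa, alt_aa, ref_base, alt_base):
    """List (codon, position, mutant) where one ref_base->alt_base change turns ref_aa into alt_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for pos, base in enumerate(codon):
            if base == ref_base:
                mutant = substitute(codon, pos, alt_base)
                if CODON_TABLE.get(mutant) == alt_aa:
                    hits.append((codon, pos, mutant))
    return hits

val_to_met = single_base_changes("Val", "Met", "G", "A")
```

Only GTG mutated at its first position (GTG→ATG) satisfies the constraint. The other G→A changes from valine codons give isoleucine or leave valine unchanged.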
Fragile X mental retardation protein participates in non-coding RNA pathways.

Science.gov (United States)

Li, En-Hui; Zhao, Xin; Zhang, Ce; Liu, Wei

2018-02-20

Fragile X syndrome is one of the most common forms of inherited intellectual disability. It is caused by mutations of the Fragile X mental retardation 1 (FMR1) gene, resulting in either the loss or abnormal expression of the Fragile X mental retardation protein (FMRP). Recent research showed that FMRP participates in non-coding RNA pathways and plays various important roles in physiology, thereby extending our knowledge of the pathogenesis of Fragile X syndrome. Initial studies showed that Drosophila FMRP participates in siRNA and miRNA pathways by interacting with Dicer, Ago1 and Ago2, which are involved in neural activity and the fate determination of germline stem cells. Subsequent studies showed that Drosophila FMRP participates in the piRNA pathway by interacting with Aub, Ago1 and Piwi in the maintenance of normal chromatin structures and genomic stability. More recent studies showed that FMRP is associated with the lncRNA pathway, suggesting a potential role in the clinical manifestations. In this review, we summarize the novel findings and explore the relationship between FMRP and non-coding RNA pathways, particularly the piRNA pathway. In doing so, we provide critical insights into the molecular pathogenesis of Fragile X syndrome and potential translational applications in the clinical management of the disease.
Structure of the human gene encoding the associated microfibrillar protein (MFAP1) and localization to chromosome 15q15-q21

Energy Technology Data Exchange (ETDEWEB)

Yeh, H.; Chow, M.; Abrams, W.R. [Univ. of Pennsylvania, Philadelphia, PA (United States)] [and others]

1994-09-15

Microfibrils with a diameter of 10-12 nm, found either in association with elastin or independently, are an important component of the extracellular matrix of many tissues. To extend understanding of the proteins composing these microfibrils, the cDNA and gene encoding the human associated microfibril protein (MFAP1) have been cloned and characterized. The coding portion is contained in 9 exons, and the sequence is highly homologous to the previously described chick cDNA, but does not appear to share homology or domain motifs with any other known protein. Interestingly, the gene has been localized to chromosome 15q15-q21 by somatic hybrid cell and chromosome in situ analyses. This is the same chromosomal region to which the fibrillin gene, FBN1, known to be defective in the Marfan syndrome, has been mapped. MFAP1 is a candidate gene for heritable diseases affecting microfibrils. 38 refs., 6 figs.
De novo transcriptome assembly of Setaria italica variety Taejin

Directory of Open Access Journals (Sweden)

Yeonhwa Jo

2016-06-01

Full Text Available Foxtail millet (Setaria italica), belonging to the family Poaceae, is an important millet that is widely cultivated in East Asia. Of the cultivated millets, foxtail millet has the longest history and is one of the main food crops in South India and China. Moreover, foxtail millet is a model plant system for biofuel generation utilizing the C4 photosynthetic pathway. In this study, we carried out de novo transcriptome assembly for the foxtail millet variety Taejin collected from Korea using next-generation sequencing. We obtained a total of 8.676 GB of raw data by paired-end sequencing. The raw data from this study are available in the NCBI SRA database under accession number SRR3406552. The Trinity program was used to assemble 145,332 transcripts de novo. Using the TransDecoder program, we predicted 82,925 putative proteins. BLASTP was performed against the Swiss-Prot protein sequence database to annotate the functions of identified proteins, resulting in 20,555 potentially novel proteins. Taken together, this study provides transcriptome data for the foxtail millet variety Taejin by RNA-Seq.
[HMGA proteins and their genes as potential neoplastic biomarkers].

Science.gov (United States)

Balcerczak, Ewa; Balcerczak, Mariusz; Mirowski, Marek

2005-01-01

HMGA proteins and their genes are described in this article. HMGA proteins are able to bind DNA in AT-rich regions, which are characteristic of gene promoter sequences. This interaction can lead to gene silencing or overexpression. In normal tissue, HMGA protein levels are low or even undetectable, whereas they increase during embryogenesis. High HMGA protein levels are characteristic of the tumor phenotype of spontaneous and experimental malignant neoplasms. High HMGA protein expression correlates with poor prognostic factors and with metastasis formation. HMGA gene expression can be used as a marker of tumor progression. Current studies of tumor gene therapy inhibit HMGA protein synthesis using viral vectors that carry the HMGA genes in antisense orientation. Other studies examine new potential anticancer drugs that act as crosslinkers between DNA and HMGA proteins. Together, these studies suggest that HMGA proteins are useful targets in cancer therapy.
De novo insertions and deletions of predominantly paternal origin are associated with autism spectrum disorder

Science.gov (United States)

Dong, Shan; Walker, Michael F.; Carriero, Nicholas J.; DiCola, Michael; Willsey, A. Jeremy; Ye, Adam Y.; Waqar, Zainulabedin; Gonzalez, Luis E.; Overton, John D.; Frahm, Stephanie; Keaney, John F.; Teran, Nicole A.; Dea, Jeanselle; Mandell, Jeffrey D.; Bal, Vanessa Hus; Sullivan, Catherine A.; DiLullo, Nicholas M.; Khalil, Rehab O.; Gockley, Jake; Yuksel, Zafer; Sertel, Sinem M.; Ercan-Sencicek, A. Gulhan; Gupta, Abha R.; Mane, Shrikant M.; Sheldon, Michael; Brooks, Andrew I.; Roeder, Kathryn; Devlin, Bernie; State, Matthew W.; Wei, Liping; Sanders, Stephan J.

2014-01-01

SUMMARY Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single nucleotide variants to autism spectrum disorders (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. Through the application of a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR=1.6; 95%CI=1.0-2.7; p=0.03). They are also more common in female probands (p=0.02), are enriched among genes encoding FMRP targets (p=6×10⁻⁹), and arise predominantly on the paternal chromosome (p<0.001). Based on mutation rates in probands versus unaffected siblings, de novo frameshift indels contribute to risk in approximately 3.0% of individuals with ASD. Finally, through observing clustering of mutations in unrelated probands, we report two novel ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release. PMID:25284784
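An indel within a coding sequence is a frameshift when its net length change is not a multiple of three. The sketch below classifies a variant from VCF-style REF/ALT alleles. It assumes left-aligned alleles that share one anchor base, as in VCF. It also ignores splice-site and other non-coding consequences that annotation tools handle separately. The example alleles are hypothetical.

```python
def indel_consequence(ref, alt):
    """Classify a coding variant from VCF-style REF/ALT alleles.

    Assumes left-aligned alleles sharing one anchor base; the anchor cancels
    out of the length difference. Net changes not divisible by 3 shift the frame.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return "not an indel"
    if delta % 3 != 0:
        return "frameshift"
    return "in-frame insertion" if delta > 0 else "in-frame deletion"
```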
Identification of de novo mutations of Duchénnè/Becker muscular dystrophies in southern Spain.

Science.gov (United States)

Garcia, Susana; de Haro, Tomás; Zafra-Ceres, Mercedes; Poyatos, Antonio; Gomez-Capilla, Jose A; Gomez-Llorente, Carolina

2014-01-01

Duchenne/Becker muscular dystrophies (DMD/BMD) are X-linked diseases that are caused by a de novo gene mutation in one-third of affected males. The study objectives were to determine the incidence of DMD/BMD in Andalusia (Spain) and to establish the percentage of affected males in whom a de novo gene mutation was responsible. Multiplex ligation-dependent probe amplification (MLPA) technology was applied to 84 males with suspected disease and 106 female relatives to determine the incidence of DMD/BMD. Dystrophin gene exon deletion (89.5%) or duplication (10.5%) was detected in 38 of the 84 males by MLPA; de novo mutations accounted for 4 (16.7%) of the 24 mother-son pairs studied. MLPA technology is adequate for the molecular diagnosis of DMD/BMD and can establish whether the mother carries the molecular alteration responsible for the disease, a highly relevant issue for genetic counseling.
Topological and organizational properties of the products of house-keeping and tissue-specific genes in protein-protein interaction networks.

Science.gov (United States)

Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing

2009-03-11

Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while others are selectively expressed to perform tissue-specific functions. In this study, we sought to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance (degree, closeness and betweenness centrality) to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and to occupy positions on several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. We discuss the biological implications of our findings and propose a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene
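The three indices named in this abstract can be computed on an unweighted, undirected interaction graph without any graph library. A minimal pure-Python sketch, assuming normalized degree centrality, closeness computed over reachable nodes only (one of several conventions for disconnected networks), and Brandes' algorithm for betweenness; the toy edge list is hypothetical and not drawn from the study's networks.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted undirected graphs, normalized to [0, 1]."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate pair dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice (once from each endpoint).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: score * scale for v, score in bc.items()}

# Hypothetical toy network: "hub" plays the role of a house-keeping protein.
toy = build_graph([("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")])
print(degree_centrality(toy)["hub"], closeness_centrality(toy)["hub"],
      betweenness_centrality(toy)["hub"])
```

Comparing these scores for a protein set against scores for randomly drawn proteins gives the kind of "compared to a random selection" contrast the abstract describes.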
De novo-based transcriptome profiling of male-sterile and fertile watermelon lines.

Science.gov (United States)

Rhee, Sun-Ju; Kwon, Taehyung; Seo, Minseok; Jang, Yoon Jeong; Sim, Tae Yong; Cho, Seoae; Han, Sang-Wook; Lee, Gung Pyo

2017-01-01

The whole-genome sequence of watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai), a valuable horticultural crop worldwide, was released in 2013. Here, we compared a de novo-based approach (DBA) to a reference-based approach (RBA) using RNA-seq data, to aid in efforts to improve the annotation of the watermelon reference genome and to obtain biological insight into male sterility in watermelon. We applied these techniques to available data from two watermelon lines: the male-sterile line DAH3615-MS and the male-fertile line DAH3615. Using DBA, we newly annotated 855 watermelon transcripts, and found gene functional clusters predicted to be related to stimulus responses, nucleic acid binding, transmembrane transport, homeostasis, and Golgi/vesicles. Among the DBA-annotated transcripts, 138 de novo-exclusive differentially-expressed genes (DEDEGs) related to male sterility were detected. Out of 33 randomly selected newly annotated transcripts and DEDEGs, 32 were validated by RT-qPCR. This study demonstrates the usefulness and reliability of de novo transcriptome assembly in watermelon, and provides new insights for researchers exploring the transcriptional blueprints of male sterility.
Evolutionary modeling and prediction of non-coding RNAs in Drosophila.

Directory of Open Access Journals (Sweden)

Robert K Bradley

2009-08-01

Full Text Available We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates of false positive rates are based on simulations that preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3' end of coding regions, as well as depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA.
Fatty acid-binding protein genes of the ancient, air-breathing, ray-finned fish, spotted gar (Lepisosteus oculatus).

Science.gov (United States)

Venkatachalam, Ananda B; Fontenot, Quenton; Farrara, Allyse; Wright, Jonathan M

2018-03-01

With the advent of high-throughput DNA sequencing technology, the availability of genome sequences for many disparate species has given rise to the relatively new discipline of genomics, the study of genome structure, function and evolution. Much work has focused on the role of whole genome duplications (WGD) in the architecture of extant vertebrate genomes, particularly those of teleost fishes, which underwent a WGD early in the teleost radiation >230 million years ago (mya). Our past work has focused on the fate of duplicated copies of a multigene family coding for the intracellular lipid-binding protein (iLBP) genes in the teleost fishes. Defining the evolutionary processes that determined the fate of duplicated genes and generated the structure of extant fish genomes, however, requires comparative genomic analysis with a fish lineage that diverged before the teleost WGD, such as the spotted gar (Lepisosteus oculatus), an ancient, air-breathing, ray-finned fish. Here, we describe the genomic organization, chromosomal location and tissue-specific expression of a subfamily of the iLBP genes that code for fatty acid-binding proteins (Fabps) in spotted gar. Based on this work, we have defined the minimum suite of fabp genes prior to their duplication in the teleost lineages ~230-400 mya. Spotted gar, therefore, serves as an appropriate outgroup, or ancestral/ancient fish, that did not undergo the teleost-specific WGD. As such, analyses of the spatio-temporal regulation of spotted gar genes provide a foundation to determine whether the duplicated fabp genes have been retained in teleost genomes owing to either sub- or neofunctionalization. Copyright © 2017 Elsevier Inc. All rights reserved.
RNAi mediates post-transcriptional repression of gene expression in fission yeast Schizosaccharomyces pombe

International Nuclear Information System (INIS)

Smialowska, Agata; Djupedal, Ingela; Wang, Jingwen; Kylsten, Per; Swoboda, Peter; Ekwall, Karl

2014-01-01

Highlights: • Protein-coding genes accumulate antisense sRNAs in fission yeast S. pombe. • RNAi represses protein-coding genes in S. pombe. • RNAi-mediated gene repression is post-transcriptional. - Abstract: RNA interference (RNAi) is a gene silencing mechanism conserved from fungi to mammals. Small interfering RNAs are products and mediators of the RNAi pathway and act as specificity factors in recruiting effector complexes. The Schizosaccharomyces pombe genome encodes one of each of the core RNAi proteins, Dicer, Argonaute and RNA-dependent RNA polymerase (dcr1, ago1, rdp1). Even though the function of RNAi in heterochromatin assembly in S. pombe is established, its role in controlling gene expression remains elusive. Here, we report the identification of small RNAs that map antisense to protein-coding genes in fission yeast. We demonstrate that these genes are up-regulated at the protein level in RNAi mutants, while their mRNA levels are not significantly changed. We show that the repression by RNAi is not a result of heterochromatin formation. Thus, we conclude that RNAi is involved in post-transcriptional gene silencing in S. pombe.
Demonstration of de novo synthesis of enzymes by density labelling with stable isotopes

International Nuclear Information System (INIS)

Huebner, G.; Hirschberg, K.

1977-01-01

The technique of in vivo density labelling of proteins with H₂¹⁸O and ²H₂O has been used to investigate hormonal regulation and developmental expression of enzymes in plant cells. Buoyant density data obtained from isopycnic equilibrium centrifugation demonstrated that the cytokinin-induced nitrate reductase activity and the gibberellic acid-induced phosphatase activity in isolated embryos of Agrostemma githago are activities of enzymes synthesized de novo. The increase in alanine-specific aminopeptidase in germinating A. githago seeds is not due to de novo synthesis but to the release of preformed enzyme. On the basis of this result, the enzyme aminopeptidase can be used as an internal density standard in equilibrium centrifugation. Density labelling experiments on proteins in pea cotyledons have been used to study the change in the activity of acid phosphatase, alanine-specific aminopeptidase, and peroxidase during germination. The activities of these enzymes increase in cotyledons of Pisum sativum. Density labelling by ¹⁸O and ²H demonstrates de novo synthesis of these three enzymes. The differential time course of enzyme induction shows the advantage of using H₂¹⁸O as the labelling substance when the enzyme is synthesized immediately at the beginning of germination. At this stage of development, the amino acid pool available for synthesis is formed principally by hydrolysis of storage proteins. The incorporation of ²H into the new proteins takes place in measurable amounts at a stage of growth in which the amino acids are also synthesized de novo. The enzyme acid phosphatase of pea cotyledons was chosen to demonstrate the possibility of using the density labelling technique to detect protein turnover. (author)
Efficient assembly of de novo human artificial chromosomes from large genomic loci

Directory of Open Access Journals (Sweden)

Stromberg Gregory

2005-07-01

Full Text Available Abstract Background Human Artificial Chromosomes (HACs) are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb) genomic loci have been reported to be successfully packaged into de novo HACs. Results We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs) for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb) genomic loci including therapeutically significant genes for human growth hormone (HGH), polycystic kidney disease (PKD1) and β-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained β-globin gene expression from HACs incorporating the entire 200 kb β-globin genomic locus for over 90 days in the absence of selection. Conclusion Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci

Functional characterisation of an Arabidopsis gene strongly induced by ionising radiation: the gene coding the poly(ADP-ribose)polymerase-1 (AthPARP-1)

International Nuclear Information System (INIS)

Doucet-Chabeaud, G.

2000-01-01

Arabidopsis thaliana, the model system in plant genetics, has been used to study responses to DNA damage experimentally introduced by γ-irradiation. We have characterised a radiation-induced gene encoding a 111 kDa protein, AthPARP-1, homologous to human poly(ADP-ribose)polymerase-1 (hPARP-1). Like hPARP-1, which is composed of three functional domains with characteristic motifs, AthPARP-1 binds to DNA bearing single-strand breaks and shows DNA damage-dependent poly(ADP-ribosyl)ation. The preferential expression of AthPARP-1 in mitotically active tissues is consistent with a potential role in maintaining genome integrity during DNA replication, as proposed for its human counterpart. Transcriptional activation of the AthPARP-1 and AthPARP-2 genes by ionising radiation is, to date, specific to plants. Our expression analyses after exposure to various stresses indicate that 1) AthPARP-1 and AthPARP-2 play an important role in the response to DNA lesions, and in particular are activated by genotoxic agents that implicate the BER DNA repair pathway; 2) the AthPARP-2 gene seems to play an additional role in signal transduction induced by oxidative stress; and 3) the observed expression profile of AthPARP-1 supports regulation of AthPARP-1 gene expression at the levels of both transcription and translation. This mode of regulation of AthPARP-1 protein biosynthesis, clearly distinct from that observed in animals, requires a so-far unidentified transcription factor that is activated by the presence of DNA lesions. The major outcome of this work lies in the isolation and characterisation of such a new transcription factor, which will provide new insight into the regulation of plant gene expression by genotoxic stress. (author) [fr]
DiffSLC: A graph centrality method to detect essential proteins of a protein-protein interaction network.

Science.gov (United States)

Mistry, Divya; Wise, Roger P; Dickerson, Julie A

2017-01-01

Identification of central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. The DiffSLC centrality measure predicts central and essential genes and proteins using a protein-protein interaction network. Network centrality measures prioritize nodes and edges based on their importance to the network topology, and such measures have helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, combines the number of interactions of a protein with the coexpression values of the genes from which those proteins were translated, used as a weighting factor to bias the identification of essential proteins in a protein interaction network. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network's adjacency matrix and the edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene coexpression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. Comparisons are also performed for the top-ranked proteins against the known essential genes from the Saccharomyces Gene Deletion Project, which show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than the other compared methods. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys the centrality-lethality principle. DiffSLC is implemented using the igraph package in R and the networkx package in Python. The python package can be
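Two of the ingredients named above, eigenvector centrality and the edge clustering coefficient (ECC), can be sketched directly. A minimal pure-Python sketch, assuming a connected, unweighted graph and the ECC form "triangles through the edge divided by min(deg(u) − 1, deg(v) − 1)", which is one common definition in essential-protein work. The `hypothetical_blend` function is an illustrative assumption only and is NOT the published DiffSLC formula, whose exact weighting is not given in this abstract.

```python
import math

def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), normalized to unit Euclidean length.

    Adding I leaves the eigenvectors of A unchanged but prevents the
    oscillation plain power iteration shows on bipartite graphs (e.g. stars).
    Assumes a connected, non-empty graph."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x

def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by min(deg(u) - 1, deg(v) - 1);
    0 when either endpoint has no other neighbour."""
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return len(adj[u] & adj[v]) / denom if denom > 0 else 0.0

def hypothetical_blend(adj, coexpr, weight=0.5):
    """HYPOTHETICAL score, not the published DiffSLC formula: mixes eigenvector
    centrality with a coexpression-weighted sum of ECC over each node's edges.
    coexpr maps frozenset({u, v}) -> coexpression weight in [0, 1]."""
    ev = eigenvector_centrality(adj)
    scores = {}
    for v in adj:
        local = sum(coexpr.get(frozenset((v, u)), 0.0) * edge_clustering_coefficient(adj, v, u)
                    for u in adj[v])
        scores[v] = weight * ev[v] + (1 - weight) * local
    return scores
```

The shift by the identity is a design choice for robustness: on a star graph, plain power iteration on A alternates between two vectors, while iteration on A + I converges to the same leading eigenvector.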
Origins of gene, genetic code, protein and life: comprehensive view ...

Indian Academy of Sciences (India)

Unknown

production, suggesting that proteins were originally produced by random peptide formation of amino acids restricted in specific amino acid compositions ... using random numbers by a computer, to confirm whether main chains of ... world on the origin of life by the pseudo-replication of [GADV]-proteins in the absence of ...
De novo transcriptome sequencing and comparative analysis of differentially expressed genes in Dryopteris fragrans under temperature stress

International Nuclear Information System (INIS)

Wang, W.Z.; Tong, W.S.; Gao, R.

2016-01-01

Dryopteris fragrans is a species of fern that contains flavonoid compounds with medicinal value. This study examines the impact of temperature stress on flavonoid synthesis in D. fragrans tissue culture seedlings at a low temperature (4 °C), a high temperature (35 °C) and a moderate temperature (25 °C). Using Illumina HiSeq 2000 sequencing, 80.9 million raw sequence reads were de novo assembled into 66,716 non-redundant unigenes. Of these, 38,486 unigenes (57.7%) were functionally annotated; 13,973 unigenes and 29,598 unigenes were assigned to Gene Ontology (GO) categories and Clusters of Orthologous Groups (COG), respectively. A total of 18,989 sequences mapped to 118 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 204 genes were involved in flavonoid biosynthesis, regulation and transport. 25,292 and 16,817 unigenes exhibited marked differential expression in response to temperature shifts from 25 °C to 4 °C and from 25 °C to 35 °C, respectively. The 4CL and CHS genes, which are involved in flavonoid biosynthesis, were tested, and the results suggested that they are responsible for the biosynthesis of flavonoids. This study provides the first published data describing the D. fragrans transcriptome and should accelerate understanding of flavonoid biosynthesis, regulation and transport mechanisms. Because most unigenes described here were successfully annotated, these results should facilitate future functional genomic research on D. fragrans. (author)
De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

Science.gov (United States)

Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

2011-02-10

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open
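Dispom's key idea is combining motif scores with a preference for where binding sites sit relative to an anchor such as the transcription start site (TSS). A minimal sketch of only that positional ingredient, assuming a log-odds position weight matrix and a fixed Gaussian prior on the site's offset from the TSS (the mean and width are hypothetical parameters). It does not implement Dispom's discriminative learning of differentially abundant motifs, and unlike Dispom it does not learn the positional distribution from data. The toy motif TGTCTC is the core of the canonical auxin-responsive element; the promoter sequence is synthetic.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (list of dicts) into natural-log odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def best_site(seq, pwm, tss_index, mu=-100.0, sigma=50.0):
    """Return (score, start) of the best window on the given strand only.

    score = PWM log-odds + log of an (unnormalized) Gaussian prior on the
    window's offset from the TSS; negative offsets are upstream."""
    width = len(pwm)
    best = (-math.inf, -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        motif_score = sum(pwm[i][b] for i, b in enumerate(window))
        offset = start - tss_index
        log_prior = -0.5 * ((offset - mu) / sigma) ** 2
        if motif_score + log_prior > best[0]:
            best = (motif_score + log_prior, start)
    return best

# Toy example: two identical TGTCTC copies; the prior favours the one ~100 bp upstream.
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGTCTC"]
promoter = ["A"] * 500
promoter[100:106] = list("TGTCTC")   # offset -400 from the TSS
promoter[400:406] = list("TGTCTC")   # offset -100 from the TSS
print(best_site("".join(promoter), log_odds_pwm(counts), tss_index=500))
```

Because both copies have identical motif scores, the positional term alone decides between them, which is the effect a learned positional distribution is meant to capture.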
Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

Science.gov (United States)

Librado, Pablo; Rozas, Julio

2013-01-01

Animal olfactory systems play a critical role in the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family and mediate the first steps of olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. The biological mechanism explaining the close physical proximity among OBPs has not yet been established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These arrangements also include other non-OBP genes, which often encode proteins integral to the plasma membrane. Moreover, the degree of conservation of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighboring genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.
RNA- and protein-mediated control of Listeria monocytogenes virulence gene expression

Science.gov (United States)

Lebreton, Alice; Cossart, Pascale

2017-01-01

ABSTRACT The model opportunistic pathogen Listeria monocytogenes has been the object of extensive research, aiming at understanding its ability to colonize diverse environmental niches and animal hosts. Bacterial transcriptomes in various conditions reflect this efficient adaptability. We review here our current knowledge of the mechanisms allowing L. monocytogenes to respond to environmental changes and trigger pathogenicity, with a special focus on RNA-mediated control of gene expression. We highlight how these studies have brought novel concepts in prokaryotic gene regulation, such as the ‘excludon’ where the 5′-UTR of a messenger also acts as an antisense regulator of an operon transcribed in opposite orientation, or the notion that riboswitches can regulate non-coding RNAs to integrate complex metabolic stimuli into regulatory networks. Overall, the Listeria model exemplifies that fine RNA tuners act together with master regulatory proteins to orchestrate appropriate transcriptional programmes. PMID:27217337
Female-biased expression of long non-coding RNAs in domains that escape X-inactivation in mouse

Directory of Open Access Journals (Sweden)

Lu Lu

2010-11-01

Full Text Available Abstract Background Sexual dimorphism in brain gene expression has been recognized in several animal species. However, the relevant regulatory mechanisms remain poorly understood. To investigate whether sex-biased gene expression in mammalian brain is globally regulated or locally regulated in diverse brain structures, and to study the genomic organisation of brain-expressed sex-biased genes, we performed a large-scale gene expression analysis of distinct brain regions in adult male and female mice. Results This study revealed spatial specificity in sex-biased transcription in the mouse brain, and identified 173 sex-biased genes in the striatum, 19 in the neocortex, 12 in the hippocampus and 31 in the eye. Genes located on sex chromosomes were consistently over-represented in all brain regions. Analysis of a subset of genes with sex bias in more than one tissue revealed Y-encoded male-biased transcripts and X-encoded female-biased transcripts known to escape X-inactivation. In addition, we identified novel coding and non-coding X-linked genes with female-biased expression in multiple tissues. Interestingly, all of the female-biased non-coding genes are located in close proximity to protein-coding genes that escape X-inactivation. This defines X-chromosome domains, each of which contains a coding and a non-coding female-biased gene. The lack of repressive chromatin marks in non-coding transcribed loci supports the possibility that they escape X-inactivation. Moreover, RNA-DNA combined FISH experiments confirmed the biallelic expression of one such novel domain. Conclusion This study demonstrated that the number of genes with sex-biased expression varies between individual brain regions in mouse. The sex-biased genes identified are localized on many chromosomes. At the same time, sexually dimorphic gene expression that is common to several parts of the brain is mostly restricted to the sex chromosomes. Moreover, the study uncovered
Characterization of foot-and-mouth disease virus gene products with antisera against bacterially synthesized fusion proteins

International Nuclear Information System (INIS)

Strebel, K.; Beck, E.; Strohmaier, K.; Schaller, H.

1986-01-01

Defined segments of the cloned foot-and-mouth disease virus genome corresponding to all parts of the coding region were expressed in Escherichia coli as fusions to the N-terminal part of the MS2-polymerase gene under the control of the inducible λPL promoter. All constructs yielded large amounts of proteins, which were purified and used to raise sequence-specific antisera in rabbits. These antisera were used to identify the corresponding viral gene products in ³⁵S-labeled extracts from foot-and-mouth disease virus-infected BHK cells. This allowed us to locate unequivocally all mature foot-and-mouth disease virus gene products in the nucleotide sequence, to identify precursor-product relationships, and to detect several foot-and-mouth disease virus gene products not previously identified in vivo or in vitro.
Inference of gene-phenotype associations via protein-protein interaction and orthology.

Directory of Open Access Journals (Sweden)

Panwen Wang

Full Text Available One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest that our method performs well. By applying our method to two representative human diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.
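The abstract describes inferring phenotypes from interaction partners and orthologs, then judging the inference by Area Under the Curve (AUC). A minimal sketch, assuming a simple neighbour-voting score (the fraction of a gene's PPI partners and orthologs already annotated with the phenotype) that is a hypothetical stand-in for the authors' algorithm, plus the standard rank-based AUC. All gene identifiers are placeholders.

```python
def neighbour_vote(gene, ppi, orthologs, annotated):
    """HYPOTHETICAL score: fraction of a gene's PPI partners and orthologs
    that already carry the phenotype annotation."""
    partners = set(ppi.get(gene, ())) | set(orthologs.get(gene, ()))
    partners.discard(gene)
    if not partners:
        return 0.0
    return sum(p in annotated for p in partners) / len(partners)

def roc_auc(scores, positives, negatives):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if scores[p] > scores[n]:
                wins += 1.0
            elif scores[p] == scores[n]:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data with placeholder identifiers.
ppi = {"g1": ["x1", "x2"], "g2": ["x1", "y1"], "g3": ["y1", "y2"], "g4": []}
orthologs = {"g4": ["y3"]}
annotated = {"x1", "x2"}
scores = {g: neighbour_vote(g, ppi, orthologs, annotated) for g in ppi}
print(scores, roc_auc(scores, positives=["g1", "g2"], negatives=["g3", "g4"]))
```

In a real evaluation each gene's own known annotation should be hidden while it is scored (leave-one-out); otherwise the AUC is optimistically biased.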
Using the 2A Protein Coexpression System: Multicistronic 2A Vectors Expressing Gene(s) of Interest and Reporter Proteins.

Science.gov (United States)

Luke, Garry A; Ryan, Martin D

2018-01-01

To date, a huge range of different proteins (many with cotranslational and posttranslational subcellular localization signals) have been coexpressed together with various reporter proteins in vitro and in vivo using 2A peptides. The pros and cons of 2A coexpression technology are considered below, followed by a simple example of a "how to" protocol for concatenating multiple genes of interest, together with a reporter gene, into a single gene linked via 2As for easy identification or selection of transduced cells.
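The concatenation step can be illustrated at the protein level: internal stop codons are removed, each open reading frame is joined by a 2A sequence, and translation yields separate products because the peptide bond between the glycine and the final proline of the 2A "NPGP" motif is not formed (ribosome skipping). A minimal sketch, assuming the commonly cited porcine teschovirus-1 (P2A) sequence; Luke and Ryan's work centres on the foot-and-mouth disease virus 2A, so the choice of P2A here is an assumption, and the sequence should be verified before use. The ORF strings are hypothetical placeholders, and real constructs are designed at the DNA level (often with an extra linker such as GSG before the 2A).

```python
P2A = "ATNFSLLKQAGDVEENPGP"  # P2A as commonly cited; verify before use

def build_polyprotein(orfs, linker=P2A):
    """Join protein sequences (one-letter code) into one 2A-linked ORF.

    Stop symbols ('*') are stripped from every ORF so translation reads
    through; a single stop is appended after the last ORF."""
    return linker.join(orf.rstrip("*") for orf in orfs) + "*"

def skip_products(polyprotein, motif="NPGP"):
    """Simulate 2A ribosome skipping: no peptide bond forms between the G and
    the final P of each NPGP, so upstream products end ...NPG and the next
    product starts with P. Assumes no ORF itself contains NPGP."""
    seq = polyprotein.rstrip("*")
    products, start = [], 0
    hit = seq.find(motif)
    while hit != -1:
        cut = hit + 3  # index of the P that begins the downstream product
        products.append(seq[start:cut])
        start = cut
        hit = seq.find(motif, cut)
    products.append(seq[start:])
    return products

# Placeholder sequences: two genes of interest followed by a reporter.
construct = build_polyprotein(["MAAA*", "MCCC", "MGGG*"])
print(construct)
print(skip_products(construct))
```

The simulated products show the known 2A "footprints": a C-terminal 2A extension on every upstream protein and an extra N-terminal proline on every downstream one, which matters when a gene of interest carries terminal localization signals.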
Genomic assessment of the evolution of the prion protein gene family in vertebrates.

Science.gov (United States)

Harrison, Paul M; Khachane, Amit; Kumar, Manish

2010-05-01

Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified; they also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we completely annotated the PrP gene family (PrP-GF) in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates and that overlaps the meiosis gene SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents is conserved in all high-coverage genome assemblies of primates (human, chimp, orangutan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long
De Novo Insertions and Deletions of Predominantly Paternal Origin Are Associated with Autism Spectrum Disorder

Directory of Open Access Journals (Sweden)

Shan Dong

2014-10-01

Summary: Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single-nucleotide variants (SNVs) to autism spectrum disorder (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. By applying a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR = 1.6; 95% CI = 1.0–2.7; p = 0.03), are more common in female probands (p = 0.02), are enriched among genes encoding FMRP targets (p = 6 × 10⁻⁹), and arise predominantly on the paternal chromosome (p < 0.001). On the basis of mutation rates in probands versus unaffected siblings, we conclude that de novo frameshift indels contribute to risk in approximately 3% of individuals with ASD. Finally, by observing clustering of mutations in unrelated probands, we uncover two ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release. Insertions and deletions (indels) have proven especially difficult to detect in exome sequencing data. Dong et al. now identify indels in exome data for 787 autism spectrum disorder (ASD) families. They demonstrate an association between de novo indels that alter the reading frame and ASD. Furthermore, by observing clustering of indels in unrelated probands, they uncover two additional ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release.
Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources

KAUST Repository

Arkasosy, Basil

2013-05-11

Ambiguity in texts is a well-known problem: words can carry several meanings and hence can be read and interpreted differently. This is also true in the biological literature; names of biological concepts, such as genes and proteins, might be ambiguous, referring in some cases to more than one gene or protein, or in others to both genes and proteins at the same time. Public biological databases provide very useful insight into gene and protein information, including their names. In this study, we made a thorough analysis of the nomenclature of genes and proteins in two data sources and for six different species. We developed an automated process that parses, extracts, processes and stores information available in two major biological databases: Entrez Gene and UniProtKB. We analysed gene and protein synonyms, their types, frequencies, and the ambiguities within a species, between data sources and across species. We found that at least 40% of the cross-species ambiguities are caused by names that are already ambiguous within the species. Our study shows that, of the six species we analysed (Homo sapiens, Mus musculus, Arabidopsis thaliana, Oryza sativa, Bacillus subtilis and Pseudomonas fluorescens), rice (Oryza sativa) has the best naming model in the Entrez Gene database, with low ambiguity between data sources and across species.
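The abstract describes counting names that point to more than one gene, both within a species and across species. A minimal sketch of that bookkeeping follows, assuming a toy record format of (species, gene ID, list of names). The record layout, the case-insensitive matching and all identifiers are hypothetical; this is not the authors' pipeline, nor the Entrez Gene or UniProtKB file formats.

```python
from __future__ import annotations
from collections import defaultdict


def build_synonym_index(
    records: list[tuple[str, str, list[str]]],
) -> dict[str, set[tuple[str, str]]]:
    """Map each lower-cased name/synonym to the (species, gene_id) pairs that use it."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for species, gene_id, names in records:
        for name in names:
            index[name.lower()].add((species, gene_id))
    return dict(index)


def within_species_ambiguous(index: dict[str, set[tuple[str, str]]]) -> set[str]:
    """Names shared by two or more genes of the same species."""
    result = set()
    for name, hits in index.items():
        per_species: dict[str, set[str]] = defaultdict(set)
        for species, gene_id in hits:
            per_species[species].add(gene_id)
        if any(len(ids) > 1 for ids in per_species.values()):
            result.add(name)
    return result


def cross_species_ambiguous(index: dict[str, set[tuple[str, str]]]) -> set[str]:
    """Names used by genes from two or more species."""
    return {name for name, hits in index.items() if len({s for s, _ in hits}) > 1}


def share_cross_also_within(index: dict[str, set[tuple[str, str]]]) -> float:
    """Fraction of cross-species ambiguous names that are also ambiguous within a species."""
    cross = cross_species_ambiguous(index)
    if not cross:
        return 0.0
    return len(cross & within_species_ambiguous(index)) / len(cross)
```

The last function computes the same kind of statistic as the "at least 40%" figure above, but on whatever toy data is passed in; it does not reproduce the paper's result.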
De novo transcriptome assembly for the tropical grass Panicum maximum Jacq.

Directory of Open Access Journals (Sweden)

Guilherme Toledo-Silva

Guinea grass (Panicum maximum Jacq.) is a tropical African grass often used to feed beef cattle, which is an important economic activity in Brazil. Brazil is the global leader in meat exports because of its exclusively pasture-raised bovine herds. Guinea grass also has potential uses in bioenergy production owing to its high biomass generation through the C4 photosynthesis pathway. We generated approximately 13 Gb of data from Illumina sequencing of P. maximum leaves. Four different genotypes were sequenced, and the combined reads were assembled de novo into 38,192 unigenes and annotated; approximately 63% of the unigenes showed homology to proteins in the NCBI non-redundant protein database. Functional classification through COG (Clusters of Orthologous Groups), GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) analyses showed that the unigenes from Guinea grass leaves are involved in a wide range of biological processes and metabolic pathways, including C4 photosynthesis and lignocellulose generation, which are important for cattle grazing and bioenergy production. The most abundant transcripts were involved in carbon fixation, photosynthesis, RNA translation and heavy metal cellular homeostasis. Finally, we identified a number of potential molecular markers, including 5,035 microsatellites (SSRs) and 346,456 single nucleotide polymorphisms (SNPs). To the best of our knowledge, this is the first study to characterize the complete leaf transcriptome of P. maximum using high-throughput sequencing. The biological information provided here will aid in gene expression studies and marker-assisted selection-based breeding research in tropical grasses.
Comparative De Novo Transcriptome Analysis of Fertilized Ovules in Xanthoceras sorbifolium Uncovered a Pool of Genes Expressed Specifically or Preferentially in the Selfed Ovule That Are Potentially Involved in Late-Acting Self-Incompatibility.

Directory of Open Access Journals (Sweden)

Qingyuan Zhou

Xanthoceras sorbifolium, a tree species endemic to northern China, has a high oil content in its seeds and is recognized as an important biodiesel crop. The plant is characterized by late-acting self-incompatibility (LSI). LSI occurs in many angiosperm species and, like gametophytic self-incompatibility (GSI) and sporophytic self-incompatibility (SSI), plays an important role in reducing inbreeding and its harmful effects. The molecular mechanisms of conventional GSI and SSI have been well characterized in several families, but no effort has been made to identify the genes involved in the LSI process. The present study indicated that there were no significant differences in structural and histological features between self- and cross-pollinated ovules during the early stages of ovule development, up to 5 days after pollination (DAP). This suggests that 5 DAP is likely to be a turning point in the development of the selfed ovules. Comparative de novo transcriptome analysis of the selfed and crossed ovules at 5 DAP identified 274 genes expressed specifically or preferentially in the selfed ovules. These genes contained a significant proportion of genes predicted to function in the biosynthesis of secondary metabolites, consistent with our histological observations in the fertilized ovules. Genes encoding signal transduction-related components, such as protein kinases and protein phosphatases, are overrepresented in the selfed ovules. X. sorbifolium selfed ovules also specifically or preferentially express many unique transcription factor (TF) genes that could potentially be involved in novel mechanisms of LSI. We also identified 42 genes significantly up-regulated in the crossed ovules compared with the selfed ovules. The expression of all 16 genes selected from the RNA-seq data was validated using PCR in the selfed and crossed ovules. This study represents the first genome-wide identification of genes expressed in the fertilized
Ubiquitin--conserved protein or selfish gene?

Science.gov (United States)

Catic, André; Ploegh, Hidde L

2005-11-01

The posttranslational modifier ubiquitin is encoded by a multigene family containing three primary members, which yield the precursor protein polyubiquitin and two ubiquitin moieties, Ub(L40) and Ub(S27), that are fused to the ribosomal proteins L40 and S27, respectively. The gene encoding polyubiquitin is highly conserved and, until now, those encoding Ub(L40) and Ub(S27) have been generally considered to be equally invariant. The evolution of the ribosomal ubiquitin moieties is, however, proving to be more dynamic. It seems that the genes encoding Ub(L40) and Ub(S27) are actively maintained by homologous recombination with the invariant polyubiquitin locus. Failure to recombine leads to deterioration of the sequence of the ribosomal ubiquitin moieties in several phyla, although this deterioration is evidently constrained by the structural requirements of the ubiquitin fold. Only a few amino acids in ubiquitin are vital for its function, and we propose that conservation of all three ubiquitin genes is driven not only by functional properties of the ubiquitin protein, but also by the propensity of the polyubiquitin locus to act as a 'selfish gene'.
Genome-wide identification of long non-coding RNA genes and their association with insecticide resistance and metamorphosis in diamondback moth, Plutella xylostella.

Science.gov (United States)

Liu, Feiling; Guo, Dianhao; Yuan, Zhuting; Chen, Chen; Xiao, Huamei

2017-11-20

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs longer than 200 nt that play essential roles in regulating a variety of biological processes. Here, we constructed a computational pipeline to identify lncRNA genes in the diamondback moth (Plutella xylostella), a major insect pest of cruciferous vegetables. In total, 3,324 lncRNAs corresponding to 2,475 loci were identified from 13 RNA-Seq datasets, including parasitized samples, insecticide-resistant strains and different developmental stages. The identified P. xylostella lncRNAs had shorter transcripts and fewer exons than protein-coding genes. Seven of nine randomly selected lncRNAs were validated by strand-specific RT-PCR. Between 54 and 172 lncRNAs were specifically expressed in the insecticide-resistant strains, one of which was located adjacent to the sodium channel gene. In addition, 63-135 lncRNAs were specifically expressed in different developmental stages, three of which overlapped or were located adjacent to metamorphosis-associated genes. These lncRNAs were either strongly or weakly co-expressed with their overlapping or neighboring mRNA genes. In summary, we identified thousands of lncRNAs and presented evidence that lncRNAs might play key roles in conferring insecticide resistance and regulating metamorphic development in P. xylostella.
Protein annotation from protein interaction networks and Gene Ontology.

Science.gov (United States)

Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

2011-10-01

We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules and takes advantage of the underlying topology of protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall and no loss of precision. Specifically, it achieves 51% precision and 60% recall, versus 45% and 26% for the Majority method and 24% and 61% for the χ²-statistics method, respectively.
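The figures above are per-method precision and recall. As an illustration only, the sketch below computes both from a predicted and a true annotation set, then combines each quoted pair into an F1 score (their harmonic mean), a summary the abstract itself does not report. The set contents and the REPORTED labels are hypothetical.

```python
from __future__ import annotations


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |actual| for one protein's annotations."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Precision/recall pairs quoted in the abstract; F1 is derived here only for comparison.
REPORTED = {
    "proposed method": (0.51, 0.60),
    "Majority": (0.45, 0.26),
    "chi-squared": (0.24, 0.61),
}

if __name__ == "__main__":
    for name, (p, r) in REPORTED.items():
        print(f"{name}: F1 = {f1(p, r):.3f}")
```

With the quoted numbers, F1 comes out at roughly 0.55, 0.33 and 0.34, which shows in a single figure why higher recall at comparable precision matters.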
Engineering and introduction of de novo disulphide bridges in ...

Indian Academy of Sciences (India)

The engineering of de novo disulphide bridges has been explored as a rational protein-engineering method to increase the thermal stability of enzymes. In this study, Disulphide by Design software, homology modelling and molecular dynamics simulations were used to select appropriate amino acid pairs for ...

Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

Science.gov (United States)

Fauteux, François; Strömvik, Martina V

2009-01-01

Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

Directory of Open Access Journals (Sweden)

Fauteux François

2009-10-01

Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses), using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.), respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination
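The abstract says that the discovered motifs were used to score whole-genome promoter sets, but it does not describe the scoring function. A minimal sketch follows, assuming a simple exact-match site count on both strands. The motif strings (CATGCA as a commonly cited RY-element core, ACGT as an ACGT-element core) are illustrative stand-ins, not the motifs the authors report, and the promoter names are hypothetical.

```python
from __future__ import annotations

# Illustrative stand-in motifs (assumptions, not the paper's discovered motifs).
MOTIFS = {"RY-like": "CATGCA", "ACGT-like": "ACGT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hit_positions(seq: str, motif: str) -> set[int]:
    """Forward-strand start positions of exact motif matches on either strand."""
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = {i for i in range(n - k + 1) if seq[i:i + k] == motif}
    rc = reverse_complement(seq)
    # A match at offset i of the reverse complement starts at n - i - k on the
    # forward strand; the set ensures palindromic motifs (e.g. ACGT) count once per site.
    hits |= {n - i - k for i in range(n - k + 1) if rc[i:i + k] == motif}
    return hits


def score_promoter(seq: str, motifs: dict[str, str] = MOTIFS) -> int:
    """Total number of motif sites found in one promoter sequence."""
    return sum(len(hit_positions(seq, m)) for m in motifs.values())


def rank_promoters(promoters: dict[str, str]) -> list[tuple[str, int]]:
    """Promoter names sorted from highest to lowest score."""
    scores = {name: score_promoter(s) for name, s in promoters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A discriminative method would normally also weigh each motif against its background frequency; plain counting is the simplest possible version of "scoring promoters by motifs".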
Bcmimp1, a Botrytis cinerea gene transiently expressed in planta, encodes a mitochondrial protein

Directory of Open Access Journals (Sweden)

David Benito-Pescador

2016-02-01

Botrytis cinerea is a widespread necrotrophic fungus that infects more than 200 plant species. In an attempt to characterize the physiological status of the fungus in planta and to identify genetic factors contributing to its ability to infect host cells, a differential gene expression analysis of the B. cinerea–tomato interaction was carried out. The gene Bcmimp1 encodes an mRNA detected by differential display in the course of this analysis. During the interaction with the host, it shows a transient expression pattern, with maximal expression levels during the colonization and maceration of the infected tissues. Bioinformatic analysis suggested that BCMIMP1 is an integral membrane protein located in the mitochondrial inner membrane. Co-localization experiments with a BCMIMP1-GFP fusion protein confirmed that the protein is targeted to the mitochondria. ΔBcmimp1 mutants did not show obvious phenotypic differences during saprophytic growth, and their infection ability was unaltered compared with the wild type. Interestingly, the mutants produced increased levels of ROS, likely as a consequence of disturbed mitochondrial function. Although Bcmimp1 expression is enhanced in planta, it cannot be considered a pathogenicity factor.
A systematic genome-wide analysis of zebrafish protein-coding gene function

NARCIS (Netherlands)

Kettleborough, R.N.; Busch-Nentwich, E.M.; Harvey, S.A.; Dooley, C.M.; de Bruijn, E.; van Eeden, F.; Sealy, I.; White, R.J.; Herd, C.; Nijman, I.J.; Fenyes, F.; Mehroke, S.; Scahill, C.; Gibbons, R.; Wali, N.; Carruthers, S.; Hall, A.; Yen, J.; Cuppen, E.; Stemple, D.L.

2013-01-01

Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms,
Complementation of Saccharomyces cerevisiae mutations in genes involved in translation and protein folding (EFB1 and SSB1) with Candida albicans cloned genes.

Science.gov (United States)

Maneu, V; Roig, P; Gozalbo, D

2000-11-01

We have demonstrated that the expression of Candida albicans genes involved in translation and protein folding (EFB1 and SSB1) complements the phenotype of Saccharomyces cerevisiae mutants. The elongation factor 1beta (EF-1beta) is essential for growth, and efb1 S. cerevisiae null mutant cells are not viable; however, viable haploid cells carrying the disrupted chromosomal allele of the S. cerevisiae EFB1 gene and pEFB1 were isolated upon sporulation of a diploid strain that was heterozygous at the EFB1 locus and transformed with pEFB1 (a pEMBLYe23-derived plasmid containing an 8-kb DNA fragment from the C. albicans genome that includes the EFB1 gene). This indicates that the C. albicans EFB1 gene encodes a functional EF-1beta. Expression of the SSB1 gene from C. albicans, which codes for a member of the 70-kDa heat shock protein family, in the S. cerevisiae ssb1 ssb2 double mutant complements the mutant phenotype (poor growth, particularly at low temperature, and sensitivity to certain protein synthesis inhibitors, such as paromomycin). This complementation indicates that C. albicans Ssb1 may function as a molecular chaperone on translating ribosomes, as described in S. cerevisiae. Northern blot analysis showed that SSB mRNA levels increased after a mild cold shift (from 28 °C to 23 °C) and rapidly decreased after a mild heat shift (from 28 °C to 37 °C, and particularly to 42 °C), indicating that SSB1 expression is regulated by temperature. Therefore, Ssb1 may be considered a molecular chaperone whose expression pattern is similar to that of ribosomal proteins, consistent with their shared role in translation.
The Asian Rice Gall Midge (Orseolia oryzae) Mitogenome Has Evolved Novel Gene Boundaries and Tandem Repeats That Distinguish Its Biotypes.

Directory of Open Access Journals (Sweden)

Isha Atray

The complete mitochondrial genome of the Asian rice gall midge, Orseolia oryzae (Diptera; Cecidomyiidae), was sequenced, annotated and analysed in the present study. The circular genome is 15,286 bp long and contains 13 protein-coding genes, 22 tRNAs, 2 ribosomal RNA genes and a 578 bp non-coding control region. All protein-coding genes used conventional start codons and terminated with a complete stop codon. The genome presented many unusual features: (1) rearrangement in the order of tRNAs as well as protein-coding genes; (2) truncation and unusual secondary structures of tRNAs; (3) presence of two different repeat elements in separate non-coding regions; (4) presence of one pseudo-tRNA gene; (5) inversion of the rRNA genes; (6) a higher percentage of non-coding regions compared with other insect mitogenomes. Rearrangements of the tRNAs and protein-coding genes are explained on the basis of the tandem duplication and random loss model, and we discuss why intramitochondrial recombination is a better model for explaining the rearrangements in the O. oryzae mitochondrial genome. Furthermore, we evaluated the number of iterations of the tandem repeat elements found in the mitogenome. This led to the identification of genetic markers capable of differentiating rice gall midge biotypes and the two Orseolia species investigated.
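Counting how many times a repeat unit occurs back to back is the kind of measurement that turns a tandem repeat into a length-polymorphic marker. A minimal sketch follows, assuming the exact repeat unit is already known. The function name and the example sequences are hypothetical and are not the repeat elements reported for O. oryzae.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest number of consecutive, exact copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    # Try every start position so runs in any phase are found.
    for start in range(len(seq) - k + 1):
        copies = 0
        pos = start
        while seq[pos:pos + k] == unit:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Hypothetical sequences from two samples sharing the same flanks.
    sample_a = "GGACTACTACTGG"
    sample_b = "GGACTACTACTACTACTGG"
    print(max_tandem_copies(sample_a, "ACT"), max_tandem_copies(sample_b, "ACT"))
```

Two samples whose sequences give different copy numbers for the same unit would be distinguishable by that marker. Real repeat arrays often contain imperfect copies, which this exact-match sketch does not tolerate.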
Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

Science.gov (United States)

Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

2012-09-01

Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist, indicating that these genes are regulated differently. Finally, and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data and gene expression data, we further provided support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene

Directory of Open Access Journals (Sweden)

Shomron Noam

2007-11-01

Abstract Background Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Results Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication; all three copies of the gene are transcriptionally active, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein-coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared with normal cells. Conclusion The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single-copy genes, with duplicate genes free from such constraints.
Associations between Familial Rates of Psychiatric Disorders and De Novo Genetic Mutations in Autism

Directory of Open Access Journals (Sweden)

Kyleen Luhrs

2017-01-01

The purpose of this study was to examine the confluence of genetic and familial risk factors in children with Autism Spectrum Disorder (ASD) with distinct de novo genetic events. We hypothesized that gene-disrupting mutations would be associated with reduced rates of familial psychiatric disorders relative to structural mutations. Participants included families of children with ASD in four groups: de novo duplication copy number variations (DUP, n = 62), de novo deletion copy number variations (DEL, n = 74), de novo likely gene-disrupting mutations (LGDM, n = 267), and children without a known genetic etiology (NON, n = 2111). Familial rates of psychiatric disorders were calculated from semistructured interviews. Results indicated overall increased rates of psychiatric disorders in DUP families compared with DEL and LGDM families, specific to paternal psychiatric histories and particularly evident for depressive disorders. Higher rates of depressive disorders were observed overall in maternal than in paternal psychiatric histories, and higher rates of anxiety disorders were observed in paternal histories for LGDM families compared with DUP families. These findings support the notion that genetic etiology and familial factors make additive contributions to ASD risk and highlight a critical need for continued work targeting these relationships.
Whole Exome Sequencing for a Patient with Rubinstein-Taybi Syndrome Reveals de Novo Variants besides an Overt CREBBP Mutation

Directory of Open Access Journals (Sweden)

Hee Jeong Yoo

2015-03-01

Rubinstein-Taybi syndrome (RSTS) is a rare condition with a prevalence of 1 in 125,000–720,000 births, characterized by clinical features that include facial, dental, and limb dysmorphology and growth retardation. Most cases of RSTS occur sporadically and are caused by de novo mutations. Cytogenetic or molecular abnormalities are detected in only 55% of RSTS cases. Previous genetic studies have yielded inconsistent results due to the variety of methods used for genetic analysis. The purpose of this study was to use whole exome sequencing (WES) to evaluate the genetic causes of RSTS in a young girl presenting with an autism phenotype. We used the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R) to confirm her diagnosis of autism. In addition, various questionnaires were used to evaluate other psychiatric features. We used WES to analyze the DNA sequences of the patient and her parents and to search for de novo variants. The patient showed all the typical features of autism, and WES revealed a de novo frameshift mutation in CREBBP and de novo sequence variants in the TNC and IGFALS genes. Mutations in the CREBBP gene have been extensively reported in RSTS patients, while potential missense mutations in the TNC and IGFALS genes have not previously been associated with RSTS. The TNC and IGFALS genes are involved in central nervous system development and growth. It is possible for patients with RSTS to have additional de novo variants that could account for previously unexplained phenotypes.
On the total number of genes and their length distribution in complete microbial genomes

DEFF Research Database (Denmark)

Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

2001-01-01

In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only about 3800 genes, and that a similar discrepancy exists for almost all published genomes.
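The premise above is that open reading frames also arise by chance, and that chance ORFs are mostly short. A minimal sketch of that premise follows, assuming ATG starts, the standard stop codons TAA/TAG/TGA and a uniform random sequence. It scans only the forward strand and is not the authors' estimation method, which compared the length distributions of annotated genes with those matching known proteins. Under uniform base composition a codon is a stop with probability 3/64, so a chance ORF reaching 100 codons needs about 98 consecutive non-stop codons after its ATG, with probability roughly (61/64)^98, i.e. under 1%.

```python
from __future__ import annotations
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. Within a frame, only the
    first ATG after a stop opens an ORF, so nested in-frame ATGs are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def random_orf_lengths(length: int, seed: int = 0) -> list[int]:
    """ORF lengths in codons (ATG to stop, inclusive) in a uniform random sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    return [(end - start) // 3 for start, end in forward_orfs(seq)]


if __name__ == "__main__":
    lengths = random_orf_lengths(300_000, seed=1)
    long_ones = sum(1 for n in lengths if n >= 100)
    print(f"{len(lengths)} chance ORFs, {long_ones} of them at least 100 codons")
```

Real genomes are not uniform random sequences (GC content shifts the stop-codon frequency), so the exact proportions differ, but the excess of short chance ORFs is the effect the abstract relies on.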
Long non-coding RNA expression profiling of mouse testis during postnatal development.

Directory of Open Access Journals (Sweden)

Jin Sun

Mammalian testis development and spermatogenesis play critical roles in male fertility and the continuation of a species. Previous research into the molecular mechanisms of testis development and spermatogenesis has largely focused on the role of protein-coding genes and small non-coding RNAs, such as microRNAs and piRNAs. Recently, it has become apparent that large numbers of long (>200 nt) non-coding RNAs (lncRNAs) are transcribed from mammalian genomes and that lncRNAs perform important regulatory functions in various developmental processes. However, the expression of lncRNAs and their biological functions in postnatal testis development remain unknown. In this study, we employed microarray technology to examine lncRNA expression profiles of neonatal (6-day-old) and adult (8-week-old) mouse testes. We found that 8,265 lncRNAs were expressed above background levels during postnatal testis development, of which 3,025 were differentially expressed. Candidate lncRNAs were identified for further characterization by an integrated examination of genomic context, gene ontology (GO) enrichment of their associated protein-coding genes, promoter analysis for epigenetic modification, and evolutionary conservation of elements. Many lncRNAs overlapped or were adjacent to key transcription factors and other genes involved in spermatogenesis, such as Ovol1, Ovol2, Lhx1, Sox3, Sox9, Plzf, c-Kit, Wt1, Sycp2, Prm1 and Prm2. Most differentially expressed lncRNAs exhibited epigenetic modification marks similar to those of protein-coding genes and tended to be expressed in a tissue-specific manner. In addition, the majority of differentially expressed lncRNAs harbored evolutionarily conserved elements. Taken together, our findings represent the first systematic investigation of lncRNA expression in the mammalian testis and provide a solid foundation for further research into the molecular mechanisms of lncRNA function in mammalian testis development and spermatogenesis.
Tetrahymena thermophila acidic ribosomal protein L37 contains an archaebacterial type of C-terminus

DEFF Research Database (Denmark)

Hansen, T S; Andreasen, P H; Dreisig, H

1991-01-01

We have cloned and characterized a Tetrahymena thermophila macronuclear gene (L37) encoding the acidic ribosomal protein (A-protein) L37. The gene contains a single intron located in the 3'-part of the coding region. Two major and three minor transcription start points (tsp) were mapped 39 to 63 nucleotides upstream from the translational start codon. The uppermost tsp mapped to the first T in a putative T. thermophila RNA polymerase II initiator element, TATAA. The coding region of L37 predicts a protein of 109 amino acid (aa) residues. A substantial part of the deduced aa sequence was verified by protein sequencing. The T. thermophila L37 clearly belongs to the P1-type family of eukaryotic A-proteins, but the C-terminal region has the hallmarks of archaebacterial A-proteins.
RNA-Seq analysis during the life cycle of Cryptosporidium parvum reveals significant differential gene expression between proliferating stages in the intestine and infectious sporozoites.

Science.gov (United States)

Lippuner, Christoph; Ramakrishnan, Chandra; Basso, Walter U; Schmid, Marc W; Okoniewski, Michal; Smith, Nicholas C; Hässig, Michael; Deplazes, Peter; Hehl, Adrian B

2018-05-01

Cryptosporidium parvum is a major cause of diarrhoea in humans and animals. There are no vaccines and few drugs available to control C. parvum. In this study, we used RNA-Seq to compare gene expression in sporozoites and intracellular stages of C. parvum to identify genes likely to be important for successful completion of the parasite's life cycle and, thereby, possible targets for drugs or vaccines. We identified 3774 protein-encoding transcripts in C. parvum. Applying a stringent eightfold cut-off for differential expression, we identified 173 genes (26 coding for predicted secreted proteins) upregulated in sporozoites. Conversely, 1259 genes were upregulated in intestinal stages (merozoites/gamonts), with gene ontology enrichment for 63 biological processes and upregulation of 117 genes in 23 metabolic pathways. There was no clear stage specificity in the expression of AP2-domain-containing transcription factors, although sporozoites had a relatively small repertoire of these important regulators. Our RNA-Seq analysis revealed a new calcium-dependent protein kinase, bringing the total number of known calcium-dependent protein kinases (CDPKs) in C. parvum to 11. One of these, CDPK1, was expressed in all stages, strengthening the notion that it is a valid drug target. By comparing parasites grown in vivo (which produce bona fide thick-walled oocysts) and in vitro (which are arrested in sexual development prior to oocyst generation), we were able to confirm that genes encoding oocyst wall proteins are expressed in gametocytes and that the proteins are stockpiled rather than generated de novo in zygotes. RNA-Seq analysis of C. parvum revealed genes expressed in a stage-specific manner and others whose expression is required at all stages of development. The functional significance of these genes can now be addressed through recent advances in transgenics for C. parvum, which may lead to the identification of viable drug and vaccine targets.
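The eightfold cut-off used above to call stage-specific genes can be illustrated with a minimal Python sketch. It assumes normalized expression values per gene and adds a pseudocount of 1 to avoid division by zero; the function name, gene names and values are hypothetical, and a real RNA-Seq analysis would normally pair such a threshold with a statistical test rather than rely on the ratio alone.

```python
# Minimal sketch of a fold-change cut-off. Values are assumed to be normalized
# expression levels; all data below are hypothetical, not from the study.
FOLD_CUTOFF = 8.0
PSEUDOCOUNT = 1.0  # assumption: avoids division by zero for unexpressed genes


def classify_by_fold_change(expression, cutoff=FOLD_CUTOFF):
    """Split genes into sporozoite-up and intestinal-stage-up lists.

    expression maps gene name -> (sporozoite_level, intestinal_level).
    Genes below the cut-off in both directions are left out of both lists.
    """
    sporozoite_up, intestinal_up = [], []
    for gene, (spor, intest) in expression.items():
        ratio = (spor + PSEUDOCOUNT) / (intest + PSEUDOCOUNT)
        if ratio >= cutoff:
            sporozoite_up.append(gene)
        elif 1.0 / ratio >= cutoff:
            intestinal_up.append(gene)
    return sorted(sporozoite_up), sorted(intestinal_up)


if __name__ == "__main__":
    demo = {
        "geneA": (800.0, 50.0),  # ~15.7-fold higher in sporozoites
        "geneB": (10.0, 200.0),  # ~18.3-fold higher in intestinal stages
        "geneC": (100.0, 40.0),  # ~2.5-fold: below the cut-off
        "geneD": (0.0, 30.0),    # detected only in intestinal stages
    }
    print(classify_by_fold_change(demo))
```

With the demo data, geneA is called sporozoite-upregulated, geneB and geneD intestinal-stage-upregulated, and geneC is not called at all, even though it is 2.5-fold higher in sporozoites.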
A library of MiMICs allows tagging of genes and reversible, spatial and temporal knockdown of proteins in Drosophila

Science.gov (United States)

Nagarkar-Jaiswal, Sonal; Lee, Pei-Tseng; Campbell, Megan E; Chen, Kuchuan; Anguiano-Zarate, Stephanie; Cantu Gutierrez, Manuel; Busby, Theodore; Lin, Wen-Wen; He, Yuchun; Schulze, Karen L; Booth, Benjamin W; Evans-Holm, Martha; Venken, Koen JT; Levis, Robert W; Spradling, Allan C; Hoskins, Roger A; Bellen, Hugo J

2015-01-01

Here, we document a collection of ∼7434 MiMIC (Minos Mediated Integration Cassette) insertions, of which 2854 are inserted in coding introns. These allowed us to create a library of 400 GFP-tagged genes. We show that 72% of internally tagged proteins are functional, and that more than 90% can be imaged in unfixed tissues. Moreover, the tagged mRNAs can be knocked down by RNAi against GFP (iGFPi), and the tagged proteins can be efficiently knocked down by deGradFP technology. The phenotypes associated with RNA and protein knockdown typically correspond to severe loss of function or null mutant phenotypes. Finally, we demonstrate reversible, spatial, and temporal knockdown of tagged proteins in larvae and adult flies. This new strategy and collection of strains allow unprecedented in vivo manipulations in flies for many genes. These strategies will likely extend to vertebrates. DOI: http://dx.doi.org/10.7554/eLife.05338.001 PMID:25824290
De novo transcriptome and small RNA analysis of two Chinese willow cultivars reveals stress response genes in Salix matsudana.

Directory of Open Access Journals (Sweden)

Guodong Rao

Full Text Available Salix matsudana Koidz. is a deciduous, rapidly growing, and drought-resistant tree and is one of the most widely distributed and commonly cultivated willow species in China. Currently, few transcriptomic and small RNAomic data are available to reveal the genes involved in stress resistance in S. matsudana. Here, we report the RNA-seq analysis results of both transcriptome and small RNAome data using Illumina deep sequencing of shoot tips from two willow variants (Salix matsudana and Salix matsudana Koidz. cultivar 'Tortuosa'). De novo gene assembly was used to generate the consensus transcriptome and small RNAome, which contained 106,403 unique transcripts with an average length of 944 bp and a total length of 100.45 Mb, and 166 known miRNAs representing 35 miRNA families. Comparison of transcriptomes and small RNAomes combined with quantitative real-time PCR from the two Salix libraries revealed a total of 292 differentially expressed genes (DEGs) and 36 differentially expressed miRNAs (DEMs). Among the DEGs and DEMs, 196 genes and 24 miRNAs were upregulated, and 96 genes and 12 miRNAs were downregulated in S. matsudana. Functional analysis of DEGs and miRNA targets showed that many genes were involved in stress resistance in S. matsudana. Our global gene expression profiling presents a comprehensive view of the transcriptome and small RNAome, which provides valuable information and sequence resources for uncovering the stress response genes in S. matsudana. Moreover, the transcriptome and small RNAome data provide a basis for future study of genetic resistance in Salix.
Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

Science.gov (United States)

Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

2015-11-18

RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing of three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read lengths of >150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well.
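The final summarizing step of such an assessment, turning per-gene search results into completeness percentages against a reference set such as the 233 CVG genes, can be sketched in Python as below. The ortholog searches themselves are done by CEGMA or BUSCO; the status labels, the hypothetical gene IDs and the function name are assumptions for illustration and do not reproduce either tool's output format.

```python
from collections import Counter

STATUSES = ("complete", "fragmented", "missing")


def completeness_summary(reference_ids, hits):
    """Return the percentage of reference genes in each status category.

    reference_ids: identifiers of the reference set (e.g. 233 one-to-one orthologs).
    hits: maps a reference id to "complete" or "fragmented"; ids absent from
          hits are counted as missing.
    """
    counts = Counter(hits.get(gene_id, "missing") for gene_id in reference_ids)
    total = len(reference_ids)
    return {status: 100.0 * counts[status] / total for status in STATUSES}


if __name__ == "__main__":
    reference = [f"cvg{i:03d}" for i in range(233)]  # hypothetical ids
    found = {gid: "complete" for gid in reference[:220]}
    found.update({gid: "fragmented" for gid in reference[220:228]})
    summary = completeness_summary(reference, found)
    print({k: round(v, 1) for k, v in summary.items()})
```

In this toy case 220 complete, 8 fragmented and 5 missing genes give roughly 94.4%, 3.4% and 2.1%. Because the reference set is small, each gene moves the score by about 0.43 percentage points.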
De-repressing LncRNA-Targeted Genes to Upregulate Gene Expression: Focus on Small Molecule Therapeutics

Directory of Open Access Journals (Sweden)

Roya Pedram Fatemi

2014-01-01

Full Text Available Non-protein coding RNAs (ncRNAs) make up the overwhelming majority of transcripts in the genome and have recently gained attention for their complex regulatory role in cells, including the regulation of protein-coding genes. Furthermore, ncRNAs play an important role in normal development, and their expression levels are dysregulated in several diseases. Recently, several long noncoding RNAs (lncRNAs) have been shown to alter the epigenetic status of genomic loci and suppress the expression of target genes. This review will present examples of such a mechanism and focus on the potential to target lncRNAs for achieving therapeutic gene upregulation by de-repressing genes that are epigenetically silenced in various diseases. Finally, the potential to target lncRNAs, through their interactions with epigenetic enzymes, using various tools, such as small molecules, viral vectors and antisense oligonucleotides, will be discussed. We suggest that small molecule modulators of a novel class of drug targets, lncRNA-protein interactions, have great potential to treat some cancers, cardiovascular disease, and neurological disorders.
Allele-Selective Transcriptome Recruitment to Polysomes Primed for Translation: Protein-Coding and Noncoding RNAs, and RNA Isoforms.

Directory of Open Access Journals (Sweden)

Roshan Mascarenhas

Full Text Available mRNA translation into proteins is highly regulated, but the role of mRNA isoforms, noncoding RNAs (ncRNAs), and genetic variants remains poorly understood. mRNA levels on polysomes have been shown to correlate well with expressed protein levels, pointing to polysomal loading as a critical factor. To study regulation and genetic factors of protein translation, we measured levels and allelic ratios of mRNAs and ncRNAs (including microRNAs) in lymphoblast cell lines (LCLs) and in polysomal fractions. We first used targeted assays to measure polysomal loading of mRNA alleles, confirming reported genetic effects on translation of OPRM1 and NAT1, and detecting no effect of rs1045642 (3435C>T) in ABCB1 (MDR1) on polysomal loading while supporting previous results showing increased mRNA turnover of the 3435T allele. Use of high-throughput sequencing of complete transcript profiles (RNA-Seq) in three LCLs revealed significant differences in polysomal loading of individual RNA classes and isoforms. Correlated polysomal distribution between protein-coding and non-coding RNAs suggests interactions between them. Allele-selective polysome recruitment revealed strong genetic influence for multiple RNAs, attributable either to differential expression of RNA isoforms or to differential loading onto polysomes, the latter defining a direct genetic effect on translation. Genes identified by different allelic RNA ratios between cytosol and polysomes were enriched with published expression quantitative trait loci (eQTLs) affecting RNA functions, and associations with clinical phenotypes. Polysomal RNA-Seq combined with allelic ratio analysis provides a powerful approach to study polysomal RNA recruitment and regulatory variants affecting protein translation.
Analysis of SNP rs16754 of WT1 gene in a series of de novo acute myeloid leukemia patients.

Science.gov (United States)

Luna, Irene; Such, Esperanza; Cervera, Jose; Barragán, Eva; Jiménez-Velasco, Antonio; Dolz, Sandra; Ibáñez, Mariam; Gómez-Seguí, Inés; López-Pavía, María; Llop, Marta; Fuster, Óscar; Oltra, Silvestre; Moscardó, Federico; Martínez-Cuadrón, David; Senent, M Leonor; Gascón, Adriana; Montesinos, Pau; Martín, Guillermo; Bolufer, Pascual; Sanz, Miguel A

2012-12-01

The single nucleotide polymorphism (SNP) rs16754 of the WT1 gene has been previously described as a possible prognostic marker in normal karyotype acute myeloid leukemia (AML) patients. Nevertheless, the findings in this field have not always been reproducible across different series. One hundred and seventy-five adult de novo AML patients were screened with two different methods for the detection of SNP rs16754: high-resolution melting (HRM) and FRET hybridization probes. Direct sequencing was used to validate both techniques. The SNP was detected in 52 out of 175 patients (30%), both by HRM and hybridization probes. Direct sequencing confirmed that every positive sample in the screening methods had a variation in the DNA sequence. Patients with the wild-type genotype (WT1(AA)) for the SNP rs16754 were significantly younger than those with the heterozygous WT1(AG) genotype. No other difference in baseline characteristics or outcome was observed between patients with or without the SNP. Both techniques are equally reliable and reproducible as screening methods for the detection of the SNP rs16754, allowing for the selection of those samples that will need to be sequenced. We were unable to confirm the suggested favorable prognostic impact of SNP rs16754 in de novo AML.

Operon Gene Order Is Optimized for Ordered Protein Complex Assembly

Science.gov (United States)

Wells, Jonathan N.; Bergendahl, L. Therese; Marsh, Joseph A.

2016-01-01

Summary The assembly of heteromeric protein complexes is an inherently stochastic process in which multiple genes are expressed separately into proteins, which must then somehow find each other within the cell. Here, we considered one of the ways by which prokaryotic organisms have attempted to maximize the efficiency of protein complex assembly: the organization of subunit-encoding genes into operons. Using structure-based assembly predictions, we show that operon gene order has been optimized to match the order in which protein subunits assemble. Exceptions are almost entirely highly expressed proteins, for which assembly is less stochastic and precisely ordered translation offers less benefit. Overall, these results show that ordered protein complex assembly pathways are of significant biological importance and represent a major evolutionary constraint on operon gene organization. PMID:26804901
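One simple way to quantify how well an operon's gene order matches a predicted assembly order is to count the fraction of subunit pairs that appear in the same relative order in both lists. The Python sketch below uses that pairwise-concordance measure. It is an illustrative stand-in, not necessarily the statistic used in the study, and it assumes the assembly pathway has been reduced to a single ordered list of subunits (real pathways may include simultaneous steps). The subunit names are hypothetical.

```python
from itertools import combinations


def order_concordance(gene_order, assembly_order):
    """Fraction of subunit pairs ordered identically in the operon and the assembly pathway.

    Both arguments list the same subunits; 1.0 means the operon order exactly
    follows the assembly order, 0.0 means it is fully reversed.
    """
    if sorted(gene_order) != sorted(assembly_order):
        raise ValueError("both orders must contain the same subunits")
    if len(gene_order) < 2:
        raise ValueError("need at least two subunits to compare")
    gene_pos = {s: i for i, s in enumerate(gene_order)}
    asm_pos = {s: i for i, s in enumerate(assembly_order)}
    pairs = list(combinations(gene_order, 2))
    agreeing = sum(
        (gene_pos[a] < gene_pos[b]) == (asm_pos[a] < asm_pos[b]) for a, b in pairs
    )
    return agreeing / len(pairs)


if __name__ == "__main__":
    # Hypothetical four-subunit complex: only subunits C and D swap places.
    print(order_concordance(["A", "B", "C", "D"], ["A", "B", "D", "C"]))
```

Here 5 of the 6 subunit pairs agree, giving about 0.83. A score consistently above what random operon orders produce would be one way to express the kind of optimization the abstract describes.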
Metazoan Remaining Genes for Essential Amino Acid Biosynthesis: Sequence Conservation and Evolutionary Analyses

Directory of Open Access Journals (Sweden)

Igor R. Costa

2014-12-01

Full Text Available Essential amino acids (EAAs) are a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all remaining metazoan genes for EAA pathways. Initially, the set of all 49 enzymes responsible for de novo EAA biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.
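The abstract does not describe the Python algorithm used to score conservation. As a hedged illustration only, the sketch below computes one common, simple measure. For each column of a multiple sequence alignment, it takes the fraction of sequences carrying the most frequent residue, with gaps counting against conservation, and then averages over all columns. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_score(column):
    """Fraction of sequences sharing the most common non-gap residue in a column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column)  # dividing by all sequences lets gaps lower the score


def mean_conservation(alignment):
    """Average column score over an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return sum(column_score(col) for col in zip(*alignment)) / length


if __name__ == "__main__":
    toy = ["MKV-L", "MKVAL", "MRVAL"]  # hypothetical aligned protein fragments
    print(round(mean_conservation(toy), 3))
```

For the toy alignment, three columns are fully conserved and two score 2/3, so the mean is 13/15 (about 0.867). Computing the score separately for, say, a plant-fungi group and a metazoan group would be one way to compare conservation between clades.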
APPRIS 2017: principal isoforms for multiple gene sets

Science.gov (United States)

Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

2018-01-01

Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica.

Science.gov (United States)

Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M

2015-05-15

The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low-coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved here using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number from 30,060 to 40,122; the newly annotated genes account for 25% of the updated gene set. In general, Amphimedon genes have introns that are markedly smaller than those in other animals, and most of the alternatively spliced genes in Amphimedon undergo intron retention; exon skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal that the Amphimedon genome is composed of a remarkably high number of tightly packed genes. These genes have small introns, and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar to unicellular opisthokont genomes than to other animal genomes.
Isolation and characterisation of cDNA clones representing the genes encoding the major tuber storage protein (dioscorin) of yam (Dioscorea cayenensis Lam.).

Science.gov (United States)

Conlan, R S; Griffiths, L A; Napier, J A; Shewry, P R; Mantell, S; Ainsworth, C

1995-06-01

cDNA clones encoding dioscorins, the major tuber storage proteins (Mr 32,000) of yam (Dioscorea cayenensis), have been isolated. Two classes of clone (A and B, distinguished by hybrid-release translation product sizes and nucleotide sequence differences), which are 84.1% similar in their protein-coding regions, were identified. The protein encoded by the open reading frame of the class A cDNA insert has an Mr of 30,015. The difference between the observed and calculated molecular mass might be attributed to glycosylation. Nucleotide sequencing and in vitro transcription/translation suggest that the class A dioscorin proteins are synthesised with signal peptides of 18 amino acid residues, which are cleaved from the mature peptide. The class A and class B proteins are 69.6% similar to each other, but show no sequence identity with other plant proteins or with the major tuber storage proteins of potato (patatin) or sweet potato (sporamin). Storage protein gene expression was restricted to developing tubers and was not induced by growth conditions known to induce expression of tuber storage protein genes in other plant species. The codon usage of the dioscorin genes suggests that the Dioscoreaceae are more closely related to dicotyledonous than to monocotyledonous plants.
A Drosophila gene encoding a protein resembling the human β-amyloid protein precursor

International Nuclear Information System (INIS)

Rosen, D.R.; Martin-Morris, L.; Luo, L.; White, K.

1989-01-01

The authors have isolated genomic and cDNA clones for a Drosophila gene resembling the human β-amyloid precursor protein (APP). This gene produces a nervous system-enriched 6.5-kilobase transcript. Sequencing of cDNAs derived from the 6.5-kilobase transcript predicts an 886-amino acid polypeptide. This polypeptide contains a putative transmembrane domain and exhibits strong sequence similarity to cytoplasmic and extracellular regions of the human β-amyloid precursor protein. There is a high probability that this Drosophila gene corresponds to the essential Drosophila locus vnd, a gene required for embryonic nervous system development.
Computational Approaches Reveal New Insights into Regulation and Function of Non-coding RNAs and their Targets

KAUST Repository

Alam, Tanvir

2016-11-28

Regulation and function of protein-coding genes are increasingly well understood, but no comparable evidence exists for non-coding RNA (ncRNA) genes, which appear to be more numerous than protein-coding genes. We developed a novel machine-learning model to distinguish promoters of long ncRNA (lncRNA) genes from those of protein-coding genes. This represents the first attempt to make this distinction based on properties of the associated gene promoters. From our analyses, several transcription factors (TFs), which are known to be regulated by lncRNAs, also emerged as potential global regulators of lncRNAs, suggesting that lncRNAs and TFs may participate in a bidirectional feedback regulatory network. Our results also raise the possibility that, due to the historical dependence on protein-coding genes in defining the chromatin states of active promoters, an adjustment of these chromatin signature profiles to incorporate lncRNAs is warranted in the future. Secondly, we developed a novel method to infer functions for lncRNA and microRNA (miRNA) transcripts based on their transcriptional regulatory networks in 119 tissues and 177 primary cells of human. This method for the first time combines information on the cell/tissue-specific expression of a transcript and the TFs and transcription co-factors (TcoFs) that control activation of that transcript. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues, and an associated knowledgebase (FARNA) was developed. FARNA, having the most comprehensive function annotation of the considered ncRNAs across the widest spectrum of cells/tissues, has the potential to contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. Thirdly, we developed a novel machine-learning model to identify the LD motif (a protein interaction motif) of paxillin, a ncRNA target that is involved in cell motility and cancer metastasis. Our recognition model identified new proteins not
Generation and characterization of P gene-deficient rabies virus

International Nuclear Information System (INIS)

Shoji, Youko; Inoue, Satoshi; Nakamichi, Kazuo; Kurane, Ichiro; Sakai, Takeo; Morimoto, Kinjiro

2004-01-01

Rabies virus (RV) deficient in the P gene was generated by reverse genetics from cDNA of the HEP-Flury strain lacking the entire P gene. The defective virus was propagated and amplified by rescue of virus, using a cell line that complemented the functions of the deficient gene. The P gene-deficient (def-P) virus replicated its genome and produced progeny viruses in cell lines that constitutively expressed the P protein, although it grew at a slightly retarded rate compared to the parental strain. In contrast, no progeny virus was produced when the def-P virus infected cells that did not express the P protein. However, we found that the def-P virus was able to perform primary transcription (by the virion-associated polymerase) in the infected host without de novo P protein synthesis. The def-P virus was apathogenic in adult and suckling mice, even when inoculated intracranially. Inoculation of def-P virus in mice induced high levels of virus-neutralizing antibody (VNA) and conferred protective immunity against a lethal rabies infection. These results demonstrate the potential utility of gene-deficient virus as a novel live attenuated rabies vaccine.
De novo transcriptome assembly and comparative analysis of differentially expressed genes in Prunus dulcis Mill. in response to freezing stress.

Directory of Open Access Journals (Sweden)

Sadegh Mousavi

Full Text Available Almond (Prunus dulcis Mill., one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues leading to reduction in the rate of productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond's transcriptome under the cold stress conditions. In the current research, we used RNA-seq technique to study the response of the reproductive tissues of almond (anther and ovary to frost stress. RNA sequencing resulted in more than 20 million reads from anther and ovary tissues of almond, individually. About 40,000 contigs were assembled and annotated de novo in each tissue. Profile of gene expression in ovary showed significant alterations in 5,112 genes, whereas in anther 6,926 genes were affected by freezing stress. Around two thousands of these genes were common altered genes in both ovary and anther libraries. Gene ontology indicated the involvement of differentially expressed (DE genes, responding to freezing stress, in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide great insights on responses of almond and other Prunus species to abiotic stresses. The obtained results from current research would add to the limited available information on almond and Rosaceae. Besides, the findings would be very useful for comparative studies as the number of DE genes reported here is much higher than that of any previous reports in this plant.
De novo transcriptome assembly and comparative analysis of differentially expressed genes in Prunus dulcis Mill. in response to freezing stress.

Science.gov (United States)

Mousavi, Sadegh; Alisoltani, Arghavan; Shiran, Behrouz; Fallahi, Hossein; Ebrahimie, Esameil; Imani, Ali; Houshmand, Saadollah

2014-01-01

Almond (Prunus dulcis Mill.), one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues, reducing productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond transcriptome under cold stress conditions. In the current research, we used RNA-seq to study the response of the reproductive tissues of almond (anther and ovary) to frost stress. RNA sequencing yielded more than 20 million reads from each of the anther and ovary tissues of almond. About 40,000 contigs were assembled and annotated de novo in each tissue. The gene expression profile of the ovary showed significant alterations in 5,112 genes, whereas in the anther 6,926 genes were affected by freezing stress. Around two thousand of these genes were altered in both the ovary and anther libraries. Gene ontology analysis indicated the involvement of differentially expressed (DE) genes responding to freezing stress in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide insights into the responses of almond and other Prunus species to abiotic stresses. The results of the current research add to the limited information available on almond and Rosaceae. In addition, the findings should be useful for comparative studies, as the number of DE genes reported here is much higher than in any previous report on this plant.
Molecular cloning of the gene for the human placental GTP-binding protein Gp (G25K): Identification of this GTP-binding protein as the human homolog of the yeast cell-division-cycle protein CDC42

International Nuclear Information System (INIS)

Shinjo, K.; Koland, J.G.; Hart, M.J.; Narasimhan, V.; Cerione, R.A.; Johnson, D.I.; Evans, T.

1990-01-01

The authors have isolated cDNA clones from a human placental library that code for a low molecular weight GTP-binding protein originally designated Gp (also called G25K). This identification is based on comparisons with the available peptide sequences for the purified human Gp protein and the use of two highly specific anti-peptide antibodies. The predicted amino acid sequence of the protein is very similar to those of various members of the ras superfamily of low molecular weight GTP-binding proteins, including the N-, Ki-, and Ha-ras proteins (30-35% identical), the rho proteins and the rac proteins. The highest degree of sequence identity (80%) is found with the Saccharomyces cerevisiae cell division-cycle protein CDC42. The human placental gene, which they designate CDC42Hs, complements the cdc42-1 mutation in S. cerevisiae, which suggests that this GTP-binding protein is the human homolog of the yeast protein.
Bifurcations in the interplay of messenger RNA, protein and nonprotein coding RNA

International Nuclear Information System (INIS)

Zhdanov, Vladimir P

2008-01-01

The interplay of messenger RNA (mRNA), the protein produced by translation of this mRNA, and nonprotein coding RNA (ncRNA) may include regulation of ncRNA production by the protein together with either (i) ncRNA-protein association, which suppresses the regulatory activity of the protein, or (ii) ncRNA-mRNA association, which results in degradation of the ncRNA-mRNA complex. The kinetic models describing these two scenarios are found to predict bistability provided that the protein suppresses ncRNA formation.
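Scenario (ii) can be made concrete with a small, hypothetical mass-action model. The equations and parameter values below are illustrative assumptions, not those of the paper. With mRNA m, protein p and ncRNA r: dm/dt = a - k_m*m - k_a*m*r, dp/dt = b*m - k_p*p, dr/dt = c/(1 + (p/K)^n) - k_r*r - k_a*m*r. The k_a*m*r terms represent co-degradation of the ncRNA-mRNA complex, and the Hill term represents suppression of ncRNA formation by the protein. Setting p and r to their steady-state values leaves a single function of m whose zeros are the steady states. The Python sketch counts those zeros by scanning for sign changes on a grid; the constants are written in upper case in the code.

```python
# Hypothetical dimensionless parameters (illustrative only, not from the paper).
K_M = 1.0   # mRNA decay rate
K_P = 1.0   # protein decay rate
K_R = 1.0   # ncRNA decay rate
K_A = 1.0   # ncRNA-mRNA association (co-degradation) rate
B = 1.0     # translation rate
C = 20.0    # maximal ncRNA synthesis rate
K = 1.0     # protein level giving half-maximal suppression
N = 4       # Hill exponent of the suppression


def steady_ncrna(m):
    """ncRNA level at steady state for a given mRNA level m."""
    p = B * m / K_P                        # steady-state protein level
    synthesis = C / (1.0 + (p / K) ** N)   # protein suppresses ncRNA formation
    return synthesis / (K_R + K_A * m)     # decay plus co-degradation with mRNA


def mrna_rate(m, a):
    """dm/dt with p and r at steady state; its zeros are the steady states."""
    return a - K_M * m - K_A * m * steady_ncrna(m)


def count_steady_states(a, m_max=20.0, steps=20000):
    """Count sign changes of mrna_rate on a uniform grid over [0, m_max]."""
    count = 0
    previous = mrna_rate(0.0, a)
    for i in range(1, steps + 1):
        current = mrna_rate(m_max * i / steps, a)
        if (previous < 0) != (current < 0):
            count += 1
        previous = current
    return count


if __name__ == "__main__":
    for a in (1.0, 5.0, 10.0):  # a = mRNA synthesis rate
        print(a, count_steady_states(a))
```

With these assumed parameters, a = 5 gives three steady states (low, intermediate and high mRNA), while a = 1 and a = 10 each give one. In the reduced one-variable description, the outer two of the three states are stable and the middle one is unstable, which is the bistability pattern. Protein suppression of the ncRNA closes a positive feedback loop: more protein means less ncRNA, hence less mRNA degradation and more protein.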
Sponge non-metastatic Group I Nme gene/protein - structure and function is conserved from sponges to humans

Science.gov (United States)

2011-01-01

Background Nucleoside diphosphate kinases (NDPKs) are evolutionarily conserved enzymes present in Bacteria, Archaea and Eukarya, with human Nme1 the most studied representative of the family and the first identified metastasis suppressor. Sponges (Porifera) are simple metazoans without tissues, closest to the common ancestor of all animals. They have changed little during evolution and probably provide the best insight into the genomic features of the metazoan ancestor. Recent studies show that sponges have a wide repertoire of genes, many of which are involved in diseases in more complex metazoans. The original function of those genes and the way it has evolved in the animal lineage are largely unknown. Here we report new results on the metastasis suppressor gene/protein homolog from the marine sponge Suberites domuncula, NmeGp1Sd. The purpose of this study was to investigate the properties of the sponge Group I Nme gene and protein, and compare them to their human homolog in order to elucidate the evolution of the structure and function of Nme. Results We found that sponge genes coding for Group I Nme protein are intron-rich. Furthermore, we discovered that the sponge NmeGp1Sd protein has a similar level of kinase activity to its human homolog Nme1, does not cleave negatively supercoiled DNA and shows nonspecific DNA-binding activity. The sponge NmeGp1Sd forms a hexamer, like human Nme1 and all other eukaryotic Nme proteins. NmeGp1Sd interacts with human Nme1 in human cells and exhibits the same subcellular localization. Stable clones expressing sponge NmeGp1Sd inhibited the migratory potential of CAL 27 cells, as already reported for human Nme1, which suggests that Nme's function in migratory processes was established long before the emergence of true tissues. Conclusions This study suggests that the ancestor of all animals possessed an NmeGp1 protein with properties and functions similar to evolutionarily recent versions of the protein, even before the appearance of true tissues.
Arabidopsis mRNA polyadenylation machinery: comprehensive analysis of protein-protein interactions and gene expression profiling

Directory of Open Access Journals (Sweden)

Mo Min

2008-05-01

Full Text Available Abstract Background The polyadenylation of mRNA is one of the critical processing steps during expression of almost all eukaryotic genes. It is tightly integrated with transcription, particularly its termination, as well as other RNA processing events, i.e. capping and splicing. The poly(A) tail protects the mRNA from unregulated degradation, and it is required for nuclear export and translation initiation. In recent years, it has been demonstrated that the polyadenylation process is also involved in the regulation of gene expression. The polyadenylation process requires two components, the cis-elements on the mRNA and a group of protein factors that recognize the cis-elements and produce the poly(A) tail. Here we report a comprehensive pairwise protein-protein interaction mapping and gene expression profiling of the mRNA polyadenylation protein machinery in Arabidopsis. Results By protein sequence homology search using human and yeast polyadenylation factors, we identified 28 proteins that may be components of Arabidopsis polyadenylation machinery. To elucidate the protein network and their functions, we first tested their protein-protein interaction profiles. Out of 320 pair-wise protein-protein interaction assays done using the yeast two-hybrid system, 56 (~17%) showed positive interactions. 15 of these interactions were further tested, and all were confirmed by co-immunoprecipitation and/or in vitro co-purification. These interactions organize into three distinct hubs involving the Arabidopsis polyadenylation factors. These hubs are centered around AtCPSF100, AtCLPS, and AtFIPS. The first two are similar to complexes seen in mammals, while the third one stands out as unique to plants. When comparing the gene expression profiles extracted from publicly available microarray datasets, some of the polyadenylation related genes showed tissue-specific expression, suggestive of potential different polyadenylation complex configurations. Conclusion An
From structure prediction to genomic screens for novel non-coding RNAs

DEFF Research Database (Denmark)

Gorodkin, Jan; Hofacker, Ivo L.

2011-01-01

Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction....... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other....
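As a minimal illustration of the single-sequence folding prediction that the review describes as the field's starting point, the sketch below uses the classic Nussinov base-pair maximization recursion. It is a simplification of the energy-based models generally used in practice, not a method taken from the review; the allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop of 3 nucleotides, and the function name are assumptions chosen for illustration.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence (Nussinov recursion).

    Assumptions: allowed pairs are A-U, G-C and G-U wobble, and a hairpin
    loop must contain at least `min_loop` unpaired nucleotides.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    # dp[i][j] = max pairs in seq[i..j]; stays 0 when the span is too short to close a loop
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1] if n else 0
```

For example, `nussinov("GGGAAAUCC")` returns 3: two G-C pairs and one G-U pair closing the AAA loop.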
A murC gene in Porphyromonas gingivalis 381.

Science.gov (United States)

Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T

1995-09-01

The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing the gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned gene for the 51 kDa protein encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possible presence of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.
Uridine monophosphate synthetase enables eukaryotic de novo NAD+ biosynthesis from quinolinic acid.

Science.gov (United States)

McReynolds, Melanie R; Wang, Wenqing; Holleran, Lauren M; Hanna-Rose, Wendy

2017-07-07

NAD+ biosynthesis is an attractive and promising therapeutic target for influencing health span and obesity-related phenotypes as well as tumor growth. Full and effective use of this target for therapeutic benefit requires a complete understanding of NAD+ biosynthetic pathways. Here, we report a previously unrecognized role for a conserved phosphoribosyltransferase in NAD+ biosynthesis. Because a required quinolinic acid phosphoribosyltransferase (QPRTase) is not encoded in its genome, Caenorhabditis elegans is reported to lack a de novo NAD+ biosynthetic pathway. However, all the genes of the kynurenine pathway required for quinolinic acid (QA) production from tryptophan are present. We therefore investigated whether de novo NAD+ biosynthesis occurs in this organism. By combining isotope-tracing and genetic experiments, we have demonstrated the presence of an intact de novo biosynthesis pathway for NAD+ from tryptophan via QA, highlighting the functional conservation of this important biosynthetic activity. Supplementation with kynurenine pathway intermediates also boosted NAD+ levels and partially reversed NAD+-dependent phenotypes caused by mutation of pnc-1, which encodes a nicotinamidase required for NAD+ salvage biosynthesis, demonstrating the contribution of de novo synthesis to NAD+ homeostasis. By investigating candidate phosphoribosyltransferase genes in the genome, we determined that the conserved uridine monophosphate phosphoribosyltransferase (UMPS), which acts in pyrimidine biosynthesis, is required for NAD+ biosynthesis in place of the missing QPRTase. We suggest that similar underground metabolic activity of UMPS may function in other organisms. This mechanism for NAD+ biosynthesis creates novel possibilities for manipulating NAD+ biosynthetic pathways, which is key for the future of therapeutics. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Characterization of chicken riboflavin carrier protein gene structure ...

Indian Academy of Sciences (India)

The chicken riboflavin carrier protein (RCP) is an estrogen induced egg yolk and white protein. Eggs from hens which have a splice mutation in RCP gene fail to hatch, indicating an absolute requirement of RCP for the transport of riboflavin to the oocyte. In order to understand the mechanism of regulation of this gene by ...
Protein functional links in Trypanosoma brucei, identified by gene fusion analysis

Directory of Open Access Journals (Sweden)

Trimpalis Philip

2011-07-01

Full Text Available Abstract Background Domain or gene fusion analysis is a bioinformatics method for detecting gene fusions in one organism by comparing its genome to that of other organisms. The occurrence of gene fusions suggests that the two original genes that participated in the fusion are functionally linked, i.e. their gene products interact either as part of a multi-subunit protein complex or in a metabolic pathway. Gene fusion analysis has been used to identify protein functional links in prokaryotes as well as in eukaryotic model organisms, such as yeast and Drosophila. Results In this study we have extended this approach to include a number of recently sequenced protists, four of which are pathogenic, to identify fusion-linked proteins in Trypanosoma brucei, the causative agent of African sleeping sickness. We have also examined the evolution of the gene fusion events identified, to determine whether they can be attributed to fusion or fission, by looking at the conservation of the fused genes and of the individual component genes across the major eukaryotic and prokaryotic lineages. We find relatively limited occurrence of gene fusions/fissions within the protist lineages examined. Our results point to two trypanosome-specific gene fissions, which have recently been experimentally confirmed, one fusion of proteins that act in the same metabolic pathway, and two novel putative functional links between fusion-linked protein pairs. Conclusions This is the first study of protein functional links in T. brucei identified by gene fusion analysis. We have used strict thresholds and only discuss results that are highly likely to be genuine and that either have already been or can be experimentally verified. We discuss the possible impact of the identification of these novel putative protein-protein interactions on the development of new trypanosome therapeutic drugs.
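The core inference of gene fusion (often called "Rosetta Stone") analysis can be sketched as follows: if two separate proteins in the query organism each align to a different, non-overlapping region of one protein in another organism, the two query proteins are predicted to be functionally linked. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the hit format `(query_protein, target_protein, target_start, target_end)`, the function name, and the simple non-overlap rule are hypothetical, and the e-value, coverage and other strict thresholds a real analysis applies are omitted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical hit format: (query_protein, target_protein, target_start, target_end),
# i.e. a similarity hit of a query protein onto a region of a target protein.
Hit = tuple[str, str, int, int]


def fusion_links(hits: list[Hit]) -> set[tuple[str, str, str]]:
    """Return (query_a, query_b, fused_target) triples in which two distinct query
    proteins align to non-overlapping regions of the same target protein."""
    by_target: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_target[hit[1]].append(hit)

    links: set[tuple[str, str, str]] = set()
    for target, target_hits in by_target.items():
        for (qa, _, sa, ea), (qb, _, sb, eb) in combinations(target_hits, 2):
            if qa == qb:
                continue  # one query protein hitting twice is not evidence of a fusion
            if ea < sb or eb < sa:  # the two aligned regions do not overlap
                a, b = sorted((qa, qb))
                links.add((a, b, target))
    return links
```

For example, `fusion_links([("A", "X", 1, 100), ("B", "X", 120, 250)])` returns `{("A", "B", "X")}`, while two queries whose hits overlap on the target produce no link.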
Herpes simplex virus type 1 gene UL14: phenotype of a null mutant and identification of the encoded protein.

Science.gov (United States)

Cunningham, C; Davison, A J; MacLean, A R; Taus, N S; Baines, J D

2000-01-01

Herpes simplex virus type 1 (HSV-1) gene UL14 is located between the divergently transcribed genes UL13 and UL15 and overlaps the promoters of both. UL14 also exhibits a substantial overlap of its coding region with that of UL13. It is one of the few HSV-1 genes for which a phenotype and protein product have not been described. Using mass spectrometric and immunological approaches, we demonstrated that the UL14 protein is a 32 kDa minor component of the virion tegument that is expressed late in infection. In infected cells, the UL14 protein was detected in the nucleus, at discrete sites within electron-dense nuclear bodies, and in the cytoplasm, initially in a diffuse distribution and then at discrete sites. Some of the UL14 protein was phosphorylated. A mutant with a 4-bp deletion in the central region of UL14 failed to produce the UL14 protein and generated small plaques. The mutant exhibited an extended growth cycle at low multiplicity of infection and appeared to be compromised in the efficient transit of virus particles from the infected cell. In mice injected intracranially, the 50% lethal dose of the mutant was reduced more than 30,000-fold. Recovery of the mutant from the latently infected sacral ganglia of mice injected peripherally was significantly less than that of wild-type virus, suggesting a marked defect in the establishment of, or reactivation from, latent infection.

The Genomic Code: Genome Evolution and Potential Applications

KAUST Repository

Bernardi, Giorgio

2016-01-25

The genome of metazoans is organized according to a genomic code which comprises three laws: 1) Compositional correlations hold between contiguous coding and non-coding sequences, as well as among the three codon positions of protein-coding genes; these correlations are the consequence of the fact that the genomes under consideration consist of fairly homogeneous, long (≥200 kb) sequences, the isochores; 2) Although isochores are defined on the basis of purely compositional properties, GC levels of isochores are correlated with all tested structural and functional properties of the genome; 3) GC levels of isochores are correlated with chromosome architecture from interphase to metaphase; in the case of interphase the correlation concerns isochores and the three-dimensional “topological associated domains” (TADs); in the case of mitotic chromosomes, the correlation concerns isochores and chromosomal bands. Finally, the genomic code is the fourth and last pillar of molecular biology, the first three pillars being 1) the double helix structure of DNA; 2) the regulation of gene expression in prokaryotes; and 3) the genetic code.
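Because isochores are defined on purely compositional grounds, the basic measurement behind them is the GC level computed over long stretches of sequence. The sketch below computes GC percentage in consecutive non-overlapping windows; it is a minimal illustration assuming a plain nucleotide string, and the function name and the choice to exclude ambiguous bases (such as N) from the denominator are assumptions. On real chromosomes the window would be on the ≥200 kb scale mentioned above, and assigning windows to isochore families requires a segmentation step not shown here.

```python
def gc_levels(seq: str, window: int) -> list[float]:
    """GC percentage in consecutive, non-overlapping windows of `window` bases.

    Assumptions: ambiguous bases (e.g. N) are excluded from the denominator,
    a window with no A/C/G/T yields NaN, and a trailing partial window is ignored.
    """
    seq = seq.upper()
    levels: list[float] = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        levels.append(100.0 * gc / acgt if acgt else float("nan"))
    return levels
```

For example, `gc_levels("GGCCAATT", 4)` returns `[100.0, 0.0]`.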
Mechanisms of radiation-induced gene responses

International Nuclear Information System (INIS)

Woloschak, G.E.; Paunesku, T.

1996-01-01

In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When the element lies in the 5' region (flanking region or UTR) of a gene, it is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3' region (UTR) it is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases it is found only once, with a minimum of 11 consecutive nucleotides precisely matching the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When the double-stranded oligomer was used, the size shift revealed an additional protein-oligomer complex larger than the one bound to either the sense or antisense single-stranded consensus oligomer alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded state for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
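The matching criterion described above (at least 11 consecutive nucleotides identical to the 26-bp element, in either orientation) can be checked with a simple scan. This is a minimal sketch, not the authors' search procedure: the 26-bp consensus is not given in the abstract, so any element passed in is a hypothetical placeholder, and "sense"/"antisense" here simply mean the element or its reverse complement as found in the given strand. Checking every 11-mer of the probe is sufficient, because any exact match of 11 or more nucleotides contains an exact 11-nucleotide match.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def element_orientations(seq: str, element: str, min_len: int = 11) -> set[str]:
    """Return the orientations ('sense', 'antisense') in which at least `min_len`
    consecutive nucleotides of `element` occur exactly in `seq`."""
    seq, element = seq.upper(), element.upper()
    found: set[str] = set()
    for label, probe in (("sense", element), ("antisense", revcomp(element))):
        # any exact match of >= min_len nt contains an exact min_len-nt match
        kmers = (probe[i:i + min_len] for i in range(len(probe) - min_len + 1))
        if any(kmer in seq for kmer in kmers):
            found.add(label)
    return found
```

With a hypothetical element `"ACGTTGCAAGGCTTACGGATCCTAGG"`, a sequence carrying 12 of its consecutive nucleotides (`"TTTTTTGCAAGGCTTATTTT"`) is reported in the sense orientation, and the reverse complement of that sequence in the antisense orientation.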
Surfactant proteins gene variants in premature newborn infants with severe respiratory distress syndrome.

Science.gov (United States)

Somaschini, Marco; Presi, Silvia; Ferrari, Maurizio; Vergani, Barbara; Carrera, Paola

2017-12-19

Genetic surfactant dysfunction causes respiratory failure in term and near-term newborn infants, but little is known about this condition in premature infants. We evaluated genetic surfactant dysfunction in premature newborn infants with severe RDS. A total of 68 preterm newborn infants with gestational age ≤32 weeks affected by unusually severe RDS were analysed for mutations in SFTPB, SFTPC and ABCA3. Therapies included oxygen supplementation, nasal CPAP, different modalities of ventilatory support, administration of exogenous surfactant, inhaled nitric oxide and steroids. Molecular analyses were performed on genomic DNA extracted from peripheral blood, by Sanger sequencing of whole gene coding regions and intron junctions. In one case, histology and electron microscopy of lung tissue were performed. Heterozygous rare (previously described) or novel variants in the surfactant protein genes ABCA3, SFTPB and SFTPC were identified in 24 newborn infants. In total, 11 infants died between 2 and 6 months of age. Ultrastructural analysis of lung tissue of one infant showed features suggesting ABCA3 dysfunction. Rare or novel genetic variants in genes encoding surfactant proteins were identified in a large proportion (35%) of premature newborn infants with particularly severe RDS. We speculate that developmental immaturity of surfactant production, combined with genetically determined abnormalities of surfactant metabolism, may have a synergistic worsening effect on the phenotype.
De novo deletion of chromosome 11q12.3 in monozygotic twins affected by Poland Syndrome.

Science.gov (United States)

Vaccari, Carlotta Maria; Romanini, Maria Victoria; Musante, Ilaria; Tassano, Elisa; Gimelli, Stefania; Divizia, Maria Teresa; Torre, Michele; Morovic, Carmen Gloria; Lerone, Margherita; Ravazzolo, Roberto; Puliti, Aldamaria

2014-05-30

Poland Syndrome (PS) is a rare disorder characterized by hypoplasia/aplasia of the pectoralis major muscle, variably associated with thoracic and upper limb anomalies. Familial recurrence has been reported, indicating that PS could have a genetic basis, though the genetic mechanisms underlying PS development are still unknown. Here we describe a pair of monozygotic (MZ) twin girls, both presenting with Poland Syndrome. They carry a de novo heterozygous 126 kb deletion at chromosome 11q12.3 involving 5 genes, four of which, namely HRASLS5, RARRES3, HRASLS2, and PLA2G16, encode proteins that regulate cellular growth, differentiation, and apoptosis, mainly through Ras-mediated signaling pathways. Phenotype concordance between the monozygotic twin probands provides evidence supporting the genetic control of PS. As genes controlling cell growth and differentiation may be related to morphological defects originating during development, we postulate that the observed chromosome deletion could be causative of the phenotype observed in the twin girls and that the deleted genes could play a role in PS development.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.

Science.gov (United States)

Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P

2015-04-23

With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
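The step of finding somatic mutations that overlap annotated TFBSs is, at its core, a point-in-interval query. The sketch below merges the binding-site intervals and then uses binary search per mutation; it is a minimal illustration, not the authors' pipeline, and it assumes a single chromosome, half-open `[start, end)` intervals and point mutations in the same coordinate system. The function names are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching half-open [start, end) intervals, sorted by start."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def mutations_in_tfbs(positions: list[int], tfbs: list[tuple[int, int]]) -> list[int]:
    """Return the mutation positions (in input order) that fall inside any TFBS interval."""
    merged = merge_intervals(tfbs)
    starts = [start for start, _ in merged]
    hits: list[int] = []
    for pos in positions:
        idx = bisect_right(starts, pos) - 1  # last merged interval starting at or before pos
        if idx >= 0 and pos < merged[idx][1]:
            hits.append(pos)
    return hits
```

For example, `mutations_in_tfbs([99, 100, 120, 205], [(100, 120), (200, 210)])` returns `[100, 205]`; position 120 is excluded because intervals are half-open. Merging first keeps the binary search correct even when binding sites overlap.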
Molecular characterisation of the nucleocapsid protein gene, glycoprotein gene and gene junctions of rhabdovirus 903/87, a novel fish pathogenic rhabdovirus

DEFF Research Database (Denmark)

Johansson, Tove; Nylund, S.; Olesen, Niels Jørgen

2001-01-01

, M, G and L genes it was determined that transcription start and stop codons were conserved between virus 903/87 and the vesiculoviruses. Virus 903/87 has no open reading frame coding for a non-virion gene between the glycoprotein and the polymerase gene. Phylogenetic studies based on rhabdovirus...
Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

Science.gov (United States)

Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

The sequence protein identification by a randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS) software package was developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide, as well as the unequivocal identification of proteins, in a high-throughput manner, with substantial savings in time, economic resources and expertise. The software package is a valid tool for automating a de novo sequencing approach, overcoming the main limitations of manual de novo sequencing, and a versatile platform for the unequivocal identification of proteins in proteomics, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly, non-statistical approach, so protein identification can be considered unambiguous.
A novel human gene encoding a G-protein-coupled receptor (GPR15) is located on chromosome 3

Energy Technology Data Exchange (ETDEWEB)

Heiber, M.; Marchese, A.; O'Dowd, B.F. [Univ. of Toronto, Ontario (Canada)] [and others]

1996-03-05

We used sequence similarities among G-protein-coupled receptor genes to discover a novel receptor gene. Using primers based on conserved regions of the opioid-related receptors, we isolated a PCR product that was used to locate the full-length coding region of a novel human receptor gene, which we have named GPR15. A comparison of the amino acid sequence of the receptor encoded by GPR15 with other receptors revealed that it shared sequence identity with the angiotensin II AT1 and AT2 receptors, the interleukin 8b receptor, and the orphan receptors GPR1 and AGTL1. GPR15 was mapped to human chromosome 3q11.2-q13.1. 12 refs., 2 figs.
D-bifunctional protein deficiency associated with drug resistant infantile spasms

NARCIS (Netherlands)

Buoni, Sabrina; Zannolli, Raffaella; Waterham, Hans; Wanders, Ronald; Fois, Alberto

2007-01-01

Peroxisomal disorders appear with a frequency of about 1:5000 in newborns. Peroxisomal D-bifunctional protein (D-BP), encoded by the HSD17B4 gene (gene ID: 3294; locus tag: HGNC:5213, chromosome 5q2; official symbol: HSD17B4; name: hydroxysteroid (17-beta) dehydrogenase; gene type: protein coding)
Downregulation of ATM Gene and Protein Expression in Canine Mammary Tumors.

Science.gov (United States)

Raposo-Ferreira, T M M; Bueno, R C; Terra, E M; Avante, M L; Tinucci-Costa, M; Carvalho, M; Cassali, G D; Linde, S D; Rogatto, S R; Laufer-Amorim, R

2016-11-01

The ataxia telangiectasia mutated (ATM) gene encodes a protein associated with DNA damage repair and maintenance of genomic integrity. In women, ATM transcript and protein downregulation have been reported in sporadic breast carcinomas, and the absence of ATM protein expression has been associated with poor prognosis. The aim of this study was to evaluate ATM gene and protein expression in canine mammary tumors and their association with clinical outcome. ATM gene and protein expression was evaluated by reverse transcription-quantitative polymerase chain reaction and immunohistochemistry, respectively, in normal mammary gland samples (n = 10), benign mammary tumors (n = 11), nonmetastatic mammary carcinomas (n = 19), and metastatic mammary carcinomas (n = 11). Lower ATM transcript levels were detected in benign mammary tumors and carcinomas compared with normal mammary glands (P = .011). Similarly, lower ATM protein expression was observed in benign tumors (P = .0003), nonmetastatic mammary carcinomas (P ATM gene or protein levels were detected among benign tumors and nonmetastatic and metastatic mammary carcinomas (P > .05). The levels of ATM gene or protein expression were not significantly associated with clinical and pathological features or with survival. Similar to human breast cancer, the data in this study suggest that ATM gene and protein downregulation is involved in canine mammary gland tumorigenesis. © The Author(s) 2016.
Quantifying the mechanisms of domain gain in animal proteins.

Science.gov (United States)

Buljan, Marija; Frankish, Adam; Bateman, Alex

2010-01-01

Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Comparative studies of vertebrate endothelin-converting enzyme-like 1 genes and proteins

Directory of Open Access Journals (Sweden)

Holmes RS

2013-01-01

Full Text Available Roger S Holmes,1,2 Laura A Cox1; 1Department of Genetics and Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA; 2Eskitis Institute for Cell and Molecular Therapies and School of Biomolecular and Physical Sciences, Griffith University, Nathan, Queensland, Australia. Abstract: Endothelin-converting enzyme-like 1 (ECEL1), a member of the M13 family of neutral endopeptidases, plays an essential role in the neural regulation of vertebrate respiration. Genetic deficiency of this protein results in respiratory failure soon after birth. Comparative ECEL1 amino acid sequences and structures and ECEL1 gene locations were examined using data from several vertebrate genome projects. Vertebrate ECEL1 sequences shared 66%–99% identity, compared with 30%–63% sequence identity with other M13-like family members, ECE1, ECE2, and NEP (neprilysin or MME). Three N-glycosylation sites were conserved among most vertebrate ECEL1 proteins examined. Sequence alignments, conserved key amino acid residues, and predicted secondary and tertiary structures were also studied, including cytoplasmic, transmembrane, and luminal sequences and active site residues. Vertebrate ECEL1 genes usually contained 18 exons and 17 coding exons on the negative strand. Exons 1 and 2 of the human ECEL1 gene contained 5'-untranslated (5'-UTR) regions, a large CpG island (CpG256), and several transcription factor binding sites, which may contribute to the high levels of gene expression previously reported in neural tissues. Phylogenetic analyses examined the relationships and potential evolutionary origins of the vertebrate ECEL1 gene with six other vertebrate neutral endopeptidase M13 family genes. These suggested that ECEL1 originated in an ancestral vertebrate genome from a duplication event in an ancestral neutral endopeptidase M13-like gene. Keywords: vertebrates, amino acid sequence, ECEL1, ECE1, ECE2, KELL, NEP, NEPL1, PHEX
Novel classes of non-coding RNAs and cancer

Directory of Open Access Journals (Sweden)

Sana Jiri

2012-05-01

Full Text Available Abstract For many years, the central dogma of molecular biology has been that RNA functions mainly as an informational intermediate between a DNA sequence and its encoded protein. But one of the great surprises of modern biology was the discovery that protein-coding genes represent less than 2% of the total genome sequence, followed by the finding that at least 90% of the human genome is actively transcribed. Thus, the human transcriptome was found to be more complex than a collection of protein-coding genes and their splice variants. Although non-coding RNAs (ncRNAs) were initially argued to be spurious transcriptional noise or accumulated evolutionary debris arising from the early assembly of genes and/or the insertion of mobile genetic elements, recent evidence suggests that they may play major biological roles in cellular development, physiology and pathologies. NcRNAs can be grouped into two major classes based on transcript size: small ncRNAs and long ncRNAs. Each of these classes can be further divided, and novel subclasses are still being discovered and characterized. Although small ncRNAs called microRNAs have been the most frequently studied in recent years, with more than ten thousand hits in the PubMed database, evidence has recently begun to accumulate describing the molecular mechanisms by which a wide range of novel RNA species function, providing insight into their functional roles in cellular biology and in human disease. In this review, we summarize newly discovered classes of ncRNAs and highlight their functions in cancer biology and their potential use as biomarkers or therapeutic targets.
Construction of a fusion gene containing hepatitis B virus L gene ...

African Journals Online (AJOL)

Jane

2011-10-05

Oct 5, 2011 ... the successful construction of a recombinant yeast expression vector containing gene coding L protein and Ag85B ..... the production of memory T cells, promote cytokine secretion and ... Dual DNA vaccination of rainbow trout.
De Novo Assembly and Genome Analyses of the Marine-Derived Scopulariopsis brevicaulis Strain LF580 Unravels Life-Style Traits and Anticancerous Scopularide Biosynthetic Gene Cluster.

Science.gov (United States)

Kumar, Abhishek; Henrissat, Bernard; Arvas, Mikko; Syed, Muhammad Fahad; Thieme, Nils; Benz, J Philipp; Sørensen, Jens Laurids; Record, Eric; Pöggeler, Stefanie; Kempken, Frank

2015-01-01

The marine-derived Scopulariopsis brevicaulis strain LF580 produces scopularides A and B, which have anticancer properties. We carried out genome sequencing using three next-generation DNA sequencing methods. De novo hybrid assembly yielded 621 scaffolds with a total size of 32.2 Mb and 16,298 putative gene models. We identified a large non-ribosomal peptide synthetase gene (nrps1) and a supporting pks2 gene in the same biosynthetic gene cluster. This cluster and the genes within it are functionally active, as confirmed by RNA-Seq. Characterization of carbohydrate-active enzymes and major facilitator superfamily (MFS)-type transporters led us to postulate that S. brevicaulis originated from a soil fungus that came into contact with the marine sponge Tethya aurantium. This marine sponge seems to provide the fungus with shelter and a micro-environment suitable for its survival in the ocean. This study also builds the platform for further investigations of the role of life-style and secondary metabolites from S. brevicaulis.
Amelogenesis Imperfecta; Genes, Proteins, and Pathways

Directory of Open Access Journals (Sweden)

Claire E. L. Smith

2017-06-01

Full Text Available Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and
Amelogenesis Imperfecta; Genes, Proteins, and Pathways.

Science.gov (United States)

Smith, Claire E L; Poulter, James A; Antanaviciute, Agne; Kirkham, Jennifer; Brookes, Steven J; Inglehearn, Chris F; Mighell, Alan J

2017-01-01

Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and speculate on the
Investigation of genes coding for inflammatory components in Parkinson's disease.

Science.gov (United States)

Håkansson, Anna; Westberg, Lars; Nilsson, Staffan; Buervenich, Silvia; Carmine, Andrea; Holmberg, Björn; Sydow, Olof; Olson, Lars; Johnels, Bo; Eriksson, Elias; Nissbrandt, Hans

2005-05-01

Several recent findings indicate that inflammation may contribute to the pathogenesis of Parkinson's disease (PD). Genetic variants of genes coding for components involved in immune reactions in the brain might therefore influence the risk of developing PD or the age of disease onset. Five single nucleotide polymorphisms (SNPs) in the genes coding for interferon-gamma (IFN-gamma; T874A in intron 1), interferon-gamma receptor 2 (IFN-gamma R2; Gln64Arg), interleukin-10 (IL-10; G1082A in the promoter region), platelet-activating factor acetylhydrolase (PAF-AH; Val379Ala), and intercellular adhesion molecule 1 (ICAM-1; Lys469Glu) were genotyped, using pyrosequencing, in 265 patients with PD and 308 controls. None of the investigated SNPs was found to be associated with PD; however, the G1082A polymorphism in the IL-10 gene promoter was found to be related to the age of disease onset. Linear regression showed a significantly earlier onset with more A-alleles (P = 0.0095; after Bonferroni correction, P = 0.048), resulting in a 5-year delayed age of onset of the disease for individuals having two G-alleles compared with individuals having two A-alleles. The results indicate that the IL-10 G1082A SNP could possibly be related to the age of onset of PD. Copyright 2005 Movement Disorder Society.
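The Bonferroni adjustment quoted above can be reproduced as simple arithmetic. This minimal sketch assumes the correction factor is the number of SNPs genotyped (five); the abstract does not state the factor explicitly, but 0.0095 × 5 = 0.0475, which rounds to the reported P = 0.048.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value: multiply by the number of tests, capped at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return min(1.0, p_value * n_tests)


# Assumption: the correction is over the five SNPs genotyped in this study.
n_snps = 5
p_raw = 0.0095  # unadjusted P for IL-10 G1082A vs. age of onset
p_adj = bonferroni(p_raw, n_snps)
print(f"adjusted P = {p_adj:.4f}")  # adjusted P = 0.0475
```

The cap at 1 matters only for large raw P values; here the adjusted value stays just below the conventional 0.05 threshold.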
The completion of the Mammalian Gene Collection (MGC)

Science.gov (United States)

Temple, Gary; Gerhard, Daniela S.; Rasooly, Rebekah; Feingold, Elise A.; Good, Peter J.; Robinson, Cristen; Mandich, Allison; Derge, Jeffrey G.; Lewis, Jeanne; Shoaf, Debonny; Collins, Francis S.; Jang, Wonhee; Wagner, Lukas; Shenmen, Carolyn M.; Misquitta, Leonie; Schaefer, Carl F.; Buetow, Kenneth H.; Bonner, Tom I.; Yankie, Linda; Ward, Ming; Phan, Lon; Astashyn, Alex; Brown, Garth; Farrell, Catherine; Hart, Jennifer; Landrum, Melissa; Maidak, Bonnie L.; Murphy, Michael; Murphy, Terence; Rajput, Bhanu; Riddick, Lillian; Webb, David; Weber, Janet; Wu, Wendy; Pruitt, Kim D.; Maglott, Donna; Siepel, Adam; Brejova, Brona; Diekhans, Mark; Harte, Rachel; Baertsch, Robert; Kent, Jim; Haussler, David; Brent, Michael; Langton, Laura; Comstock, Charles L.G.; Stevens, Michael; Wei, Chaochun; van Baren, Marijke J.; Salehi-Ashtiani, Kourosh; Murray, Ryan R.; Ghamsari, Lila; Mello, Elizabeth; Lin, Chenwei; Pennacchio, Christa; Schreiber, Kirsten; Shapiro, Nicole; Marsh, Amber; Pardes, Elizabeth; Moore, Troy; Lebeau, Anita; Muratet, Mike; Simmons, Blake; Kloske, David; Sieja, Stephanie; Hudson, James; Sethupathy, Praveen; Brownstein, Michael; Bhat, Narayan; Lazar, Joseph; Jacob, Howard; Gruber, Chris E.; Smith, Mark R.; McPherson, John; Garcia, Angela M.; Gunaratne, Preethi H.; Wu, Jiaqian; Muzny, Donna; Gibbs, Richard A.; Young, Alice C.; Bouffard, Gerard G.; Blakesley, Robert W.; Mullikin, Jim; Green, Eric D.; Dickson, Mark C.; Rodriguez, Alex C.; Grimwood, Jane; Schmutz, Jeremy; Myers, Richard M.; Hirst, Martin; Zeng, Thomas; Tse, Kane; Moksa, Michelle; Deng, Merinda; Ma, Kevin; Mah, Diana; Pang, Johnson; Taylor, Greg; Chuah, Eric; Deng, Athena; Fichter, Keith; Go, Anne; Lee, Stephanie; Wang, Jing; Griffith, Malachi; Morin, Ryan; Moore, Richard A.; Mayo, Michael; Munro, Sarah; Wagner, Susan; Jones, Steven J.M.; Holt, Robert A.; Marra, Marco A.; Lu, Sun; Yang, Shuwei; Hartigan, James; Graf, Marcus; Wagner, Ralf; Letovsky, Stanley; Pulido, Jacqueline C.; Robison, Keith; Esposito, Dominic; Hartley, James; Wall, Vanessa E.; Hopkins, Ralph F.; Ohara, Osamu; Wiemann, Stefan

2009-01-01

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide. PMID:19767417
De novo assembly and characterization of the transcriptome of the parasitic weed dodder identifies genes associated with plant parasitism.

Science.gov (United States)

Ranjan, Aashish; Ichihashi, Yasunori; Farhi, Moran; Zumstein, Kristina; Townsley, Brad; David-Schwartz, Rakefet; Sinha, Neelima R

2014-11-01

Parasitic flowering plants are among the most destructive agricultural pests and have a major impact on crop yields throughout the world. Because they depend on finding a host plant for growth, parasitic plants penetrate their host using specialized organs called haustoria. Haustoria establish vascular connections with the host, which enable the parasite to steal nutrients and water. The underlying molecular and developmental basis of parasitism by plants is largely unknown. To investigate the process of parasitism, RNAs from different stages (i.e. seed, seedling, vegetative strand, prehaustoria, haustoria, and flower) were used to de novo assemble and annotate the transcriptome of the obligate plant stem parasite dodder (Cuscuta pentagona). The assembled transcriptome was used to dissect transcriptional dynamics during dodder development and parasitism and to identify key gene categories involved in plant parasitism. Host plant infection is accompanied by increased expression of parasite genes in the categories of transport and transporters and of response to stress and stimuli, as well as of genes encoding enzymes involved in cell wall modification. By contrast, expression of photosynthetic genes is decreased in the dodder infective stages compared with normal stem. In addition, genes relating to biosynthesis, transport, and response of phytohormones, such as auxin, gibberellins, and strigolactone, were differentially expressed in the dodder infective stages compared with stems and seedlings. This analysis sheds light on the transcriptional changes that accompany plant parasitism and will aid in identifying potential gene targets for use in controlling the infestation of crops by parasitic weeds. © 2014 American Society of Plant Biologists. All Rights Reserved.

Developmental gene discovery in a hemimetabolous insect: de novo assembly and annotation of a transcriptome for the cricket Gryllus bimaculatus.

Directory of Open Access Journals (Sweden)

Victor Zeng

Full Text Available Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct-developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in
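A figure such as "significant similarity to nr sequences for 55.5% of transcriptome sequences" is typically obtained by counting query sequences with at least one BLAST hit below an E-value cutoff. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`, whose 11th column is the E-value) and an illustrative cutoff of 1e-5; the abstract does not state the threshold the authors used, and the transcript IDs and rows below are hypothetical toy data.

```python
import io

EVALUE_COLUMN = 10  # 0-based index of the E-value in BLAST+ -outfmt 6 rows


def fraction_with_hits(blast_tsv, all_query_ids, evalue_cutoff=1e-5):
    """Fraction of queries with at least one hit at or below the E-value cutoff."""
    hit_ids = set()
    for line in blast_tsv:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[EVALUE_COLUMN]) <= evalue_cutoff:
            hit_ids.add(fields[0])  # column 1 is the query ID
    return len(hit_ids & set(all_query_ids)) / len(all_query_ids)


# Hypothetical toy output: two of four transcripts have a significant hit.
toy = io.StringIO(
    "isotig001\tsp|P1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "isotig002\tsp|P2\t40.0\t100\t60\t2\t1\t100\t5\t104\t0.5\t30\n"
    "isotig003\tsp|P3\t60.0\t200\t80\t1\t1\t200\t1\t200\t2e-20\t150\n"
)
queries = ["isotig001", "isotig002", "isotig003", "isotig004"]
print(fraction_with_hits(toy, queries))  # 0.5
```

Queries with no row at all (isotig004 here) count as "no significant hit", so the denominator must come from the full transcript list rather than from the BLAST output.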
Heterologous aggregates promote de novo prion appearance via more than one mechanism.

Directory of Open Access Journals (Sweden)

Fatih Arslan

2015-01-01

Full Text Available Prions are self-perpetuating conformational variants of particular proteins. In yeast, prions cause heritable phenotypic traits. Most known yeast prions contain a glutamine (Q)/asparagine (N)-rich region in their prion domains. [PSI+], the prion form of Sup35, appears de novo at dramatically enhanced rates following transient overproduction of Sup35 in the presence of [PIN+], the prion form of Rnq1. Here, we establish the temporal de novo appearance of Sup35 aggregates during such overexpression in relation to other cellular proteins. Fluorescently-labeled Sup35 initially forms one or a few dots when overexpressed in [PIN+] cells. One of the dots is perivacuolar, colocalizes with the aggregated Rnq1 dot and grows into peripheral rings/lines, some of which also colocalize with Rnq1. Sup35 dots that are not near the vacuole do not always colocalize with Rnq1 and disappear by the time rings start to grow. Bimolecular fluorescence complementation failed to detect any interaction between Sup35-VN and Rnq1-VC in [PSI+][PIN+] cells. In contrast, all Sup35 aggregates, whether newly induced or in established [PSI+], completely colocalize with the molecular chaperones Hsp104, Sis1, Ssa1 and eukaryotic release factor Sup45. In the absence of [PIN+], overexpressed aggregating proteins such as the Q/N-rich Pin4C or the non-Q/N-rich Mod5 can also promote the de novo appearance of [PSI+]. Similar to Rnq1, overexpressed Pin4C transiently colocalizes with newly appearing Sup35 aggregates. However, no interaction was detected between Mod5 and Sup35 during [PSI+] induction in the absence of [PIN+]. While the colocalization of Sup35 with aggregates of Rnq1 or Pin4C is consistent with the model that the heterologous aggregates cross-seed the de novo appearance of [PSI+], the lack of interaction between Mod5 and Sup35 leaves open the possibility of other mechanisms. We also show that Hsp104 is required for the de novo appearance of [PSI+] aggregates in a [PIN
Analysis of insecticide resistance-related genes of the Carmine spider mite Tetranychus cinnabarinus based on a de novo assembled transcriptome.

Science.gov (United States)

Xu, Zhifeng; Zhu, Wenyi; Liu, Yanchao; Liu, Xing; Chen, Qiushuang; Peng, Miao; Wang, Xiangzun; Shen, Guangmao; He, Lin

2014-01-01

The carmine spider mite (CSM), Tetranychus cinnabarinus, is an important agricultural pest mite because it readily develops insecticide resistance. To obtain gene information and a molecular basis for future studies of insecticide resistance in CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST querying against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). When the transcripts were aligned to the Tetranychus urticae genome, 19,255 (75.45%) of them had significant (e-value insecticide resistance in arthropod were generated from CSM transcriptome, including 53 P450-, 22 GSTs-, 23 CarEs-, 1 AChE-, 7 GluCls-, 9 nAChRs-, 8 GABA receptor-, 1 sodium channel-, 6 ATPase- and 12 Cyt b genes. We developed substantial molecular resources for T. cinnabarinus genes putatively involved in insecticide resistance. This transcriptome assembly will facilitate molecular-level study of the mechanisms by which CSM adapts to environmental stresses (including insecticides) and will be important for developing new control strategies against this pest mite.
A case of a Tunisian Rett patient with a novel double-mutation of the MECP2 gene

Energy Technology Data Exchange (ETDEWEB)

Fendri-Kriaa, Nourhene, E-mail: nourhene.fendri@gmail.com [Laboratoire de Genetique Moleculaire Humaine, Faculte de Medecine de Sfax, Universite de Sfax (Tunisia); Hsairi, Ines [Service de Neurologie Infantile, C.H.U. Hedi Chaker de Sfax (Tunisia); Kifagi, Chamseddine [Laboratoire internationale associe LIA135, Centre de Biotechnologie de Sfax (Tunisia); Ellouze, Emna [Service de Neurologie Infantile, C.H.U. Hedi Chaker de Sfax (Tunisia); Mkaouar-Rebai, Emna [Laboratoire de Genetique Moleculaire Humaine, Faculte de Medecine de Sfax, Universite de Sfax (Tunisia); Triki, Chahnez [Service de Neurologie Infantile, C.H.U. Hedi Chaker de Sfax (Tunisia); Fakhfakh, Faiza [Laboratoire de Genetique Moleculaire Humaine, Faculte de Medecine de Sfax, Universite de Sfax (Tunisia)

2011-06-03

Highlights: → Sequencing of the MECP2 gene, modeling and comparison of the two variants were performed in a Tunisian classical Rett patient. → A double mutation was identified in the MECP2 gene: a novel de novo mutation, c.535C > T, and the common c.763C > T. → The P179S substitution may change local electrostatic properties, which may affect the function and stability of the MeCP2 protein. -- Abstract: Rett syndrome is an X-linked dominant disorder frequently caused by mutations in the methyl-CpG-binding protein 2 gene (MECP2). Rett patients present an apparently normal psychomotor development during the first 6-18 months of life. Thereafter, they show a short period of developmental stagnation followed by a rapid regression in language and motor development. The aim of this study was to perform a mutational analysis of the MECP2 gene in a classical Rett patient by sequencing the gene and modeling the variants found. The results showed the presence of a double mutation: a novel de novo mutation, c.535C > T (p.P179S), and the common c.763C > T (p.R255X) transition of the MECP2 gene. The p.P179S mutation affects a conserved amino acid in the CRIR domain (corepressor interacting region). Modeling results showed that the P179S substitution could change local electrostatic properties in this region of MeCP2, by adding a negative charge due to the serine hydroxyl group, which may affect the function and stability of the protein. The p.R255X mutation is located in the TRD-NLS domain (transcription repression domain-nuclear localization signal) of the MeCP2 protein.
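The reported variants can be sanity-checked with coding-position arithmetic: in HGVS c. numbering, nucleotide N falls in codon ⌊(N - 1)/3⌋ + 1 at position ((N - 1) mod 3) + 1 within that codon. The sketch below is a consistency check added for illustration, not part of the study; it uses a hand-written subset of the standard genetic code rather than the MECP2 reference sequence, which the abstract does not give.

```python
# Hand-written subset of the standard genetic code: only the codons this check needs.
CODON_TABLE = {
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # proline
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",  # serine
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",  # arginine (CGN box)
    "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",  # Cys, stop, Trp
}


def locate(c_pos):
    """Map a 1-based coding (c.) position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, alt):
    """Return the codon with the base at pos_in_codon (1-3) replaced by alt."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


def possible_effects(c_pos, ref, alt, ref_aa):
    """For every listed codon of ref_aa carrying ref at the affected position,
    report the amino acid produced after the substitution ('?' if not listed)."""
    codon_no, pos = locate(c_pos)
    effects = {}
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[pos - 1] == ref:
            effects[codon] = CODON_TABLE.get(substitute(codon, pos, alt), "?")
    return codon_no, pos, effects


for c_pos, ref_aa in [(535, "P"), (763, "R")]:
    print(c_pos, possible_effects(c_pos, "C", "T", ref_aa))
```

Running it shows that c.535 and c.763 are each the first base of codons 179 and 255. Every proline codon (CCN) becomes a serine codon (TCN) after C>T at that position, whereas among the arginine codons only CGA becomes a stop codon (TGA, shown as `*`); the remaining arginine codons, AGA and AGG, do not start with C. Under these assumptions, p.R255X therefore implies a CGA reference codon.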
A case of a Tunisian Rett patient with a novel double-mutation of the MECP2 gene

International Nuclear Information System (INIS)

Fendri-Kriaa, Nourhene; Hsairi, Ines; Kifagi, Chamseddine; Ellouze, Emna; Mkaouar-Rebai, Emna; Triki, Chahnez; Fakhfakh, Faiza

2011-01-01

Highlights: → Sequencing of the MECP2 gene, modeling and comparison of the two variants were performed in a Tunisian classical Rett patient. → A double mutation was identified in the MECP2 gene: a novel de novo mutation, c.535C > T, and the common c.763C > T. → The P179S substitution may change local electrostatic properties, which may affect the function and stability of the MeCP2 protein. -- Abstract: Rett syndrome is an X-linked dominant disorder frequently caused by mutations in the methyl-CpG-binding protein 2 gene (MECP2). Rett patients present an apparently normal psychomotor development during the first 6-18 months of life. Thereafter, they show a short period of developmental stagnation followed by a rapid regression in language and motor development. The aim of this study was to perform a mutational analysis of the MECP2 gene in a classical Rett patient by sequencing the gene and modeling the variants found. The results showed the presence of a double mutation: a novel de novo mutation, c.535C > T (p.P179S), and the common c.763C > T (p.R255X) transition of the MECP2 gene. The p.P179S mutation affects a conserved amino acid in the CRIR domain (corepressor interacting region). Modeling results showed that the P179S substitution could change local electrostatic properties in this region of MeCP2, by adding a negative charge due to the serine hydroxyl group, which may affect the function and stability of the protein. The p.R255X mutation is located in the TRD-NLS domain (transcription repression domain-nuclear localization signal) of the MeCP2 protein.
Cloning and characterization of the fatty acid-binding protein gene from the protoscolex of Taenia multiceps.

Science.gov (United States)

Nie, Hua-Ming; Xie, Yue; Fu, Yan; Yang, Ying-Dong; Gu, Xiao-Bin; Wang, Shu-Xian; Peng, Xi; Lai, Wei-Ming; Peng, Xue-Rong; Yang, Guang-You

2013-05-01

Taenia multiceps (Cestoda: Taeniidae), a cestode parasite with a worldwide distribution, is emerging as the cause of an important helminthic zoonosis: a serious or fatal central nervous system disease, commonly known as coenurosis, in domestic and wild ruminants as well as in humans. Herein, a fatty acid-binding protein (FABP) gene was identified from transcriptomic data of T. multiceps. This gene, which contains a complete coding sequence, was amplified by reverse-transcriptase polymerase chain reaction. The corresponding protein, named TmFABP, has a molecular weight of 14 kDa and was recombinantly expressed in Escherichia coli. The fusion protein was purified on Ni-NTA beads (Bio-Rad). Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and Western blot analyses showed that the purified recombinant protein was immunogenic. Immunohistochemical studies showed that TmFABP was expressed at the tegumental level in the protoscolices and in the cells between the body wall and parenchyma layer of the cestode. In sections from gravid proglottids, intense staining was detected in the uterus and eggs. These findings suggest that TmFABP could be switched on during the differentiation of germinative layers to protoscoleces and of metacestodes to adult worms. Taken together, these results for T. multiceps suggest that TmFABP could potentially be used to develop a vaccine to control and prevent coenurosis.
Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

Directory of Open Access Journals (Sweden)

Jing Zhao

Full Text Available Many diseases have complex genetic causes, where a set of alleles can affect the propensity to develop the disease. The identification of such disease genes is important for understanding the mechanistic and evolutionary aspects of pathogenesis, improving diagnosis and treatment of the disease, and aiding drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases, but picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or if it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets, our method is able to predict unknown disease genes as well as to identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but also offers a way to discover phenotypic interdependency, co-occurrence and shared pathophysiology between different disorders.
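The title names Katz centrality. In a personalized variant of the textbook Katz measure, a prior score vector p (for example, per-gene differential-expression evidence) is propagated through the interaction network by summing walks of every length k with weight α^k, giving x = Σ_k α^k A^k p = (I - αA)^(-1) p, which converges when α is smaller than 1/λ_max(A). The sketch below solves x = p + αAx by fixed-point iteration; it is a minimal illustration of that formulation, not the authors' exact scoring, and the three-gene network is hypothetical.

```python
def katz_scores(adj, prior, alpha=0.1, tol=1e-10, max_iter=1000):
    """Solve x = prior + alpha * A x by fixed-point iteration.

    adj: square adjacency matrix (list of lists; edge weights allowed).
    prior: per-node starting scores. Requires alpha < 1 / spectral radius of adj.
    """
    n = len(adj)
    x = list(prior)
    for _ in range(max_iter):
        new = [prior[i] + alpha * sum(adj[i][j] * x[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; try a smaller alpha")


# Hypothetical path network g0 - g1 - g2; only g0 carries prior evidence.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
scores = katz_scores(A, [1.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)  # [0, 1, 2]
```

Evidence decays geometrically with network distance, so a candidate with no expression signal of its own can still rank highly if it sits next to strongly supported genes, which matches the second assumption stated in the abstract.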
The human protein disulfide isomerase gene family

Directory of Open Access Journals (Sweden)

Galligan James J

2012-07-01

Full Text Available Abstract Enzyme-mediated disulfide bond formation is a highly conserved process affecting over one-third of all eukaryotic proteins. The enzymes primarily responsible for facilitating thiol-disulfide exchange are members of an expanding family of proteins known as protein disulfide isomerases (PDIs). These proteins are part of a larger superfamily of proteins known as the thioredoxin protein family (TRX). All members of the PDI family contain a TRX-like structural domain and are predominantly expressed in the endoplasmic reticulum. Subcellular localization and the presence of a TRX domain, however, comprise the short list of distinguishing features required for gene family classification. To date, the PDI gene family contains 21 members, varying in domain composition, molecular weight, tissue expression, and cellular processing. Given their vital role in protein folding, loss of PDI activity has been associated with the pathogenesis of numerous disease states, most commonly related to the unfolded protein response (UPR). Over the past decade, the UPR has become a very attractive therapeutic target for multiple pathologies including Alzheimer disease, Parkinson disease, alcoholic and non-alcoholic liver disease, and type-2 diabetes. Understanding the mechanisms of protein folding, specifically thiol-disulfide exchange, may lead to development of a novel class of therapeutics that would help alleviate a wide range of diseases by targeting the UPR.
Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix) contributes to the gene repertoire catalogue of the Pucciniales

Directory of Open Access Journals (Sweden)

Marco Aurelio Cristancho

2014-10-01

Full Text Available Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease of coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee-producing countries, resulting in significant yield losses and increased control costs. New races/isolates are constantly emerging, as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose, we sequenced 8 different H. vastatrix isolates using NGS technologies and, because of the large repetitive content of the coffee rust hybrid genome, obtained partial genome assemblies; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion, with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to sequences in the genomes of other Pucciniales species. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish
Analysis of viral protein-2 encoding gene of avian encephalomyelitis virus from field specimens in Central Java region, Indonesia

Directory of Open Access Journals (Sweden)

Aris Haryanto

2016-01-01

Full Text Available Aim: Avian encephalomyelitis (AE) is a viral disease that can infect various types of poultry, especially chickens. In Indonesia, AE infection in chickens has been reported since 2009, and its incidence tends to increase from year to year. The objective of this study was to analyze the viral protein 2 (VP-2) encoding gene of AE virus (AEV) from various bird species in field specimens by reverse transcription polymerase chain reaction (RT-PCR) amplification with specific primers, to confirm the AE diagnosis. Materials and Methods: A total of 13 AEV samples were isolated from various poultry species that had been serologically diagnosed as AEV-infected, from several areas in Central Java, Indonesia. The study consisted of virus sample collection from field specimens, extraction of AEV RNA, amplification of the VP-2 encoding gene by RT-PCR, separation of RT-PCR products by agarose gel electrophoresis, DNA sequencing and data analysis. Results: RT-PCR amplification of the AEV VP-2 encoding gene from field specimens of various poultry types gave a positive result for sample code 499/4/12, which generated a 619-bp DNA fragment. A sensitivity test of the RT-PCR amplification showed that the minimum RNA template concentration was 127.75 ng/μl. Multiple alignment of the DNA sequencing product indicated that the positive sample 499/4/12 has 92% nucleotide homology with AEV accession number AV1775/07 and 85% nucleotide homology with accession number ZCHP2/0912695 from the GenBank database. Analysis of the VP-2 gene sequence revealed 46 nucleotide differences between isolate 499/4/12 and accession number AV1775/07, and 93 nucleotide differences from accession number ZCHP2/0912695. Conclusions: Analyses of the AEV VP-2 encoding gene by RT-PCR of 13 field samples generated a 619-bp DNA fragment from one sample with
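Percent nucleotide identity (reported above as "homology") and the number of nucleotide differences are usually both read off a pairwise alignment. A minimal sketch, assuming the two sequences are already aligned (gaps written as '-') and that gap columns count as differences; conventions vary between tools, and the 20-column alignment below is hypothetical, not the AEV VP-2 data.

```python
def identity_and_differences(aligned_a: str, aligned_b: str):
    """Return (percent identity, number of differing columns) for two aligned sequences.

    Every alignment column counts toward the length; a gap opposite a base
    counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a != b)
    length = len(aligned_a)
    return 100.0 * (length - diffs) / length, diffs


# Hypothetical 20-column alignment with two mismatches and one gap.
pct, n_diff = identity_and_differences("ATGGCTAGCTTACGGATCCA",
                                       "ATGACTAGCT-ACGGATCGA")
print(f"{pct:.1f}% identity, {n_diff} differences")  # 85.0% identity, 3 differences
```

Because identity is differences divided by aligned length, the two numbers are only comparable between isolates when the comparisons cover the same aligned region.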
ProClaT, a new bioinformatics tool for in silico protein reclassification: case study of DraB, a protein coded from the draTGB operon in Azospirillum brasilense.

Science.gov (United States)

Rubel, Elisa Terumi; Raittz, Roberto Tadeu; Coimbra, Nilson Antonio da Rocha; Gehlen, Michelly Alves Coutinho; Pedrosa, Fábio de Oliveira

2016-12-15

Azospirillum brasilense is a plant-growth-promoting, nitrogen-fixing bacterium that is used as a biofertilizer in agriculture. Since nitrogen fixation has a high energy demand, the reduction of N₂ to NH₄⁺ by nitrogenase occurs only under limiting conditions of NH₄⁺ and O₂. Moreover, the synthesis and activity of nitrogenase are highly regulated to prevent energy waste. In A. brasilense, nitrogenase activity is regulated by the products of draG and draT. The product of the draB gene, located downstream in the draTGB operon, may be involved in the regulation of nitrogenase activity by an as yet unknown mechanism. An in-depth in silico analysis of the product of draB was undertaken to suggest its possible function and its involvement with DraT and DraG in the regulation of nitrogenase activity in A. brasilense. In this work, we present a new artificial intelligence strategy for protein classification, named ProClaT. The features used by the pattern recognition model were derived from the primary structure of the DraB homologous proteins, calculated by a ProClaT internal algorithm. ProClaT was applied to this case study, and the results revealed that the A. brasilense draB gene codes for a protein highly similar to the nitrogenase-associated NifO protein of Azotobacter vinelandii. This tool allowed the reclassification of DraB/NifO homologous proteins (hypothetical, conserved hypothetical, and those annotated as putative arsenate reductase, ArsC) as NifO-like. An analysis of the co-occurrence of draB, draT, draG and other nif genes was performed, suggesting the involvement of draB (nifO) in nitrogen fixation, although without defining a specific function.
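ProClaT's features come from an internal algorithm that the abstract does not describe. As a generic illustration only of classifying proteins from primary-structure features, the sketch below derives a 20-dimensional amino-acid composition vector from each sequence and assigns a query to the class whose mean (centroid) composition is most similar by cosine similarity. This nearest-centroid rule is a swapped-in technique, not ProClaT, and the class labels and sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each standard amino acid (non-standard letters ignored)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify(query, training):
    """training: dict mapping class label -> list of sequences."""
    q = composition(query)
    centroids = {label: centroid([composition(s) for s in seqs])
                 for label, seqs in training.items()}
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


# Hypothetical toy classes with deliberately different compositions.
training = {
    "class_A": ["MKKLLEEKKLL", "MKELLKKEELL"],
    "class_B": ["MGGSSGGTTSG", "MSGGTGSSGGT"],
}
print(classify("MKKEELLKLE", training))  # class_A
```

Composition vectors ignore residue order, so a real classifier would typically add order-aware features such as dipeptide frequencies or alignment-derived scores.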
The Drosophila genes CG14593 and CG30106 code for G-protein-coupled receptors specifically activated by the neuropeptides CCHamide-1 and CCHamide-2

DEFF Research Database (Denmark)

Hansen, Karina K; Hauser, Frank; Williamson, Michael

2011-01-01

Recently, a novel neuropeptide, CCHamide, was discovered in the silkworm Bombyx mori (L. Roller et al., Insect Biochem. Mol. Biol. 38 (2008) 1147-1157). We have now found that all insects with a sequenced genome have two genes, each coding for a different CCHamide, CCHamide-1 and -2. We have also...
AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs.

Science.gov (United States)

Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael

2017-06-15

Diffusion-based network models are widely used to predict protein function from protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model composed of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare them with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies (missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction) to comprehensively evaluate the predictive performance of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank. Contact: gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
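The general idea of diffusing information over a two-layer protein/GO graph with PageRank can be illustrated with a small personalized PageRank sketch in Python. This is not the authors' MATLAB BirgRank or AptRank code. The toy graph, the node names (P1, GO:a, and so on), the damping value alpha = 0.85, and the choice to treat protein-protein, GO hierarchy and annotation edges as undirected and equally weighted are all assumptions made for illustration.

```python
from collections import defaultdict


def build_birg(ppi_edges, go_edges, annotation_edges):
    """Merge protein-protein, GO-GO and protein-GO edges into one adjacency map.

    Simplifying assumption (not from the paper): every edge is undirected
    and has weight 1.
    """
    adj = defaultdict(set)
    for u, v in [*ppi_edges, *go_edges, *annotation_edges]:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def personalized_pagerank(adj, seeds, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for x = alpha * W x + (1 - alpha) * s.

    W is the column-stochastic random-walk matrix (each node splits its score
    evenly among its neighbours) and s is the seed (restart) distribution,
    normalized to sum to 1.
    """
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in adj}
    x = dict(restart)
    for _ in range(max_iter):
        nxt = {n: (1 - alpha) * restart[n] for n in adj}
        for u, neighbours in adj.items():
            share = alpha * x[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - x[n]) for n in adj)
        x = nxt
        if delta < tol:
            break
    return x


# Hypothetical toy data: three proteins, a three-term GO fragment, two annotations.
ppi = [("P1", "P2"), ("P2", "P3")]
go = [("GO:root", "GO:a"), ("GO:root", "GO:b")]
annotations = [("P2", "GO:a"), ("P3", "GO:b")]

adj = build_birg(ppi, go, annotations)
scores = personalized_pagerank(adj, {"P1": 1.0})  # P1 has no known function
ranked = sorted((n for n in scores if n.startswith("GO:")), key=scores.get, reverse=True)
print(ranked[0])  # GO:a, reached through P1's direct neighbour P2
```

This covers only the fixed-parameter case, loosely analogous to BirgRank. AptRank, as described above, replaces the fixed decay parameter with an adaptive diffusion mechanism, which this sketch does not attempt to reproduce.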
Methods for Using Small Non-Coding RNAs to Improve Recombinant Protein Expression in Mammalian Cells

Directory of Open Access Journals (Sweden)

Sarah Inwood

2018-01-01

The ability to produce recombinant proteins in different "cell factories" revolutionized the biotherapeutic and pharmaceutical industries. Chinese hamster ovary (CHO) cells are the dominant industrial producers, especially of antibodies. Human embryonic kidney (HEK) cells, although not as widely used as CHO cells, are used where CHO cells cannot meet expression needs, for example for growth factors. Improving recombinant protein expression in mammalian cells is therefore a priority, and continuing effort is being devoted to this topic. Non-coding RNAs are RNA molecules that are not translated into protein and often have a regulatory role. Since their discovery, major progress has been made towards understanding their functions. Non-coding RNAs have been investigated extensively in relation to disease, especially cancer, and recently they have also been used to engineer cells with improved protein expression capability. In this review, we describe methods used to identify non-coding RNAs with the potential to improve recombinant protein expression in mammalian cell lines.
Transfer of the genes SH2 and SH3 for resistance to Hemileia vastatrix to the Mundo Novo cultivar of C. arabica

Directory of Open Access Journals (Sweden)

A. Carvalho

1977-01-01

Coffee trees homozygous for the alleles SH2, or SH2 and SH3 together, which confer resistance to several physiological races of Hemileia vastatrix, were crossed with selected plants of the Mundo Novo cultivar of Coffea arabica. The F2 generations were studied with the aim of obtaining new recombinants combining high yield with resistance to the pathogen. A completely randomized field trial was established with 14 F2 populations segregating for SH2 only, eight populations segregating for SH2 and SH3, and three populations whose progeny include plants of reaction group A, which are resistant to all races of the pathogen known to date. Of the 22,356 F2 plants originally planted, two seedlings per planting hole, a first selection kept only one tree per hole, reducing the number of plants under study to 11,178. Based on plant vigor, yield, the absence of fruit and seed defects, and resistance reaction to H. vastatrix, three successive rounds of selection were carried out, finally leaving only 100 resistant coffee trees of the Mundo Novo type for the development of F3 populations and further selection.
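The phrase "segregating for SH2 and SH3" can be made concrete with textbook Mendelian expectations for an F2 population. The Python sketch below is illustrative and does not use data from the study. It rests on several hypothetical assumptions: the Mundo Novo parent carries neither SH2 nor SH3, so the F1 is heterozygous at both loci; the two loci assort independently; the resistance alleles are dominant; and each locus is inherited disomically, as in a diploid.

```python
from fractions import Fraction
from itertools import product


def f2_genotype_frequencies(n_loci):
    """Expected F2 genotype frequencies from an F1 heterozygous at n_loci loci.

    Each genotype is a tuple giving, per locus, the number of resistance
    alleles (0, 1 or 2). Assumes independent assortment (unlinked loci).
    """
    # Aa x Aa at one locus gives 0, 1 or 2 copies of A with probability 1/4, 1/2, 1/4.
    single = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    freqs = {}
    for genotype in product(single, repeat=n_loci):
        p = Fraction(1)
        for copies in genotype:
            p *= single[copies]
        freqs[genotype] = p
    return freqs


freqs = f2_genotype_frequencies(2)  # loci ordered as (SH2, SH3)
carries_both = sum(p for g, p in freqs.items() if all(c >= 1 for c in g))
carries_any = sum(p for g, p in freqs.items() if any(c >= 1 for c in g))
homozygous_both = sum(p for g, p in freqs.items() if all(c == 2 for c in g))
print(carries_both, carries_any, homozygous_both)  # 9/16 15/16 1/16
```

Under these assumptions, only 1/16 of an F2 population segregating for both genes would be homozygous for SH2 and SH3, even though 15/16 would carry at least one of the two resistance alleles.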
Axonal regeneration and development of de novo axons from distal dendrites of adult feline commissural interneurons after a proximal axotomy

DEFF Research Database (Denmark)

Fenrich, Keith K; Skelton, Nicole; MacDermid, Victoria E

2007-01-01

Following proximal axotomy, several types of neurons sprout de novo axons from distal dendrites. These processes may represent a means of forming new circuits following spinal cord injury. However, it is not known whether mammalian spinal interneurons, axotomized as a result of a spinal cord injury......, develop de novo axons. Our goal was to determine whether spinal commissural interneurons (CINs), axotomized by a 3-4 mm midsagittal transection at C3, form de novo axons from distal dendrites. All experiments were performed on adult cats. CINs in C3 were stained with extracellular injections of Neurobiotin...... at 4-5 weeks post injury. The somata of axotomized CINs were identified by the presence of immunoreactivity for the axonal growth-associated protein-43 (GAP-43). Nearly half of the CINs had de novo axons that emerged from distal dendrites. These axons lacked immunoreactivity for the dendritic protein...
A de novo missense mutation of FGFR2 causes facial dysplasia syndrome in Holstein cattle

DEFF Research Database (Denmark)

Agerholm, Jørgen Steen; McEvoy, Fintan; Heegaard, Steffen

2017-01-01

was suspected as all recorded cases were progeny of the same sire. Detailed investigations were performed to characterize the syndrome and to reveal its cause. Results Seven malformed calves were submitted for examination. All cases shared a common morphology, the most striking lesions being severe facial...... chromosome 26, where whole-genome sequencing of a case-parent trio revealed two de novo variants perfectly associated with the disease: an intronic SNP in the DMBT1 gene and a single non-synonymous variant in the FGFR2 gene. This FGFR2 missense variant (c.927G>T) affects a gene encoding a member...... of the fibroblast growth factor receptor family, whose amino acid sequence is highly conserved between members and across species. It is predicted to change an evolutionarily conserved tryptophan into a cysteine residue (p.Trp309Cys). Both variant alleles were proven to result from de novo mutation events...
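The coding-DNA notation c.927G>T and the protein notation p.Trp309Cys can be cross-checked with simple codon arithmetic. The Python sketch below assumes HGVS-style numbering, where c.1 is the A of the ATG start codon, and the standard genetic code. It does not use the real FGFR2 sequence, only the reported tryptophan at codon 309 and the fact that TGG is the sole tryptophan codon. The helper names are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {"W": "Trp", "C": "Cys"}  # only the residues needed here


def locate_cdna_position(c_pos):
    """Map a 1-based coding-DNA (c.) position to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def apply_snv(codon, pos_in_codon, ref, alt):
    """Substitute one base in a codon after checking the reference base."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match codon")
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_no, pos = locate_cdna_position(927)        # (309, 3)
ref_codon = "TGG"  # codon 309 encodes Trp per the abstract; TGG is the only Trp codon
alt_codon = apply_snv(ref_codon, pos, "G", "T")  # "TGT"
ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
hgvs = f"p.{THREE_LETTER[ref_aa]}{codon_no}{THREE_LETTER[alt_aa]}"
print(hgvs)  # p.Trp309Cys
```

Position 927 is the third base of codon 309. Replacing the final G of TGG with T gives TGT, a cysteine codon, which is consistent with the reported protein change.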
Influence of the Leader protein coding region of foot-and-mouth disease virus on virus replication

DEFF Research Database (Denmark)

Belsham, Graham

2013-01-01

The foot-and-mouth disease virus (FMDV) Leader (L) protein is produced in two forms, Lab and Lb, differing only at their amino-termini, due to the use of separate initiation codons, usually 84 nt apart. It has been shown previously, and confirmed here, that precise deletion of the Lab coding......, in the context of the virus lacking the Lb coding region, was also tolerated by the virus within BHK cells. However, precise loss of the Lb coding sequence alone blocked FMDV replication in primary bovine thyroid cells. Thus, the requirement for the Leader protein coding sequences is highly dependent...... on the nature and extent of the residual Leader protein sequences and on the host cell system used. FMDVs precisely lacking Lb and with the Lab initiation codon modified may represent safer seed viruses for vaccine production....
Structural Insight into the Core of CAD, the Multifunctional Protein Leading De Novo Pyrimidine Biosynthesis.

Science.gov (United States)

Moreno-Morcillo, María; Grande-García, Araceli; Ruiz-Ramos, Alba; Del Caño-Ochoa, Francisco; Boskovic, Jasminka; Ramón-Maiques, Santiago

2017-06-06

CAD, the multifunctional protein initiating and controlling de novo biosynthesis of pyrimidines in animals, self-assembles into ∼1.5 MDa hexamers. The structures of the dihydroorotase (DHO) and aspartate transcarbamoylase (ATC) domains of human CAD have been previously determined, but we lack information on how these domains associate and interact with the rest of CAD to form a multienzymatic unit. Here, we show that a construct covering human DHO and ATC oligomerizes as a dimer of trimers and that this arrangement is conserved in the CAD-like protein of fungi, which contains an inactive DHO-like domain. The crystal structures of the ATC trimer and DHO-like dimer from the fungus Chaetomium thermophilum confirm their similarity to the human CAD homologs. These results demonstrate that, despite being inactive, the fungal DHO-like domain has a conserved structural function. We propose a model that places the DHO and ATC complex as the central element in the architecture of CAD. Copyright © 2017 Elsevier Ltd. All rights reserved.
Functional specialization of one copy of glutamine phosphoribosyl pyrophosphate amidotransferase in ureide production from symbiotically fixed nitrogen in Phaseolus vulgaris.

Science.gov (United States)

Coleto, Inmaculada; Trenas, Almudena T; Erban, Alexander; Kopka, Joachim; Pineda, Manuel; Alamillo, Josefa M

2016-08-01

Purines are essential molecules formed in a highly regulated pathway in all organisms. In tropical legumes, the nitrogen fixed in the nodules is used to generate ureides through the oxidation of de novo synthesized purines. Glutamine phosphoribosyl pyrophosphate amidotransferase (PRAT) catalyses the first committed step of de novo purine synthesis. In Phaseolus vulgaris there are three genes coding for PRAT. The full-length sequences of the three genes, none of which contain introns, were cloned, and their expression levels were determined under conditions that affect the synthesis of purines. One of the three genes, PvPRAT3, is highly expressed in nodules, and its protein level and enzymatic activity in these tissues correlate with nitrogen fixation activity. RNAi silencing of PvPRAT3, followed by metabolomic analysis of the transformed roots, showed that PvPRAT3 is essential for the synthesis of ureides in P. vulgaris nodules. © 2016 John Wiley & Sons Ltd.
