WorldWideScience

Sample records for functional sequence motifs

  1. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  2. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Full Text Available Abstract Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of

  3. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.

  4. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).

  5. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  6. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  7. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  8. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  11. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Full Text Available Abstract Background: Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode

  12. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  13. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.

  14. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    Science.gov (United States)

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  15. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    Full Text Available The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy to adopt false discovery rate (FDR) and estimate motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. A SAS code for the simulation and analysis of gene expression data is available from the first author upon request.

  16. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Full Text Available Abstract Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  17. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias

    DEFF Research Database (Denmark)

    Kjær, Jonas; Belsham, Graham J.

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long) which induces a non-proteolytic, co-translational, "cleavage" at its own C-terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19 where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15 and N16 within the 2A sequence of infectious FMDVs but no variants at residues P17, G18 or P19 have been identified. In this study, using highly degenerate primers, we analysed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after 2, 3 or 4 passages. However...

  18. Perception Enhancement using Visual Attributes in Sequence Motif Visualization

    OpenAIRE

    Oon, Yin; Lee, Nung; Kok, Wei

    2016-01-01

    The sequence logo is a well-accepted scientific method to visualize the conservation characteristics of biological sequence motifs. Previous studies found that using the sequence logo graphical representation in scientific evidence reports or arguments could cause serious bias and misinterpretation by users. This study investigates how well the visual attributes of a sequence logo help users to perceive and interpret the information, based on preattentive theories and Gestalt principl...

  19. Identification of a Baeyer-Villiger monooxygenase sequence motif

    NARCIS (Netherlands)

    Fraaije, MW; Kamerbeek, NM; van Berkel, WJH; Janssen, DB; Kamerbeek, Nanne M.; Berkel, Willem J.H. van

    2002-01-01

    Baeyer-Villiger monooxygenases (BVMOs) form a distinct class of flavoproteins that catalyze the insertion of an oxygen atom in a C-C bond using dioxygen and NAD(P)H. Using newly characterized BVMO sequences, we have uncovered a BVMO-identifying sequence motif: FXGXXXRXXXW(P/D). Studies with
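    The motif quoted in this abstract, FXGXXXRXXXW(P/D), can be read as a regular expression in which X stands for any residue and (P/D) for a choice between proline and aspartate. The following is a minimal Python sketch of such a scan, not the authors' method; the function name `find_bvmo_motif` and the toy sequence are made up for illustration.

```python
import re

# FXGXXXRXXXW(P/D) from the abstract, with X read as "any residue"
# and (P/D) read as "P or D". This reading is an assumption.
BVMO_MOTIF = re.compile(r"F.G.{3}R.{3}W[PD]")

def find_bvmo_motif(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in BVMO_MOTIF.finditer(protein.upper())]

# Made-up toy sequence (not a real protein) containing one match.
toy = "MKTAFAGAAARLLLWPQQ"
print(find_bvmo_motif(toy))
```

    Note that `finditer` reports non-overlapping matches only; a lookahead-based pattern would be needed if overlapping occurrences mattered.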

  20. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
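    To make the (l, d) problem statement concrete, here is a brute-force Python sketch that enumerates every candidate string of length l and keeps those occurring, with at most d substitutions, in at least q sequences. It is exponential in l and is not the NFA-based algorithm of the paper; the function names and the toy sequences are assumptions for illustration.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def ld_motif_search(seqs, l, d, q=None, alphabet="ACGT"):
    """Exhaustive (l, d) search: all length-l strings found in >= q sequences."""
    q = len(seqs) if q is None else q  # default: motif must occur in every sequence
    hits = []
    for cand in product(alphabet, repeat=l):  # |alphabet|**l candidates
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits

# Toy input: ACGT is planted exactly once and with one substitution twice.
seqs = ["TTACGTAA", "GGACCTCC", "CAAGGTTT"]
hits = ledger = ld_motif_search(seqs, l=4, d=1)
print("ACGT" in hits)
```

    The candidate space grows as 4^l for DNA, which is why exhaustive enumeration is only usable for very small l and why instances such as those cited in the abstract call for specialised algorithms or hardware.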

  1. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Full Text Available Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
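    The kind of counting and bar-code-like display described in this abstract can be sketched in a few lines of Python. This is not PISMA itself (which the abstract says is written in Java); the function names and the one-character-per-base text rendering are assumptions for illustration.

```python
def motif_positions(dna: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    dna, motif = dna.upper(), motif.upper()
    positions = []
    i = dna.find(motif)
    while i != -1:
        positions.append(i)
        i = dna.find(motif, i + 1)  # step by one so overlaps are counted
    return positions

def barcode(dna: str, motif: str) -> str:
    """One character per base: '|' where a motif occurrence starts, '-' elsewhere."""
    marks = ["-"] * len(dna)
    for p in motif_positions(dna, motif):
        marks[p] = "|"
    return "".join(marks)

# Toy sequence (made up) scanned for CpG sites.
dna = "ACGCGTTCGA"
print(len(motif_positions(dna, "CG")), barcode(dna, "CG"))
```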

  2. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  3. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Full Text Available Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV specific targeting using small-molecule drugs.

  4. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. Of these 213 motifs, 58 were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  5. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
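    The three-level partitioning described in this abstract (discovery, training, validation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `three_way_split`, the default fractions, and the fixed seed are assumptions chosen only for illustration.

```python
import random

def three_way_split(records, fractions=(0.2, 0.5, 0.3), seed=0):
    """Shuffle records and split them into (discovery, training, validation).

    The fractions are hypothetical defaults, not values from the abstract.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_disc = int(n * fractions[0])
    n_train = int(n * fractions[1])
    discovery = shuffled[:n_disc]                   # motif finder runs here
    training = shuffled[n_disc:n_disc + n_train]    # classifier is fit here
    validation = shuffled[n_disc + n_train:]        # performance is assessed here
    return discovery, training, validation
```

    Keeping the discovery partition disjoint from the other two means the motif features are never selected on data that later trains or scores the classifier.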

  6. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Full Text Available Abstract Background One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  7. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  8. Targeting functional motifs of a protein family

    Science.gov (United States)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  9. Identity and functions of CxxC-derived motifs.

    Science.gov (United States)

    Fomenko, Dmitri E; Gladyshev, Vadim N

    2003-09-30

    Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.
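    The five motif forms named in this abstract (CxxC, CxxS, CxxT, TxxC, SxxC) are simple enough to scan for with regular expressions. The sketch below is an illustration only, not the authors' search procedure: the function name `find_cxxc_derived` is hypothetical, and the downstream alpha-helix requirement mentioned in the abstract is deliberately ignored because it needs structural information that a plain sequence scan does not have.

```python
import re

# Motif forms named in the abstract; "x" (any residue) becomes "." in a regex.
CXXC_DERIVED = {
    "CxxC": "C..C",
    "CxxS": "C..S",
    "CxxT": "C..T",
    "TxxC": "T..C",
    "SxxC": "S..C",
}

def find_cxxc_derived(seq):
    """Return sorted (start, motif_name, matched_text) tuples (0-based start)."""
    hits = []
    for name, pattern in CXXC_DERIVED.items():
        # A lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)
```

    A plain scan like this will report many chance matches, which is presumably why the abstract combines the motif with a secondary-structure condition.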

  10. Anion induced conformational preference of CαNN motif residues in functional proteins.

    Science.gov (United States)

    Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

    2017-12-01

    Among different ligand binding motifs, the anion binding CαNN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions, like sulphate or biphosphate, and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the CαNN motif present in different proteins. Instead of total protein, a peptide fragment has been studied keeping the CαNN motif flanked in between other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in the absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.

  11. Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

    Science.gov (United States)

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478

  12. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2018-03-11

    Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core, and reveals the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.

  13. Purification and functional motifs of the recombinant ATPase of orf virus.

    Science.gov (United States)

    Lin, Fong-Yuan; Chan, Kun-Wei; Wang, Chi-Young; Wong, Min-Liang; Hsu, Wei-Li

    2011-10-01

    Our previous study showed that the recombinant ATPase encoded by the A32L gene of orf virus displayed ATP hydrolysis activity as predicted from its amino acid sequence. This viral ATPase contains four known functional motifs (motifs I-IV) and a novel AYDG motif; they are essential for ATP hydrolysis reaction by binding ATP and magnesium ions. The motifs I and II correspond with the Walker A and B motifs of the typical ATPase, respectively. To examine the biochemical roles of these five conserved motifs, recombinant ATPases of five deletion mutants derived from the Taiping strain were expressed and purified. Their ATPase functions were assayed and compared with those of two wild type strains, Taiping and Nantou isolated in Taiwan. Our results showed that deletions at motifs I-III or IV exhibited lower activity than that of the wild type. Interestingly, deletion of the AYDG motif decreased the ATPase activity more significantly than those of motifs I-IV deletions. Divalent ions such as magnesium and calcium were essential for ATPase activity. Moreover, our recombinant proteins of orf virus also demonstrated GTPase activity, though weaker than the original ATPase activity. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  15. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    Directory of Open Access Journals (Sweden)

    Hieu Dinh

    Full Text Available Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
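    To make the problem statement concrete: a string M of length ℓ is an (ℓ, d)-motif if every input sequence contains a length-ℓ substring within Hamming distance d of M, and the quorum version only asks this of a fraction q of the sequences. The sketch below is a naive brute-force enumeration of all candidates, written only to illustrate the definition. It is not qPMS7, it takes time exponential in ℓ, and the names `naive_qpms`, `occurs_within` and `hamming` are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(candidate, seq, d):
    """True if some window of seq is within Hamming distance d of candidate."""
    l = len(candidate)
    return any(hamming(candidate, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def naive_qpms(seqs, l, d, q=1.0, alphabet="ACGT"):
    """Enumerate all |alphabet|**l candidates; keep those found in >= q*len(seqs) sequences."""
    needed = q * len(seqs)
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_within(candidate, s, d) for s in seqs)
        if support >= needed:
            motifs.append(candidate)
    return motifs
```

    With q = 1.0 this reduces to the plain PMS problem; the exponential candidate space is what makes faster exact algorithms of interest.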

  16. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed by assuming no dependency between binding site base positions. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all binding site base positions. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from JASPAR and TRANSFAC databases, and compare the obtained results with two other well known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
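    Mutual information between two columns of aligned binding sites is the standard quantity behind the dependency measure this abstract mentions. The sketch below computes it from raw frequencies for one pair of positions. It is an illustration under stated assumptions, not the paper's scoring function: the name `mutual_information` is hypothetical, no pseudocounts or background correction are applied, and all sites are assumed to have equal length.

```python
from collections import Counter
from math import log2

def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)           # marginal counts at position i
    col_j = Counter(s[j] for s in sites)           # marginal counts at position j
    joint = Counter((s[i], s[j]) for s in sites)   # joint counts for the pair
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi
```

    A value near zero suggests the two positions vary independently, in which case a position-independent weight matrix loses little; larger values indicate a dependency that such a matrix cannot represent.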

  17. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  18. Counting of oligomers in sequences generated by Markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
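    One of the quantities named in this abstract, the probability that a word occurs at least once in a random text, can be computed exactly by tracking how much of the word has been matched so far and treating a full match as an absorbing state. The sketch below does this for the simplified case of independent letters (an order-0 model), which is an assumption made here for brevity; the abstract treats texts generated by a Markov chain. The names `kmp_automaton` and `prob_at_least_once` are hypothetical, and this is not the code distributed by the authors.

```python
def kmp_automaton(word, alphabet):
    """delta[s][c] = length of the longest prefix of word matched after
    reading letter c while s letters of word are already matched."""
    m = len(word)
    delta = [dict() for _ in range(m)]
    for s in range(m):
        for c in alphabet:
            text = word[:s] + c
            k = len(text)
            # Shrink k until the last k letters of text equal a prefix of word.
            while k > 0 and text[-k:] != word[:k]:
                k -= 1
            delta[s][c] = k
    return delta

def prob_at_least_once(word, n, probs):
    """P(word occurs at least once in n i.i.d. letters drawn from probs)."""
    m = len(word)
    delta = kmp_automaton(word, list(probs))
    state = [0.0] * m      # probability mass in each partial-match state
    state[0] = 1.0
    hit = 0.0              # mass absorbed after a full match
    for _ in range(n):
        nxt = [0.0] * m
        for s, mass in enumerate(state):
            if mass == 0.0:
                continue
            for c, p in probs.items():
                k = delta[s][c]
                if k == m:
                    hit += mass * p
                else:
                    nxt[k] += mass * p
        state = nxt
    return hit
```

    Using the matched-prefix automaton handles self-overlapping words such as "AA" correctly, which a naive per-position independence estimate would not.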

  19. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  20. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  1. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
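    The recognition sequence GRCGYC uses IUPAC ambiguity codes, where R stands for A or G and Y stands for C or T. A minimal sketch of locating such sites in a DNA string follows. The helper names `iupac_to_regex` and `find_sites` are hypothetical, only the codes needed here plus N are included, and the sketch reports where the recognition sequence matches, not where the enzyme cuts.

```python
import re

# Partial IUPAC table: enough for GRCGYC, plus N for convenience.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # purine
    "Y": "[CT]",   # pyrimidine
    "N": "[ACGT]",
}

def iupac_to_regex(site):
    """Translate a degenerate site such as GRCGYC into a regex string."""
    return "".join(IUPAC[ch] for ch in site)

def find_sites(dna, site="GRCGYC"):
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    pattern = re.compile(f"(?=({iupac_to_regex(site)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]
```

    Because GRCGYC is its own reverse complement, scanning one strand is enough to find sites on both.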

  2. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro

  3. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  4. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  5. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T.

    2018-01-01

    and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter

  6. An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs

    OpenAIRE

    Chang, Tzu-Hao; Huang, Hsi-Yuan; Hsu, Justin Bo-Kai; Weng, Shun-Long; Horng, Jorng-Tzong; Huang, Hsien-Da

    2013-01-01

    Background Functional RNA molecules participate in numerous biological processes, ranging from gene regulation to protein synthesis. Analysis of functional RNA motifs and elements in RNA sequences can obtain useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used in the identification of regulatory motifs, and this work extends it by incorporating more comprehensive and updated data sources and analytical approaches into a new platform. Methods ...

  7. Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

    Science.gov (United States)

    König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

    2013-01-01

    G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations do destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141

  8. PDL1 Signals through Conserved Sequence Motifs to Overcome Interferon-Mediated Cytotoxicity

    Directory of Open Access Journals (Sweden)

    Maria Gato-Cañas

    2017-08-01

    Full Text Available PDL1 blockade produces remarkable clinical responses, thought to occur by T cell reactivation through prevention of PDL1-PD1 T cell inhibitory interactions. Here, we find that PDL1 cell-intrinsic signaling protects cancer cells from interferon (IFN) cytotoxicity and accelerates tumor progression. PDL1 inhibited IFN signal transduction through a conserved class of sequence motifs that mediate crosstalk with IFN signaling. Abrogation of PDL1 expression or antibody-mediated PDL1 blockade strongly sensitized cancer cells to IFN cytotoxicity through a STAT3/caspase-7-dependent pathway. Moreover, somatic mutations found in human carcinomas within these PDL1 sequence motifs disrupted motif regulation, resulting in PDL1 molecules with enhanced protective activities from type I and type II IFN cytotoxicity. Overall, our results reveal a mode of action of PDL1 in cancer cells as a first line of defense against IFN cytotoxicity.

  9. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of the easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. © The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  10. Structural and Functional Motifs in Influenza Virus RNAs

    Directory of Open Access Journals (Sweden)

    Damien Ferhadian

    2018-03-01

    Full Text Available Influenza A viruses (IAV) are responsible for recurrent influenza epidemics and occasional devastating pandemics in humans and animals. They belong to the Orthomyxoviridae family and their genome consists of eight (-) sense viral RNA (vRNA) segments of different lengths coding for at least 11 viral proteins. A heterotrimeric polymerase complex is bound to the promoter consisting of the 13 5′-terminal and 12 3′-terminal nucleotides of each vRNA, while internal parts of the vRNAs are associated with multiple copies of the viral nucleoprotein (NP), thus forming ribonucleoproteins (vRNP). Transcription and replication of vRNAs result in viral mRNAs (vmRNAs) and complementary RNAs (cRNAs), respectively. Complementary RNAs are the exact positive copies of vRNAs; they also form ribonucleoproteins (cRNPs) and are intermediate templates in the vRNA amplification process. On the contrary, vmRNAs have a 5′ cap snatched from cellular mRNAs and a 3′ polyA tail, both gained by the viral polymerase complex. Hence, unlike vRNAs and cRNAs, vmRNAs do not have a terminal promoter able to recruit the viral polymerase. Furthermore, synthesis of at least two viral proteins requires vmRNA splicing. Except for extensive analysis of the viral promoter structure and function and a few, mostly bioinformatics, studies addressing the vRNA and vmRNA structure, structural studies of the influenza A vRNAs, cRNAs, and vmRNAs are still in their infancy. The recent crystal structures of the influenza polymerase heterotrimeric complex drastically improved our understanding of the replication and transcription processes. The vRNA structure has been mainly studied in vitro using RNA probing, but its structure has been very recently studied within native vRNPs using crosslinking and RNA probing coupled to next generation RNA sequencing. Concerning vmRNAs, most studies focused on the segment M and NS splice sites and several structures initially predicted by bioinformatics analysis

  11. Faster exact Markovian probability functions for motif occurrences: a DFA-only approach.

    Science.gov (United States)

    Ribeca, Paolo; Raineri, Emanuele

    2008-12-15

    The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.
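    To make the idea of an automaton-based exact computation concrete, here is a minimal sketch. It is a textbook construction, not the authors' algorithm: it assumes an order-0 model (independent letters with fixed probabilities), builds a KMP-style DFA for one word, and propagates probability mass over (DFA state, occurrence count) pairs. Overlapping occurrences are counted. The function names are hypothetical.

```python
from collections import defaultdict

def build_dfa(word, alphabet="ACGT"):
    """KMP-style DFA: state s means the last s letters read equal word[:s]."""
    m = len(word)
    fail = [0] * (m + 1)          # fail[j] = longest proper border of word[:j]
    k = 0
    for i in range(1, m):
        while k and word[i] != word[k]:
            k = fail[k]
        if word[i] == word[k]:
            k += 1
        fail[i + 1] = k
    delta = [{} for _ in range(m + 1)]
    for s in range(m + 1):
        for c in alphabet:
            if s < m and word[s] == c:
                delta[s][c] = s + 1
            elif s == 0:
                delta[s][c] = 0
            else:
                delta[s][c] = delta[fail[s]][c]   # fall back along the border
    return delta

def occurrence_distribution(word, n, probs):
    """Exact P(number of occurrences = c) in a random sequence of length n."""
    delta = build_dfa(word, alphabet="".join(probs))
    m = len(word)
    dist = {(0, 0): 1.0}          # (DFA state, occurrences so far) -> probability
    for _ in range(n):
        nxt = defaultdict(float)
        for (s, c), p in dist.items():
            for ch, q in probs.items():
                t = delta[s][ch]
                nxt[(t, c + (t == m))] += p * q
        dist = nxt
    out = defaultdict(float)
    for (_, c), p in dist.items():
        out[c] += p
    return dict(out)

uniform = {b: 0.25 for b in "ACGT"}
d = occurrence_distribution("AA", 3, uniform)
```

    In this toy case the word "AA" in a length-3 uniform sequence occurs twice only for "AAA", and the mean of the resulting distribution equals (n - m + 1) times the word probability, which is a quick sanity check on any such implementation.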

  12. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Full Text Available Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC). A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding, however only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  13. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  14. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo; Jankovic, Boris R.; Bajic, Vladimir B.; Song, Le; Gao, Xin

    2013-01-01

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  15. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  16. Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

    Science.gov (United States)

    Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

    2015-03-26

    Emergence of multiple drug resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reining in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic flow to predict the host pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of protein and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.

  17. Discovering sequence motifs in quantitative and qualitative peptide data

    DEFF Research Database (Denmark)

    Andreatta, Massimo

    online as a web-server, was applied to various data sets including mixtures of MHC binding data and distinct classes of ligands to SH3 domains. Next, we investigated how string kernels could be used to identify pattern in peptide data, with particular focus on the MHC class I system. We suggest......Proteins are central to virtually all processes within the cell. The vast amount of functions performed by proteins in biological processes is conferred by their ability to bind in a selective and specific manner to other molecules. The nature of these interactions is, in general terms, three......-dimensional, as binding sites normally consist of a pocket or a groove on the protein surface. However, in many cases such interactions contain a linear component and can be more conveniently represented, or approximated, by a protein-peptide interaction. Whereas time-consuming structural studies are necessary in systems...

  18. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
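    The size of the sequence space mentioned above can be sketched in a few lines. The nine non-AUG start codons follow directly from the definition in the abstract (one base different from AUG). The count of flanking contexts is an assumption of this sketch: it takes the codon to occupy positions +1 to +3 with no position 0, leaving five variable flanking positions (-4 to -1 and +4). The function name is hypothetical.

```python
def near_cognate_codons(start="AUG", bases="ACGU"):
    """All codons that differ from `start` at exactly one position."""
    return sorted(
        start[:i] + b + start[i + 1:]
        for i in range(len(start))
        for b in bases
        if b != start[i]
    )

codons = near_cognate_codons()

# Assumed layout: -4 -3 -2 -1 | +1 +2 +3 (codon) | +4  -> five flanking bases
flanking_positions = 5
contexts_per_codon = 4 ** flanking_positions
total_variants = len(codons) * contexts_per_codon
```

    Under that assumed layout, each of the nine codons would be paired with 1024 flanking contexts; the actual library design should be taken from the paper itself.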

  19. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
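    The string representation with ignored positions can be illustrated with a small sketch. This is not PosMotif and does not implement its background Markov chains; it only shows why a motif of length l gives rise to 2^l keep/skip patterns and how a pattern with skipped positions is matched against a sequence. The skip symbol "." and all function names are choices made for this sketch.

```python
from itertools import product

def masks(l):
    """All 2**l keep/skip masks over l positions (True = position is used)."""
    return list(product([True, False], repeat=l))

def apply_mask(site, mask, skip="."):
    """Replace skipped positions of a site with the skip symbol."""
    return "".join(c if keep else skip for c, keep in zip(site, mask))

def matches(pattern, window, skip="."):
    """True if every non-skipped position of the pattern equals the window."""
    return len(pattern) == len(window) and all(
        p == skip or p == w for p, w in zip(pattern, window)
    )

def find_occurrences(pattern, seq):
    """Start indices of all windows of seq that match the pattern."""
    l = len(pattern)
    return [i for i in range(len(seq) - l + 1) if matches(pattern, seq[i:i + l])]

pattern = apply_mask("ACGT", (True, False, False, True))
positions = find_occurrences(pattern, "ACGTAGGTTT")
```

    Here the pattern keeps only the first and last nucleotides, so both "ACGT" and "AGGT" count as occurrences.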

  20. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo eFlores

    2012-06-01

    Full Text Available As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures, either global or local, determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  1. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides show high overall sequence...... similarity to Rapid Alkalinization Factor (RALF), a peptide isolated from tobacco. We therefore refer to this peptide family as RALFL for RALF-Like. RT-PCR analysis confirmed that several of the Arabidopsis genes are expressed and that their expression patterns vary. The identification of a large gene family...

  2. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. 
Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing
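    A consensus written with slashes, such as "T/AGC/GAGGA/TG", can be scanned for with a regular expression. The sketch below assumes each slash marks a two-way choice at a single position, i.e. [T or A] G [C or G] A G G [A or T] G; that reading is an interpretation of the notation, not something the abstract spells out. A lookahead is used so overlapping hits are reported.

```python
import re

# Assumed reading of "T/AGC/GAGGA/TG": one alternative pair per slash.
CONSENSUS = re.compile(r"(?=([TA]G[CG]AGG[AT]G))")

def scan(seq):
    """Return (start, matched 8-mer) for every hit, including overlapping ones."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq.upper())]

hits = scan("ccTGCAGGAGtt")   # hypothetical test sequence, not from the study
```

    A scan of real gene sequences would normally also check the reverse complement strand, which this sketch leaves out for brevity.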

  3. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.

    Science.gov (United States)

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-07-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.
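    The two halves of the conserved aptamer motif use degenerate symbols (N for any base, M for A or C). As a minimal sketch, they can be turned into regular expressions for searching RNA sequences. The sketch treats N' simply as any base and does not check the Watson-Crick pairing between N and N'; the mapping table covers only the symbols used here, and the names are hypothetical.

```python
import re

# Degenerate symbols used in the motif, mapped to RNA character classes.
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U", "M": "[AC]", "N": "[ACGU]"}

def iupac_to_regex(motif):
    """Translate a degenerate RNA motif into a regular-expression string."""
    return "".join(IUPAC_RNA[ch] for ch in motif)

five_prime_half = re.compile(iupac_to_regex("NNCCAC"))     # 5'-NNCCAC-3'
three_prime_half = re.compile(iupac_to_regex("GCGMGNN"))   # 5'-GCGMGN'N'-3', pairing not enforced
```

    For example, `three_prime_half.fullmatch("GCGAGUC")` matches, whereas a G at the M position does not.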

  4. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of

  5. Identification of novel conserved functional motifs across most Influenza A viral strains

    Directory of Open Access Journals (Sweden)

    El-Azab Iman

    2011-01-01

    Full Text Available Abstract Background Influenza A virus poses a continuous threat to global public health. Design of novel universal drugs and vaccine requires a careful analysis of different strains of Influenza A viral genome from diverse hosts and subtypes. We performed a systematic in silico analysis of Influenza A viral segments of all available Influenza A viral strains and subtypes and grouped them based on host, subtype, and years isolated, and through multiple sequence alignments we extrapolated conserved regions, motifs, and accessible regions for functional mapping and annotation. Results Across all species and strains 87 highly conserved regions (conservation percentage >= 90%) and 19 functional motifs (conservation percentage = 100%) were found in PB2, PB1, PA, NP, M, and NS segments. The conservation percentage of these segments ranged between 94 - 98% in human strains (the most conserved), 85 - 93% in swine strains (the most variable), and 91 - 94% in avian strains. The most conserved segment was different in each host (PB1 for human strains, NS for avian strains, and M for swine strains). Target accessibility prediction yielded 324 accessible regions, with a single stranded probability > 0.5, of which 78 coincided with conserved regions. Some of the interesting annotations in these regions included sites for protein-protein interactions, the RNA binding groove, and the proton ion channel. Conclusions The influenza virus has evolved to adapt to its host through variations in the GC content and conservation percentage of the conserved regions. Nineteen universal conserved functional motifs were discovered, of which some were accessible regions with interesting biological functions. These regions will serve as a foundation for universal drug targets as well as universal vaccine design.
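
The idea of a per-position "conservation percentage" derived from a multiple sequence alignment can be illustrated with a minimal Python sketch. It assumes conservation means the share of sequences carrying the most common residue in a column; the abstract does not state the authors' exact definition, and the function names, the toy alignment, and the `min_len` parameter are hypothetical.

```python
from collections import Counter

def column_conservation(alignment):
    """Percentage of sequences sharing the most common residue in each column."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in zip(*alignment):
        residue, count = Counter(col).most_common(1)[0]
        # A column dominated by gaps is treated as not conserved (assumption).
        scores.append(0.0 if residue == "-" else 100.0 * count / n)
    return scores

def conserved_regions(scores, threshold=90.0, min_len=3):
    """Half-open (start, end) intervals where every column scores >= threshold."""
    regions, start = [], None
    for i, s in enumerate(scores + [-1.0]):  # sentinel closes a trailing run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions

# Toy alignment, not real influenza data.
toy = ["ACGTACGA", "ACGTTCGA", "ACGTACGA", "ACGAACGA"]
scores = column_conservation(toy)
regions = conserved_regions(scores, threshold=90.0, min_len=3)
```

With the 90% threshold quoted in the abstract, `regions` lists the runs of highly conserved columns; a threshold of 100 would correspond to the fully conserved motifs.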

  6. Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

    Science.gov (United States)

    Miklós, István; Zádori, Zoltán

    2012-02-01

    HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (pHD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.

  7. MicroRNA categorization using sequence motifs and k-mers.

    Science.gov (United States)

    Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

    2017-03-14

    Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
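
As a rough illustration of k-mers as classifier features, the following Python sketch turns a sequence into a fixed-length vector of k-mer frequencies. The function name, the RNA alphabet, and the example hairpin are hypothetical; the abstract does not specify the feature encoding or the learning algorithm the authors used.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=2, alphabet="ACGU"):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA spelling
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)  # ignores windows with other symbols
    return [counts[km] / total if total else 0.0 for km in kmers]

# Hypothetical short sequence, not a real pre-miRNA.
vec = kmer_features("ACGUACGU", k=2)
```

Vectors like `vec`, computed for one species' pre-miRNAs as positives and another species' as negatives, could then be passed to any standard classifier.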

  8. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Full Text Available Abstract Background Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM). In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results Genome-wide analyses allowed finding a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them, translation initiation and elongation factors, and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters in dependency of the distance from the transcription start site. Conclusions Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the here found LTSM-binding proteins NonO and SFPQ were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  9. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    Science.gov (United States)

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.

  10. Regulation and function of the CD3γ DxxxLL motif: a binding site for adaptor protein-1 and adaptor protein-2 in vitro

    DEFF Research Database (Denmark)

    Dietrich, J; Kastrup, J; Nielsen, B L

    1997-01-01

    /CD3gamma chimeras; and in vitro by binding CD3gamma peptides to clathrin-coated vesicle adaptor proteins (APs). We find that the CD3gamma D127xxxLL131/132 sequence represents one united motif for binding of both AP-1 and AP-2, and that this motif functions as an active sorting motif in monomeric CD4...... and for AP binding in vitro. Furthermore, we provide evidence indicating that phosphorylation of CD3gamma S126 in the context of the complete TCR induces a conformational change that exposes the DxxxLL sequence for AP binding. Exposure of the DxxxLL motif causes an increase in the TCR internalization rate...

  11. Novel Structural and Functional Motifs in cellulose synthase (CesA) Genes of Bread Wheat (Triticum aestivum, L.).

    Directory of Open Access Journals (Sweden)

    Simerjeet Kaur

    Full Text Available Cellulose is the primary determinant of mechanical strength in plant tissues. Late-season lodging is inversely related to the amount of cellulose in a unit length of the stem. Wheat is the most widely grown of all the crops globally, yet information on its CesA gene family is limited. We have identified 22 CesA genes from bread wheat, which include homoeologs from each of the three genomes, and named them as TaCesAXA, TaCesAXB or TaCesAXD, where X denotes the gene number and the last suffix stands for the respective genome. Sequence analyses of the CESA proteins from wheat and their orthologs from barley, maize, rice, and several dicot species (Arabidopsis, beet, cotton, poplar, potato, rose gum and soybean) revealed motifs unique to monocots (Poales) or dicots. Novel structural motifs CQIC and SVICEXWFA were identified, which distinguished the CESAs involved in the formation of primary and secondary cell wall (PCW and SCW) in all the species. We also identified several new motifs specific to monocots or dicots. The conserved motifs identified in this study possibly play functional roles specific to PCW or SCW formation. The new insights from this study advance our knowledge about the structure, function and evolution of the CesA family in plants in general and wheat in particular. This information will be useful in improving culm strength to reduce lodging or alter wall composition to improve biofuel production.

  12. Sequence and structural analysis of the chitinase insertion domain reveals two conserved motifs involved in chitin-binding.

    Directory of Open Access Journals (Sweden)

    Hai Li

    2010-01-01

    Full Text Available Chitinases are prevalent in life and are found in species including archaea, bacteria, fungi, plants, and animals. They break down chitin, which is the second most abundant carbohydrate in nature after cellulose. Hence, they are important for maintaining a balance between carbon and nitrogen trapped as insoluble chitin in biomass. Chitinases are classified into two families, 18 and 19 glycoside hydrolases. In addition to a catalytic domain, which is a triosephosphate isomerase barrel, many family 18 chitinases contain another module, i.e., chitinase insertion domain. While numerous studies focus on the biological role of the catalytic domain in chitinase activity, the function of the chitinase insertion domain is not completely understood. Bioinformatics offers an important avenue in which to facilitate understanding the role of residues within the chitinase insertion domain in chitinase function. Twenty-seven chitinase insertion domain sequences, which include four experimentally determined structures and span five kingdoms, were aligned and analyzed using a modified sequence entropy parameter. Thirty-two positions with conserved residues were identified. The role of these conserved residues was explored by conducting a structural analysis of a number of holo-enzymes. Hydrogen bonding and van der Waals calculations revealed a distinct subset of four conserved residues constituting two sequence motifs that interact with oligosaccharides. The other conserved residues may be key to the structure, folding, and stability of this domain. Sequence and structural studies of the chitinase insertion domains conducted within the framework of evolution identified four conserved residues which clearly interact with the substrates. Furthermore, evolutionary studies propose a link between the appearance of the chitinase insertion domain and the function of family 18 chitinases in the subfamily A.
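
Entropy-based detection of conserved alignment positions can be sketched as follows. This uses plain Shannon entropy as a stand-in: the abstract mentions a "modified sequence entropy parameter" without defining it, so the scoring function, the `max_entropy` cutoff, and the toy alignment below are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, gaps ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("inf")  # all-gap column: treated as maximally uncertain
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())

def conserved_positions(alignment, max_entropy=0.5):
    """Zero-based column indices whose entropy does not exceed max_entropy."""
    return [i for i, col in enumerate(zip(*alignment))
            if column_entropy(col) <= max_entropy]

# Toy alignment, not real chitinase insertion domain sequences.
toy_alignment = ["GWDG", "GWEG", "GYDG", "G-DG"]
positions = conserved_positions(toy_alignment)
```

Low entropy marks a column where one residue dominates; the positions returned here would be the candidates to inspect structurally.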

  13. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  14. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active...... related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein...... sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally...

  15. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Science.gov (United States)

    Grimm, Guido W.; Renner, Susanne S.; Stamatakis, Alexandros; Hemleben, Vera

    2007-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly. PMID:19455198

  16. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    Full Text Available The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  17. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  18. Analysis of alkaptonuria (AKU) mutations and polymorphisms reveals that the CCC sequence motif is a mutational hot spot in the homogentisate 1,2 dioxygenase gene (HGO).

    Science.gov (United States)

    Beltrán-Valero de Bernabé, D; Jimenez, F J; Aquaron, R; Rodríguez de Córdoba, S

    1999-01-01

    We recently showed that alkaptonuria (AKU) is caused by loss-of-function mutations in the homogentisate 1,2 dioxygenase gene (HGO). Herein we describe haplotype and mutational analyses of HGO in seven new AKU pedigrees. These analyses identified two novel single-nucleotide polymorphisms (INV4+31A-->G and INV11+18A-->G) and six novel AKU mutations (INV1-1G-->A, W60G, Y62C, A122D, P230T, and D291E), which further illustrates the remarkable allelic heterogeneity found in AKU. Reexamination of all 29 mutations and polymorphisms thus far described in HGO shows that these nucleotide changes are not randomly distributed; the CCC sequence motif and its inverted complement, GGG, are preferentially mutated. These analyses also demonstrated that the nucleotide substitutions in HGO do not involve CpG dinucleotides, which illustrates important differences between HGO and other genes for the occurrence of mutation at specific short-sequence motifs. Because the CCC sequence motifs comprise a significant proportion (34.5%) of all mutated bases that have been observed in HGO, we conclude that the CCC triplet is a mutational hot spot in HGO. PMID:10205262

  19. Markovian Model in High Order Sequence Prediction From Log-Motif Patterns in Agbada Paralic Section, Niger Delta, Nigeria

    International Nuclear Information System (INIS)

    Olabode, S. O.; Adekoya, J. A.

    2002-01-01

    Markovian model in the elucidation of high order sequence was applied to repetitive events of regressive and transgressive phases in the Agbada paralic section Niger Delta. The repetitive events are made up of delta front, delta topset and fluvio-deltaic sediments. The sediments consist of sands, sandstones, siltstones and shales in various proportions. Five wells: MN1, AA1, NP2, NP6 and NP8 were studied. Summary of biostratigraphic report and well log-motif patterns was used to delineate the third order depositional sequences in the wells. Various Markovian properties - observed transition frequency matrix, observed transition probability matrix, fixed probability vector, expected random matrix (randomised transition matrix) and difference matrix were determined for stacked high order sequence (high frequency cyclic events) nested within the third-order sequences using the log-motif patterns for the various sand bodies and shales. Flow diagrams were constructed for each of the depositional sequences to know the likely occurrence of number of cycles. Upward transition matrix between the log-motif patterns and flow diagram to elucidate cyclicity show that the overall regressive sequence of the Niger Delta has been modified by deltaic depositional elements and fluctuations in sea level. The predictions of higher order sequence within third order sequences from Markovian Properties provide good basis for correlation within the depositional sequences. The model has also been used to decipher the dominant depositional processes during the formation of the sequences. Discrete reservoir intervals and seal potentials within the sequences were also predicted from the flow diagrams constructed.
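
The Markovian properties listed in this abstract can be computed from an upward succession of log-motif classes roughly as in the Python sketch below. It is a simplified illustration under stated assumptions: the expected random matrix is taken as the fixed probability vector repeated in every row (independent-trials form), which may differ from the formula the authors used, and the state names and the toy succession are hypothetical.

```python
def markov_matrices(sequence, states):
    """Transition counts, probabilities, fixed vector, random matrix, difference."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    # Observed transition frequency matrix: counts of upward transitions i -> j.
    freq = [[0] * n for _ in range(n)]
    for lower, upper in zip(sequence, sequence[1:]):
        freq[idx[lower]][idx[upper]] += 1
    total = sum(map(sum, freq))
    if total == 0:
        raise ValueError("need at least one transition")
    # Observed transition probability matrix: each row divided by its row total.
    prob = [[f / sum(row) if sum(row) else 0.0 for f in row] for row in freq]
    # Fixed probability vector: share of all transitions that end in each state.
    fixed = [sum(freq[i][j] for i in range(n)) / total for j in range(n)]
    # Expected random matrix (assumption): every row equals the fixed vector.
    expected = [fixed[:] for _ in range(n)]
    # Difference matrix: positive entries mark transitions more common than random.
    diff = [[prob[i][j] - expected[i][j] for j in range(n)] for i in range(n)]
    return freq, prob, fixed, expected, diff

# Hypothetical upward succession of lithologies, not data from the studied wells.
states = ["sand", "silt", "shale"]
succession = ["sand", "silt", "shale", "sand", "silt", "shale", "sand", "shale"]
freq, prob, fixed, expected, diff = markov_matrices(succession, states)
```

Strongly positive entries of `diff` are the transitions one would draw as arrows in a flow diagram of the preferred cycle.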

  20. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with a special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise to use the likelihood ratio test which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
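
One standard way to obtain an exact binomial test for two Poisson counts is to condition on their sum, and the Python sketch below follows that route. It is only an illustration: it ignores the overlap correction the abstract mentions, and whether it matches the authors' construction is an assumption. The inputs `e1` and `e2` are hypothetical expected motif counts under each sequence's background model.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_test(n1, n2, e1, e2):
    """Two-sided p-value for H0: the motif is equally exceptional in both sequences.

    Under H0 the Poisson means are proportional to the expected counts e1 and e2,
    so given n1 + n2 occurrences, n1 follows Binomial(n1 + n2, e1 / (e1 + e2)).
    """
    n = n1 + n2
    p = e1 / (e1 + e2)
    observed = binom_pmf(n1, n, p)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    tail = [binom_pmf(k, n, p) for k in range(n + 1)]
    return min(1.0, sum(q for q in tail if q <= observed * (1 + 1e-9)))

# Hypothetical counts: same expectation in both sequences, very different observations.
p_balanced = exact_binomial_test(5, 5, 10.0, 10.0)
p_skewed = exact_binomial_test(20, 2, 10.0, 10.0)
```

Because the sum over all outcomes is exact, this form stays usable for the small counts where, per the abstract, the asymptotic likelihood ratio test is less suitable.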

  1. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  2. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Full Text Available Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  3. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.

  4. Do motifs reflect evolved function?--No convergent evolution of genetic regulatory network subgraph topologies.

    Science.gov (United States)

    Knabe, Johannes F; Nehaniv, Chrystopher L; Schilstra, Maria J

    2008-01-01

    Methods that analyse the topological structure of networks have recently become quite popular. Whether motifs (subgraph patterns that occur more often than in randomized networks) have specific functions as elementary computational circuits has been cause for debate. As the question is difficult to resolve with currently available biological data, we approach the issue using networks that abstractly model natural genetic regulatory networks (GRNs) which are evolved to show dynamical behaviors. Specifically one group of networks was evolved to be capable of exhibiting two different behaviors ("differentiation") in contrast to a group with a single target behavior. In both groups we find motif distribution differences within the groups to be larger than differences between them, indicating that evolutionary niches (target functions) do not necessarily mold network structure uniquely. These results show that variability operators can have a stronger influence on network topologies than selection pressures, especially when many topologies can create similar dynamics. Moreover, analysis of motif functional relevance by lesioning did not suggest that motifs were of greater importance to the functioning of the network than arbitrary subgraph patterns. Only when drastically restricting network size, so that one motif corresponds to a whole functionally evolved network, was preference for particular connection patterns found. This suggests that in non-restricted, bigger networks, entanglement with the rest of the network hinders topological subgraph analysis.

  5. Identification of putative regulatory motifs in the upstream regions of co-expressed functional groups of genes in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Joshi NV

    2009-01-01

    Full Text Available Abstract Background Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs – the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs – were identified, and zero to all four of these occur in the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.

  6. Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

    Science.gov (United States)

    Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

    2018-02-23

    Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid CAAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical CAAX sequence, specifically C(x)3X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C(x)3X sequences. As the search to identify physiologically relevant C(x)3X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
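
    As a rough illustration of the two C-terminal patterns described in this record, the sketch below checks whether a protein sequence ends in a cysteine followed by exactly three residues (the canonical four-residue motif) or by exactly four residues (the extended C(x)3X motif). The function name and the example sequences are hypothetical, and the check is purely positional: it ignores any residue preferences of FTase and says nothing about whether a sequence is actually prenylated.

```python
import re

# Cysteine followed by exactly three residues at the C terminus (canonical motif).
CAAX_PATTERN = re.compile(r"C[A-Z]{3}$")
# Cysteine followed by exactly four residues at the C terminus (extended motif).
CX3X_PATTERN = re.compile(r"C[A-Z]{4}$")


def c_terminal_motifs(protein: str) -> list[str]:
    """Return the names of the C-terminal patterns that the sequence matches.

    Hypothetical helper for illustration only; it tests position, not
    enzyme specificity.
    """
    seq = protein.strip().upper()
    hits = []
    if CAAX_PATTERN.search(seq):
        hits.append("CAAX")
    if CX3X_PATTERN.search(seq):
        hits.append("C(x)3X")
    return hits


if __name__ == "__main__":
    # Made-up sequences, not real proteins.
    for example in ("MKTCVLS", "MKTCVLSA", "MKTAVLS"):
        print(example, c_terminal_motifs(example))
```

    A sequence with cysteines at both the fourth-from-last and fifth-from-last positions matches both patterns, which is why the helper returns a list rather than a single label.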

  7. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Science.gov (United States)

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  8. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Directory of Open Access Journals (Sweden)

    Zing Tsung-Yeh Tsai

    2015-08-01

    Full Text Available Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  9. One motif to bind them: A small-XXX-small motif affects transmembrane domain 1 oligomerization, function, localization, and cross-talk between two yeast GPCRs.

    Science.gov (United States)

    Lock, Antonia; Forfar, Rachel; Weston, Cathryn; Bowsher, Leo; Upton, Graham J G; Reynolds, Christopher A; Ladds, Graham; Dixon, Ann M

    2014-12-01

    G protein-coupled receptors (GPCRs) are the largest family of cell-surface receptors in mammals and facilitate a range of physiological responses triggered by a variety of ligands. GPCRs were thought to function as monomers, however it is now accepted that GPCR homo- and hetero-oligomers also exist and influence receptor properties. The Schizosaccharomyces pombe GPCR Mam2 is a pheromone-sensing receptor involved in mating and has previously been shown to form oligomers in vivo. The first transmembrane domain (TMD) of Mam2 contains a small-XXX-small motif, overrepresented in membrane proteins and well-known for promoting helix-helix interactions. An ortholog of Mam2 in Saccharomyces cerevisiae, Ste2, contains an analogous small-XXX-small motif which has been shown to contribute to receptor homo-oligomerization, localization and function. Here we have used experimental and computational techniques to characterize the role of the small-XXX-small motif in function and assembly of Mam2 for the first time. We find that disruption of the motif via mutagenesis leads to reduction of Mam2 TMD1 homo-oligomerization and pheromone-responsive cellular signaling of the full-length protein. It also impairs correct targeting to the plasma membrane. Mutation of the analogous motif in Ste2 yielded similar results, suggesting a conserved mechanism for assembly. Using co-expression of the two fungal receptors in conjunction with computational models, we demonstrate a functional change in G protein specificity and propose that this is brought about through hetero-dimeric interactions of Mam2 with Ste2 via the complementary small-XXX-small motifs. This highlights the potential of these motifs to affect a range of properties that can be investigated in other GPCRs. Copyright © 2014. Published by Elsevier B.V.

  10. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    KAUST Repository

    Wong, Aloysius Tze; Gehring, Christoph A; Irving, Helen R.

    2015-01-01

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches include amino acid residues that are conserved across species and many of which have been assigned functional roles based on experimental evidence. Molecules that were identified in this manner seeking cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.

  11. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    KAUST Repository

    Wong, Aloysius Tze

    2015-06-09

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches include amino acid residues that are conserved across species and many of which have been assigned functional roles based on experimental evidence. Molecules that were identified in this manner seeking cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.

  12. Sequence-specific DNA binding activity of the cross-brace zinc finger motif of the piggyBac transposase

    Science.gov (United States)

    Morellet, Nelly; Li, Xianghong; Wieninger, Silke A; Taylor, Jennifer L; Bischerour, Julien; Moriau, Séverine; Lescop, Ewen; Bardiaux, Benjamin; Mathy, Nathalie; Assrir, Nadine; Bétermier, Mireille; Nilges, Michael; Hickman, Alison B; Dyda, Fred; Craig, Nancy L; Guittet, Eric

    2018-01-01

    Abstract The piggyBac transposase (PB) is distinguished by its activity and utility in genome engineering, especially in humans where it has highly promising therapeutic potential. Little is known, however, about the structure–function relationships of the different domains of PB. Here, we demonstrate in vitro and in vivo that its C-terminal Cysteine-Rich Domain (CRD) is essential for DNA breakage, joining and transposition and that it binds to specific DNA sequences in the left and right transposon ends, and to an additional unexpectedly internal site at the left end. Using NMR, we show that the CRD adopts the specific fold of the cross-brace zinc finger protein family. We determine the interaction interfaces between the CRD and its target, the 5′-TGCGT-3′/3′-ACGCA-5′ motifs found in the left, left internal and right transposon ends, and use NMR results to propose docking models for the complex, which are consistent with our site-directed mutagenesis data. Our results provide support for a model of the PB/DNA interactions in the context of the transpososome, which will be useful for the rational design of PB mutants with increased activity. PMID:29385532
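
    The double-stranded target written above as 5′-TGCGT-3′/3′-ACGCA-5′ can be located in a DNA string by scanning for the pentamer and for its reverse complement. The following is a minimal sketch of that scan; the function names and the test sequence are made up for illustration and are not taken from the study.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(dna: str, motif: str = "TGCGT") -> list[tuple[int, str]]:
    """List (0-based start, strand) for each occurrence of the motif.

    "+" means the motif reads 5'->3' on the given strand; "-" means it
    reads 5'->3' on the opposite strand.
    """
    dna = dna.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    width = len(motif)
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        if window == motif:
            hits.append((start, "+"))
        elif window == rc:
            hits.append((start, "-"))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence with one site on each strand.
    print(find_sites("AATGCGTCCACGCAGG"))
```

    Because TGCGT is not its own reverse complement, a window can match at most one of the two strings, so the plus and minus checks never conflict.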

  13. Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

    Science.gov (United States)

    Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2014-06-01

    Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCGTHP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.

  14. Two sequence motifs from HIF-1α bind to the DNA-binding site of p53

    OpenAIRE

    Hansson, Lars O.; Friedler, Assaf; Freund, Stefan; Rüdiger, Stefan; Fersht, Alan R.

    2002-01-01

    There is evidence that hypoxia-inducible factor-1α (HIF-1α) interacts with the tumor suppressor p53. To characterize the putative interaction, we mapped the binding of the core domain of p53 (p53c) to an array of immobilized HIF-1α-derived peptides and found two peptide-sequence motifs that bound to p53c with micromolar affinity in solution. One sequence was adjacent to and the other coincided with the two proline residues of the oxygen-dependent degradation domain (P402 and P564) that act as...

  15. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    Science.gov (United States)

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found in interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  16. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Full Text Available Abstract Background Protein structures have conserved features – motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information however present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion MSDmotif is unique by combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  17. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    Directory of Open Access Journals (Sweden)

    Rodrigo S Lacruz

    2011-03-01

    Full Text Available Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates.

  18. MPN+, a putative catalytic motif found in a subset of MPN domain proteins from eukaryotes and prokaryotes, is critical for Rpn11 function

    Directory of Open Access Journals (Sweden)

    Hofmann Kay

    2002-09-01

    Full Text Available Abstract Background Three macromolecular assemblages, the lid complex of the proteasome, the COP9-Signalosome (CSN) and the eIF3 complex, all consist of multiple proteins harboring MPN and PCI domains. Up to now, no specific function for any of these proteins has been defined, nor has the importance of these motifs been elucidated. In particular Rpn11, a lid subunit, serves as the paradigm for MPN-containing proteins as it is highly conserved and important for proteasome function. Results We have identified a sequence motif, termed the MPN+ motif, which is highly conserved in a subset of MPN domain proteins such as Rpn11 and Csn5/Jab1, but is not present outside of this subfamily. The MPN+ motif consists of five polar residues that resemble the active site residues of hydrolytic enzyme classes, particularly that of metalloproteases. By using site-directed mutagenesis, we show that the MPN+ residues are important for the function of Rpn11, while a highly conserved Cys residue outside of the MPN+ motif is not essential. Single amino acid substitutions in MPN+ residues all show similar phenotypes, including slow growth, sensitivity to temperature and amino acid analogs, and general proteasome-dependent proteolysis defects. Conclusions The MPN+ motif is abundant in certain MPN-domain proteins, including newly identified proteins of eukaryotes, bacteria and archaea thought to act outside of the traditional large PCI/MPN complexes. The putative catalytic nature of the MPN+ motif makes it a good candidate for a pivotal enzymatic function, possibly a proteasome-associated deubiquitinating activity and a CSN-associated Nedd8/Rub1-removing activity.

  19. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Full Text Available Abstract Background This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  20. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    . These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  1. Defining a conformational consensus motif in cotransin-sensitive signal sequences: a proteomic and site-directed mutagenesis study.

    Directory of Open Access Journals (Sweden)

    Wolfgang Klein

    Full Text Available The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity.

  2. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    Science.gov (United States)

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  3. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through

  4. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.

  5. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Although...... interchangeable at both NPA sites without affecting protein expression or water, glycerol and methylamine permeability. However, other mutations in the NPA region led to reduced permeability (S186C and S186D), to nonfunctional channels (N64D), or even to lack of protein expression (S186A and S186T). Using...... electrophysiology, we found that an analogous mammalian AQP1 N76S mutant excluded protons and potassium ions, but leaked sodium ions, providing an argument for the overwhelming prevalence of Asn over other amino acids. We conclude that, at the first position in the NPA motifs, only Asn provides efficient helix cap...

  6. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

    Directory of Open Access Journals (Sweden)

    Michael Allevato

    Full Text Available The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
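
    The hexamers named in this abstract can be located in a sequence with a short scan. The following is a minimal sketch, assuming a plain regular-expression search; the function name `scan_sites` and the toy sequence are illustrative assumptions and are not taken from the study, which used protein-binding microarrays rather than string matching.

```python
import re

# Consensus from the abstract: CANNTG, where N stands for any base.
# The lookahead lets overlapping sites be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
CANONICAL = "CACGTG"   # canonical MYC E-box (CME) named in the abstract
NON_EBOX = "AACGTT"    # low-affinity non-E-box hexamer named in the abstract


def scan_sites(seq):
    """Return sorted (position, hexamer, label) tuples, 0-based positions.

    CANNTG and AACGTT are each their own reverse complement, so scanning
    one strand is enough for these particular patterns.
    """
    seq = seq.upper()
    hits = []
    for m in EBOX.finditer(seq):
        hexamer = m.group(1)
        label = "CME" if hexamer == CANONICAL else "E-box variant"
        hits.append((m.start(), hexamer, label))
    start = seq.find(NON_EBOX)
    while start != -1:
        hits.append((start, NON_EBOX, "non-E-box"))
        start = seq.find(NON_EBOX, start + 1)
    return sorted(hits)


# Made-up sequence for illustration only, not real genomic data.
toy = "TTCACGTGAACAGCTGTTAACGTTGG"
for pos, hexamer, label in scan_sites(toy):
    print(pos, hexamer, label)
```

    A scan like this only says where a hexamer occurs; it carries no information about affinity or occupancy, which the abstract reports as concentration-dependent.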

  7. F-Type Lectins: A Highly Diversified Family of Fucose-Binding Proteins with a Unique Sequence Motif and Structural Fold, Involved in Self/Non-Self-Recognition

    Directory of Open Access Journals (Sweden)

    Gerardo R. Vasta

    2017-11-01

    Full Text Available The F-type lectin (FTL) family is one of the most recent to be identified and structurally characterized. Members of the FTL family are characterized by a fucose recognition domain [F-type lectin domain (FTLD)] that displays a novel jellyroll fold (“F-type” fold) and unique carbohydrate- and calcium-binding sequence motifs. This novel lectin family comprises widely distributed proteins exhibiting single, double, or greater multiples of the FTLD, either tandemly arrayed or combined with other structurally and functionally distinct domains, yielding lectin subunits of pleiotropic properties even within a single species. Furthermore, the extraordinary variability of FTL sequences (isoforms) that are expressed in a single individual has revealed genetic mechanisms of diversification in ligand recognition that are unique to FTLs. Functions of FTLs in self/non-self-recognition include innate immunity, fertilization, microbial adhesion, and pathogenesis, among others. In addition, although the F-type fold is distinctive for FTLs, a structure-based search revealed apparently unrelated proteins with minor sequence similarity to FTLs that displayed the FTLD fold. In general, the phylogenetic analysis of FTLD sequences from viruses to mammals reveals clades that are consistent with the currently accepted taxonomy of extant species. However, the surprisingly discontinuous distribution of FTLDs within each taxonomic category suggests not only an extensive structural/functional diversification of the FTLs along evolutionary lineages but also that this intriguing lectin family has been subject to frequent gene duplication, secondary loss, lateral transfer, and functional co-option.

  8. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site

    International Nuclear Information System (INIS)

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-01-01

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al. ). In the present study, the pH-induced formation of the i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K_in, and the cooperativity parameters, ω_c and ω_a, were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  9. Double-hydrophobic elastin-like polypeptides with added functional motifs: Self-assembly and cytocompatibility.

    Science.gov (United States)

    Le, Duc H T; Tsutsui, Yoko; Sugawara-Narutaki, Ayae; Yukawa, Hiroshi; Baba, Yoshinobu; Ohtsuki, Chikara

    2017-09-01

    We have recently developed a novel double-hydrophobic elastin-like triblock polypeptide called GPG, designed after the uneven distribution of two different hydrophobic domains found in elastin, an extracellular matrix protein providing elasticity and resilience to tissues. Upon a temperature trigger, GPG undergoes a sequential self-assembling process to form flexible beaded nanofibers with high homogeneity and excellent dispersibility in water. Given that GPG might be a potential elastin-mimetic material, we sought to explore the biological activities of this block polypeptide. Besides GPG, several functionalized derivatives were also constructed by fusing functional motifs such as KAAK or KAAKGRGDS at the C-terminus of GPG. Although the added motifs affected the kinetics of fiber formation and β-sheet contents, all three GPGs assembled into beaded nanofibers at the physiological temperature. The resulting GPG nanofibers preserved their beaded structures in cell culture medium; therefore, they were coated on polystyrene substrates to study their cytocompatibility toward mouse embryonic fibroblasts, NIH-3T3. Among the three polypeptides, GPG having the cell-binding motif GRGDS derived from fibronectin showed excellent cell adhesion and cell proliferation properties compared to other conventional materials, suggesting its promising applications as extracellular matrices for mammalian cells. © 2017 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 105A: 2475-2484, 2017.

  10. Determination of 5'-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... (220 and 190 nt, respectively), and exhibited only approximately 50% nucleotide homology. Nevertheless, highly conserved areas were identified in the leader of all 3 PRRSV isolates, which constitute candidate motifs for binding of protein(s) involved in viral replication. These comparative data provide...

  11. The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains

    Directory of Open Access Journals (Sweden)

    Wang Yiguo

    2008-10-01

    Full Text Available Abstract Background Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often ......). Results Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved. Conclusion The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. By combining scoring matrices derived from peptide libraries with conservation analysis, we can find those protein groups that are more likely to interact with specific domains.

  12. MicroRNA sequence motifs reveal asymmetry between the stem arms

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Havgaard, Jakob Hull; Ensterö, M.

    2006-01-01

    The processing of micro RNAs (miRNAs) from their stemloop precursor has revealed asymmetry in the processing of the mature and its star sequence. Furthermore, the miRNA processing systems differ between organisms. To assess this at the sequence level we have investigated mature miRNAs in their gen......

  13. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
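
    The closing claim, that a centroid estimate can differ from the maximum a posteriori (MAP) estimate, can be illustrated on a toy posterior. This is a minimal sketch under stated assumptions: it uses plain Hamming loss over unconstrained binary site indicators, not the refined loss function proposed in the paper, and the sample counts are invented. Under Hamming loss the centroid is obtained by keeping every position whose posterior marginal exceeds 1/2.

```python
from collections import Counter

# Hypothetical posterior draws of binding-site indicator vectors
# (1 = a site starts at that candidate position). Counts are invented.
samples = (
    [(0, 0, 0)] * 3
    + [(1, 1, 0)] * 2
    + [(1, 0, 1)] * 2
    + [(0, 1, 1)] * 2
    + [(1, 1, 1)] * 2
)


def map_estimate(draws):
    """Most frequently drawn configuration (empirical posterior mode)."""
    return Counter(draws).most_common(1)[0][0]


def centroid_estimate(draws):
    """Configuration minimising expected Hamming loss over the draws."""
    n = len(draws)
    width = len(draws[0])
    marginals = [sum(d[i] for d in draws) / n for i in range(width)]
    return tuple(int(p > 0.5) for p in marginals)


def expected_hamming(candidate, draws):
    """Average number of positions where candidate disagrees with a draw."""
    total = sum(sum(a != b for a, b in zip(candidate, d)) for d in draws)
    return total / len(draws)


print(map_estimate(samples))       # (0, 0, 0)
print(centroid_estimate(samples))  # (1, 1, 1)
```

    Here the single most probable configuration has no sites, yet each position is occupied in 6 of the 11 draws, so the centroid places a site at every position and has the lower expected Hamming loss (15/11 versus 18/11).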

  14. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  15. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    Science.gov (United States)

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently, Cas9, an RNA-guided DNA endonuclease, has emerged as a powerful tool for targeted genome manipulations. The Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing the crRNA sequence; however, a short nucleotide sequence, termed the PAM, is required to initiate crRNA hybridization to the DNA target. The PAM sequence is recognized by the Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. A conserved cysteine motif is critical for rice ceramide kinase activity and function.

    Directory of Open Access Journals (Sweden)

    Fang-Cheng Bi

    Full Text Available Ceramide kinase (CERK) is a key regulator of cell survival in dicotyledonous plants and animals. Much less is known about the roles of CERK and ceramides in mediating cellular processes in monocot plants. Here, we report the characterization of a ceramide kinase, OsCERK, from rice (Oryza sativa spp. Japonica cv. Nipponbare) and investigate the effects of ceramides on rice cell viability. OsCERK can complement the Arabidopsis CERK mutant acd5. Recombinant OsCERK has ceramide kinase activity with Michaelis-Menten kinetics and optimal activity at pH 7.0 and 40°C. Mg2+ activates OsCERK in a concentration-dependent manner. Importantly, a CXXXCXXC motif, conserved in all ceramide kinases and important for the activity of the human enzyme, is critical for OsCERK enzyme activity and in planta function. In a rice protoplast system, inhibition of CERK leads to cell death, and the ratio of added ceramide and ceramide-1-phosphate, CERK's substrate and product, respectively, influences cell survival. Ceramide-induced rice cell death has apoptotic features and is an active process that requires both de novo protein synthesis and phosphorylation. Finally, mitochondria membrane potential loss previously associated with ceramide-induced cell death in Arabidopsis was also found in rice, but it occurred with different timing. OsCERK is a bona fide ceramide kinase with a functionally and evolutionarily conserved Cys-rich motif that plays an important role in modulating cell fate in plants. The vital function of the conserved motif in both human and rice CERKs suggests that the biochemical mechanism of CERKs is similar in animals and plants. Furthermore, ceramides induce cell death with similar features in monocot and dicot plants.
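
    The CXXXCXXC spacing quoted in this abstract (Cys, any three residues, Cys, any two residues, Cys) translates directly into a pattern search. The sketch below is illustrative only: the helper name `find_cys_motif` and the toy peptides are assumptions, and no real CERK sequence is used.

```python
import re

# CXXXCXXC: Cys, any 3 residues, Cys, any 2 residues, Cys.
# A lookahead is used so that overlapping occurrences are not missed.
CYS_MOTIF = re.compile(r"(?=(C.{3}C.{2}C))")


def find_cys_motif(protein):
    """Return (0-based start, matched 8-mer) for each CXXXCXXC occurrence."""
    return [(m.start(), m.group(1)) for m in CYS_MOTIF.finditer(protein.upper())]


# Made-up peptides for illustration; these are not CERK sequences.
toy_peptides = {
    "with_motif": "MKCAAACGGCLL",
    "wrong_spacing": "MKCAACGGGCLL",
    "no_cysteine": "MKAAAAAGGALL",
}
for name, peptide in toy_peptides.items():
    print(name, find_cys_motif(peptide))
```

    Only the first toy peptide has cysteines at relative positions 0, 4 and 7, so it is the only one reported.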

  17. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) has become an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.

  18. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  19. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    Science.gov (United States)

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
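
    The bracketed consensus in this abstract, UUAGUU[U/A][U/G][A/U/G]U, can be turned into a searchable pattern by dropping the slashes inside each bracket. The following is a minimal sketch assuming a simple regular-expression scan of 3'-UTR strings; the function names and the toy sequences are illustrative assumptions and do not reproduce the genome-wide survey described above.

```python
import re

MEL2_CONSENSUS = "UUAGUU[U/A][U/G][A/U/G]U"  # as printed in the abstract


def consensus_to_regex(consensus):
    """Rewrite slash-separated alternatives, e.g. [U/A], as a regex class [UA]."""
    return consensus.replace("/", "")


# Lookahead so that overlapping occurrences would also be reported.
MEL2_RE = re.compile("(?=(" + consensus_to_regex(MEL2_CONSENSUS) + "))")


def find_consensus(utr):
    """Return (0-based start, matched 10-mer) hits in a 3'-UTR.

    The input may be RNA or DNA; T is read as U before matching.
    """
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MEL2_RE.finditer(rna)]


# Made-up sequences for illustration only.
print(find_consensus("AAUUAGUUUUAUCC"))  # RNA input
print(find_consensus("aattagttaggtcc"))  # DNA input, lower case
```

    A match only shows that the consensus is present in the string; the abstract additionally reports that the sites sit in putative looped structures, which a sequence-only scan does not check.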

  20. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-01

    LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  1. Identification of E-cadherin signature motifs functioning as cleavage sites for Helicobacter pylori HtrA

    Science.gov (United States)

    Schmidt, Thomas P.; Perna, Anna M.; Fugmann, Tim; Böhm, Manja; Hiss, Jan; Haller, Sarah; Götz, Camilla; Tegtmeyer, Nicole; Hoy, Benjamin; Rau, Tilman T.; Neri, Dario; Backert, Steffen; Schneider, Gisbert; Wessler, Silja

    2016-03-01

    The cell adhesion protein and tumour suppressor E-cadherin exhibits important functions in the prevention of gastric cancer. As a class-I carcinogen, Helicobacter pylori (H. pylori) has developed a unique strategy to interfere with E-cadherin functions. In previous studies, we have demonstrated that H. pylori secretes the protease high temperature requirement A (HtrA) which cleaves off the E-cadherin ectodomain (NTF) on epithelial cells. This opens cell-to-cell junctions, allowing bacterial transmigration across the polarised epithelium. Here, we investigated the molecular mechanism of the HtrA-E-cadherin interaction and identified E-cadherin cleavage sites for HtrA. Mass-spectrometry-based proteomics and Edman degradation revealed three signature motifs containing the [VITA]-[VITA]-x-x-D-[DN] sequence pattern, which were preferentially cleaved by HtrA. Based on these sites, we developed a substrate-derived peptide inhibitor that selectively bound and inhibited HtrA, thereby blocking transmigration of H. pylori. The discovery of HtrA-targeted signature sites might further explain why we detected a stable 90 kDa NTF fragment during H. pylori infection, but also additional E-cadherin fragments ranging from 105 kDa to 48 kDa in in vitro cleavage experiments. In conclusion, HtrA targets E-cadherin signature sites that are accessible in in vitro reactions, but might be partially masked on epithelial cells through functional homophilic E-cadherin interactions.
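
    The signature [VITA]-[VITA]-x-x-D-[DN] is written in PROSITE-style notation, which maps almost one-to-one onto a regular expression. The sketch below is a simplified converter under stated assumptions: it supports only single residues, [..] allowed sets, {..} forbidden sets, the wildcard x and fixed repeats such as x(3), and it is not a full PROSITE parser. The function names and the toy peptide are illustrative and do not come from the study.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.groups()
        if core == "x":
            core = "."                        # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..} = any residue except these
        parts.append(core + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)


def find_signature(protein, pattern):
    """Return (0-based start, matched segment) for each pattern occurrence."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]


SIGNATURE = "[VITA]-[VITA]-x-x-D-[DN]"   # pattern quoted in the abstract
print(prosite_to_regex(SIGNATURE))        # [VITA][VITA]..D[DN]
# Made-up peptide for illustration; not an E-cadherin sequence.
print(find_signature("GGVIAKDNQQ", SIGNATURE))
```

    A hit marks a candidate signature site only; whether a site is actually cleaved depends on accessibility, as the abstract notes for sites masked by homophilic E-cadherin interactions.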

  2. Salt-bridging effects on short amphiphilic helical structure and introducing sequence-based short beta-turn motifs.

    Science.gov (United States)

    Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel

    2018-02-01

    Determining the minimal sequence necessary to induce protein folding is beneficial in understanding the role of protein-protein interactions in biological systems, as their three-dimensional structures often dictate their activity. Proteins are generally comprised of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on a modestly helical eight-residue peptide, PT3, we previously published. Using circular dichroism (CD) spectroscopy and changing the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp, we found that our most helical improvements came from the Arg-Glu combination, whereas the Lys-Asp was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3₁₀-helical contributions in PT3 were lessened in the Arg-Glu-containing peptide with the beginning of cooperative unfolding seen through a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide which could help elucidate the stages of folding between the 3₁₀-helix and α-helix. In addition, we developed a short six-residue peptide with β-turn/sheet CD signature, again to help study minimal sequences needed for folding. Overall, the results indicate that improvements made to short peptide scaffolds by fine-tuning the salt-bridging residues can enhance scaffold structure. Likewise, with the results from the new, short β-turn motif, these can help impact future peptidomimetic designs in creating biologically useful, short, structured β-sheet-forming peptides.

  3. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  4. Mutations in the catalytic loop HRD motif alter the activity and function of Drosophila Src64.

    Directory of Open Access Journals (Sweden)

    Taylor C Strong

    Full Text Available The catalytic loop HRD motif is found in most protein kinases and these amino acids are predicted to perform functions in catalysis, transition to, and stabilization of the active conformation of the kinase domain. We have identified mutations in a Drosophila src gene, src64, that alter the three HRD amino acids. We have analyzed the mutants for both biochemical activity and biological function during development. Mutation of the aspartate to asparagine eliminates biological function in cytoskeletal processes and severely reduces fertility, supporting the amino acid's critical role in enzymatic activity. The arginine to cysteine mutation has little to no effect on kinase activity or cytoskeletal reorganization, suggesting that the HRD arginine may not be critical for coordinating phosphotyrosine in the active conformation. The histidine to leucine mutant retains some kinase activity and biological function, suggesting that this amino acid may have a biochemical function in the active kinase that is independent of its side chain hydrogen bonding interactions in the active site. We also describe the phenotypic effects of other mutations in the SH2 and tyrosine kinase domains of src64, and we compare them to the phenotypic effects of the src64 null allele.

  5. New bioactive motifs and their use in functionalized self-assembling peptides for NSC differentiation and neural tissue engineering

    Science.gov (United States)

    Gelain, F.; Cigognini, D.; Caprini, A.; Silva, D.; Colleoni, B.; Donegá, M.; Antonini, S.; Cohen, B. E.; Vescovi, A.

    2012-04-01

    Developing functionalized biomaterials for enhancing transplanted cell engraftment in vivo and stimulating the regeneration of injured tissues requires a multi-disciplinary approach customized for the tissue to be regenerated. In particular, nervous tissue engineering may benefit greatly from the discovery of novel functional motifs fostering transplanted stem cell engraftment and nervous fiber regeneration. Using phage display technology we have discovered new peptide sequences that bind to murine neural stem cell (NSC)-derived neural precursor cells (NPCs), and promote their viability and differentiation in vitro when linked to LDLK12 self-assembling peptide (SAPeptide). We characterized the newly functionalized LDLK12 SAPeptides via atomic force microscopy, circular dichroism and rheology, obtaining nanostructured hydrogels that support human and murine NSC proliferation and differentiation in vitro. One functionalized SAPeptide (Ac-FAQ), showing the highest stem cell viability and neural differentiation in vitro, was finally tested in acute contusive spinal cord injury in rats, where it fostered nervous tissue regrowth and improved locomotor recovery. Interestingly, animals treated with the non-functionalized LDLK12 had an axon sprouting/regeneration intermediate between Ac-FAQ-treated animals and controls. These results suggest that hydrogels functionalized with phage-derived peptides may constitute promising biomimetic scaffolds for in vitro NSC differentiation, as well as regenerative therapy of the injured nervous system. Moreover, this multi-disciplinary approach can be used to customize SAPeptides for other specific tissue engineering applications.

  6. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity...... to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs...... associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can...

  7. Mouse transgenesis identifies conserved functional enhancers and cis-regulatory motif in the vertebrate LIM homeobox gene Lhx2 locus.

    Directory of Open Access Journals (Sweden)

    Alison P Lee

    The vertebrate Lhx2 is a member of the LIM homeobox family of transcription factors. It is essential for the normal development of the forebrain, eye, olfactory system and liver as well as for the differentiation of lymphoid cells. However, despite the highly restricted spatio-temporal expression pattern of Lhx2, nothing is known about its transcriptional regulation. In mammals and chicken, Crb2, Dennd1a and Lhx2 constitute a conserved linkage block, while the intervening Dennd1a is lost in the fugu Lhx2 locus. To identify functional enhancers of Lhx2, we predicted conserved noncoding elements (CNEs) in the human, mouse and fugu Crb2-Lhx2 loci and assayed their function in transgenic mouse at E11.5. Four of the eight CNE constructs tested functioned as tissue-specific enhancers in specific regions of the central nervous system and the dorsal root ganglia (DRG), recapitulating partial and overlapping expression patterns of Lhx2 and Crb2 genes. There was considerable overlap in the expression domains of the CNEs, which suggests that the CNEs are either redundant enhancers or regulating different genes in the locus. Using a large set of CNEs (810 CNEs) associated with transcription factor-encoding genes that express predominantly in the central nervous system, we predicted four over-represented 8-mer motifs that are likely to be associated with expression in the central nervous system. Mutation of one of them in a CNE that drove reporter expression in the neural tube and DRG abolished expression in both domains, indicating that this motif is essential for expression in these domains. The failure of the four functional enhancers to recapitulate the complete expression pattern of Lhx2 at E11.5 indicates that there must be other Lhx2 enhancers that are either located outside the region investigated or divergent in mammals and fishes. Other approaches such as sequence comparison between multiple mammals are required to identify and characterize such enhancers.

  8. A Heparin Binding Motif Rich in Arginine and Lysine is the Functional Domain of YKL-40

    Directory of Open Access Journals (Sweden)

    Nipaporn Ngernyuang

    2018-02-01

    The heparin-binding glycoprotein YKL-40 (CHI3L1) is intimately associated with microvascularization in multiple human diseases including cancer and inflammation. However, the heparin-binding domain(s) pertinent to the angiogenic activity have yet to be identified. YKL-40 harbors a consensus heparin-binding motif that consists of positively charged arginine (R) and lysine (K) residues (RRDK; residues 144–147), but these residues do not bind heparin. Intriguingly, we identified a separate KR-rich domain (residues 334–345) that does display strong heparin binding affinity. A short synthetic peptide spanning this KR-rich domain successfully competed with YKL-40 and blocked its ability to bind heparin. Three individual point mutations, where alanine (A) substituted for K or R (K337A, K342A, R344A), led to remarkable decreases in heparin-binding ability and angiogenic activity. In addition, a neutralizing anti-YKL-40 antibody that targets these residues and prevents heparin binding impeded angiogenesis in vitro. MDA-MB-231 breast cancer cells engineered to express ectopic K337A, K342A or R344A mutants displayed reduced tumor development and compromised tumor vessel formation in mice relative to control cells expressing wild-type YKL-40. These data reveal that the KR-rich heparin-binding motif is the functional heparin-binding domain of YKL-40. Our findings shed light on novel molecular mechanisms underlying endothelial cell angiogenesis promoted by YKL-40 in a variety of diseases.

  9. Enantiospecific (+)- and (-)-germacrene D synthases, cloned from goldenrod, reveal a functionally active variant of the universal isoprenoid-biosynthesis aspartate-rich motif.

    Science.gov (United States)

    Prosser, Ian; Altug, Iris G; Phillips, Andy L; König, Wilfried A; Bouwmeester, Harro J; Beale, Michael H

    2004-12-15

    The naturally occurring, volatile sesquiterpene hydrocarbon germacrene D has strong effects on insect behaviour and genes encoding enzymes that produce this compound are of interest in the study of plant-insect interactions and in a number of biotechnological approaches to pest control. Goldenrod, Solidago canadensis, is unusual in that it produces both enantiomers of germacrene D. Two new sesquiterpene synthase cDNAs, designated Sc11 and Sc19, have been isolated from goldenrod and functional expression in Escherichia coli identified Sc11 as (+)-germacrene D synthase and Sc19 as (-)-germacrene D synthase. Thus, the enantiomers of germacrene D are the products of separate, but closely related (85% amino-acid identity), enzymes. Unlike other sesquiterpene synthases and the related monoterpene synthases and prenyl transferases, which contain the characteristic amino-acid motif DDXX(D,E), Sc11 is unusual in that this motif occurs as (303)NDTYD. Mutagenesis of this motif to (303)DDTYD gave rise to an enzyme that fully retained (+)-germacrene D synthase activity. The converse mutation in Sc19 (D303N) resulted in a less efficient but functional enzyme. Mutagenesis of position 303 to glutamate in both enzymes resulted in loss of activity. These results indicate that the magnesium ion-binding role of the first aspartate in the DDXXD motif may not be as critical as previously thought. Further amino-acid sequence comparisons and molecular modelling of the enzyme structures revealed that very subtle changes to the active site of this family of enzymes are required to alter the reaction pathway to form, in this case, different enantiomers from the same enzyme-bound carbocationic intermediate.
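
    The motif forms named in this abstract can be written as simple sequence patterns. The sketch below is only an illustration: the relaxed pattern that admits asparagine at the first position and the `classify` helper are assumptions introduced here, not something taken from the paper, and a sequence match says nothing about enzyme activity.

```python
import re

# DDXX(D,E): two aspartates, any two residues, then aspartate or glutamate.
CANONICAL = re.compile(r"DD..[DE]")
# Hypothetical relaxed form that also admits asparagine (N) at the first position,
# as in the NDTYD segment reported for Sc11.
RELAXED = re.compile(r"[DN]D..[DE]")

def classify(segment: str) -> str:
    """Label a five-residue segment by which pattern it satisfies."""
    if CANONICAL.fullmatch(segment):
        return "canonical"
    if RELAXED.fullmatch(segment):
        return "variant"
    return "no match"

# DDTYD and NDTYD appear in the abstract; EDTYD stands for glutamate at position 303.
for segment in ("DDTYD", "NDTYD", "EDTYD"):
    print(segment, classify(segment))
```

    Keeping the canonical and relaxed patterns separate makes it easy to see which hits would be missed by a strict DDXX(D,E) search.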

  10. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.

  11. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Science.gov (United States)

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
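
    A consensus written in this dash-separated style can be translated mechanically into a regular expression for scanning sequences. The following is a minimal sketch under stated assumptions: it handles only the three element types that occur in the consensus above (a literal residue, `x` for any residue, and a bracketed set), the function name `prosite_to_regex` is hypothetical, and the test sequence is constructed for illustration and is not a real protein.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple dash-separated consensus into a regex string."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")       # 'x' matches any residue
        else:
            parts.append(element)   # literal residue or bracketed set such as [GN]
    return "".join(parts)

PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"
regex = re.compile(prosite_to_regex(PERCAL))

# Toy sequence built to contain exactly one match.
toy = "MKAGSDGRLGTADDLL"
hits = [(m.start() + 1, m.group()) for m in regex.finditer(toy)]  # 1-based positions
print(prosite_to_regex(PERCAL))  # G.DG..[GN][TN].DD
print(hits)                      # [(4, 'GSDGRLGTADD')]
```

    Repeat counts such as `x(2)` or exclusion sets in braces are not handled here; a full pattern parser would need extra cases for them.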

  12. AMP-acetyl CoA synthetase from Leishmania donovani: identification and functional analysis of 'PX4GK' motif.

    Science.gov (United States)

    Soumya, Neelagiri; Kumar, I Sravan; Shivaprasad, S; Gorakh, Landage Nitin; Dinesh, Neeradi; Swamy, Kayala Kambagiri; Singh, Sushma

    2015-04-01

    An adenosine monophosphate forming acetyl CoA synthetase (AceCS), which is the key enzyme involved in the conversion of acetate to acetyl CoA, has been identified from Leishmania donovani for the first time. Sequence analysis of L. donovani AceCS (LdAceCS) revealed the presence of a 'PX4GK' motif which is highly conserved across organisms whose sequence identity ranges from high (96%) to low (38%). A ∼ 77 kDa heterologous protein with C-terminal 6X His-tag was expressed in Escherichia coli. Expression of LdAceCS in promastigotes was confirmed by western blot and RT-PCR analysis. Immunolocalization studies revealed that it is a cytosolic protein. We also report the kinetic characterization of recombinant LdAceCS with acetate, adenosine 5'-triphosphate, coenzyme A and propionate as substrates. Site directed mutagenesis of residues in the conserved PX4GK motif of LdAceCS was performed to gain insight into its potential role in substrate binding, catalysis and its role in maintaining structural integrity of the protein. P646A, G651A and K652R exhibited more than 90% loss in activity, signifying the indispensable role of these residues in the enzyme activity. Substitution of other residues in this motif resulted in altered substrate specificity and catalysis. However, none of them had any role in modulation of the secondary structure of the protein except the G651A mutant. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    Science.gov (United States)

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. The Regulatory Factor ZFHX3 Modifies Circadian Function in SCN via an AT Motif-Driven Axis

    Science.gov (United States)

    Parsons, Michael J.; Brancaccio, Marco; Sethi, Siddharth; Maywood, Elizabeth S.; Satija, Rahul; Edwards, Jessica K.; Jagannath, Aarti; Couch, Yvonne; Finelli, Mattéa J.; Smyllie, Nicola J.; Esapa, Christopher; Butler, Rachel; Barnard, Alun R.; Chesham, Johanna E.; Saito, Shoko; Joynson, Greg; Wells, Sara; Foster, Russell G.; Oliver, Peter L.; Simon, Michelle M.; Mallon, Ann-Marie; Hastings, Michael H.; Nolan, Patrick M.

    2015-01-01

    We identified a dominant missense mutation in the SCN transcription factor Zfhx3, termed short circuit (Zfhx3Sci), which accelerates circadian locomotor rhythms in mice. ZFHX3 regulates transcription via direct interaction with predicted AT motifs in target genes. The mutant protein has a decreased ability to activate consensus AT motifs in vitro. Using RNA sequencing, we found minimal effects on core clock genes in Zfhx3Sci/+ SCN, whereas the expression of neuropeptides critical for SCN intercellular signaling was significantly disturbed. Moreover, mutant ZFHX3 had a decreased ability to activate AT motifs in the promoters of these neuropeptide genes. Lentiviral transduction of SCN slices showed that the ZFHX3-mediated activation of AT motifs is circadian, with decreased amplitude and robustness of these oscillations in Zfhx3Sci/+ SCN slices. In conclusion, by cloning Zfhx3Sci, we have uncovered a circadian transcriptional axis that determines the period and robustness of behavioral and SCN molecular rhythms. PMID:26232227

  15. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Functional annotation from the genome sequence of the giant panda.

    Science.gov (United States)

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  17. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers' genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow successful discrimination of a) TrEn from random controls, a proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  18. Prediction Error During Functional and Non-Functional Action Sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2013-01-01

    recurrent networks were made and the results are presented in this article. The simulations show that non-functional action sequences do indeed increase prediction error, but that context representations, such as abstract goal information, can modulate the error signal considerably. It is also shown...... that the networks are sensitive to boundaries between sequences in both functional and non-functional actions....

  19. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  20. The N-Terminal GYPSY Motif Is Required for Pilin-Specific Sortase SrtC1 Functionality in Lactobacillus rhamnosus Strain GG.

    Directory of Open Access Journals (Sweden)

    François P Douillard

    Predominantly identified in pathogenic Gram-positive bacteria, sortase-dependent pili are also found in commensal species, such as the probiotic-marketed strain Lactobacillus rhamnosus strain GG. Pili are typically associated with host colonization, immune signalling and biofilm formation. Comparative analysis of the N-terminal domains of pilin-specific sortases from various piliated Gram-positive bacteria identified a conserved motif, called GYPSY, within the signal sequence. We investigated the function and role of the GYPSY residues by directed mutagenesis in homologous (rod-shaped) and heterologous (coccoid-shaped) expression systems for pilus formation. Substitutions of some of the GYPSY residues, and more specifically the proline residue, were found to have a direct impact on the degree of piliation of Lb. rhamnosus GG. The present findings uncover a new signalling element involved in the functionality of pilin-specific sortases controlling the pilus biogenesis of Lb. rhamnosus GG and related piliated Gram-positive species.

  1. The N-Terminal GYPSY Motif Is Required for Pilin-Specific Sortase SrtC1 Functionality in Lactobacillus rhamnosus Strain GG

    Science.gov (United States)

    Douillard, François P.; Rasinkangas, Pia; Bhattacharjee, Arnab; Palva, Airi; de Vos, Willem M.

    2016-01-01

    Predominantly identified in pathogenic Gram-positive bacteria, sortase-dependent pili are also found in commensal species, such as the probiotic-marketed strain Lactobacillus rhamnosus strain GG. Pili are typically associated with host colonization, immune signalling and biofilm formation. Comparative analysis of the N-terminal domains of pilin-specific sortases from various piliated Gram-positive bacteria identified a conserved motif, called GYPSY, within the signal sequence. We investigated the function and role of the GYPSY residues by directed mutagenesis in homologous (rod-shaped) and heterologous (coccoid-shaped) expression systems for pilus formation. Substitutions of some of the GYPSY residues, and more specifically the proline residue, were found to have a direct impact on the degree of piliation of Lb. rhamnosus GG. The present findings uncover a new signalling element involved in the functionality of pilin-specific sortases controlling the pilus biogenesis of Lb. rhamnosus GG and related piliated Gram-positive species. PMID:27070897

  2. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    Directory of Open Access Journals (Sweden)

    Yuta Kimura

    2014-02-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1.

  3. Functional motifs responsible for human metapneumovirus M2-2-mediated innate immune evasion.

    Science.gov (United States)

    Chen, Yu; Deng, Xiaoling; Deng, Junfang; Zhou, Jiehua; Ren, Yuping; Liu, Shengxuan; Prusak, Deborah J; Wood, Thomas G; Bao, Xiaoyong

    2016-12-01

    Human metapneumovirus (hMPV) is a major cause of lower respiratory infection in young children. Repeated infections occur throughout life, but its immune evasion mechanisms are largely unknown. We recently found that hMPV M2-2 protein elicits immune evasion by targeting mitochondrial antiviral-signaling protein (MAVS), an antiviral signaling molecule. However, the molecular mechanisms underlying such inhibition are not known. Our mutagenesis studies revealed that PDZ-binding motifs, 29-DEMI-32 and 39-KEALSDGI-46, located in an immune inhibitory region of M2-2, are responsible for M2-2-mediated immune evasion. We also found that both motifs prevent TRAF5 and TRAF6, the MAVS downstream adaptors, from being recruited to MAVS, while the motif 39-KEALSDGI-46 also blocks TRAF3 from migrating to MAVS. In parallel, these TRAFs are important in activating transcription factors NF-kB and/or IRF-3 by hMPV. Our findings collectively demonstrate that M2-2 uses its PDZ motifs to launch the hMPV immune evasion through blocking the interaction of MAVS and its downstream TRAFs. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformation of the space l(F, Ƒ, p, u) are determined.

  5. The NS1 polypeptide of the murine parvovirus minute virus of mice binds to DNA sequences containing the motif [ACCA]2-3.

    Science.gov (United States)

    Cotmore, S F; Christensen, J; Nüesch, J P; Tattersall, P

    1995-03-01

    A DNA fragment containing the minute virus of mice 3' replication origin was specifically coprecipitated in immune complexes containing the virally coded NS1, but not the NS2, polypeptide. Antibodies directed against the amino- or carboxy-terminal regions of NS1 precipitated the NS1-origin complexes, but antibodies directed against NS1 amino acids 284 to 459 blocked complex formation. Using affinity-purified histidine-tagged NS1 preparations, we have shown that the specific protein-DNA interaction is of moderate affinity, being stable in 0.1 M salt but rapidly lost at higher salt concentrations. In contrast, generalized (or nonspecific) DNA binding by NS1 could be demonstrated only in low salt. Addition of ATP or gamma S-ATP enhanced specific DNA binding by wild-type NS1 severalfold, but binding was lost under conditions which favored ATP hydrolysis. NS1 molecules with mutations in a critical lysine residue (amino acid 405) in the consensus ATP-binding site bound to the origin, but this binding could not be enhanced by ATP addition. DNase I protection assays carried out with wild-type NS1 in the presence of gamma S-ATP gave footprints which extended over 43 nucleotides on both DNA strands, from the middle of the origin bubble sequence to a position some 14 bp beyond the nick site. The DNA-binding site for NS1 was mapped to a 22-bp fragment from the middle of the 3' replication origin which contains the sequence ACCAACCA. This conforms to a reiterated motif (ACCA)2-3, which occurs, in more or less degenerate form, at many sites throughout the minute virus of mice genome (J. W. Bodner, Virus Genes 2:167-182, 1989). Insertion of a single copy of the sequence (ACCA)3 was shown to be sufficient to confer NS1 binding on an otherwise unrecognized plasmid fragment. The functions of NS1 in the viral life cycle are reevaluated in the light of this result.
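
    The reiterated (ACCA)2-3 motif described in this abstract maps directly onto a bounded repeat in a regular expression. The sketch below is an illustration only: it searches for exact, non-degenerate copies on both strands, whereas the abstract notes that the motif also occurs in more or less degenerate form. The function names and the toy sequences are assumptions introduced here.

```python
import re

# Two or three tandem copies of ACCA, exact matches only.
ACCA_REPEAT = re.compile(r"(?:ACCA){2,3}")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_acca_sites(dna: str):
    """List (strand, 0-based start, matched text) for each hit.

    Starts on the '-' strand are offsets into the reverse-complemented string.
    """
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in ACCA_REPEAT.finditer(seq):
            sites.append((strand, m.start(), m.group()))
    return sites

# Toy sequence containing the ACCAACCA core quoted in the abstract.
print(find_acca_sites("GGTACCAACCATTGC"))  # [('+', 3, 'ACCAACCA')]
```

    Allowing degenerate copies would need a scoring approach, such as a position weight matrix, rather than an exact pattern.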

  6. Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

    KAUST Repository

    Sayadi, Ahmed; Briganti, Leonardo; Tramontano, Anna; Via, Allegra

    2011-01-01

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length

  7. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  8. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    International Nuclear Information System (INIS)

    Craft, Willie Warren; Dutch, Rebecca Ellis

    2005-01-01

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F0, and proteolytically cleaved into the mature F1 and F2 heterodimer, following an HDLVDGVK109 motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK109 motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK109 motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  9. Pyrene functionalized molecular beacon with pH-sensitive i-motif in a loop.

    Science.gov (United States)

    Dembska, Anna; Juskowiak, Bernard

    2015-01-01

    In this work, we present a spectral characterization of a pH-sensitive system, which combines the i-motif properties with the spatially sensitive fluorescence signal of pyrene molecules attached to hairpin ends. The excimer production (fluorescence max. ∼480 nm) by pyrene labels at the ends of the molecular beacon is driven by pH-dependent i-motif formation in the loop. To illustrate the performance and reversible operation of our systems, we performed experiments with repeated pH cycling between pH values of 7.5±0.3 and 6.5±0.3. The sensor gives an analytical response in excimer-monomer switching mode in a narrow pH range (1.5 pH units) and exhibits high pH resolution (0.1 pH unit). Copyright © 2015 Elsevier B.V. All rights reserved.

  10. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  11. Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases

    Directory of Open Access Journals (Sweden)

    Braun Werner

    2002-11-01

    Full Text Available Abstract Background Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily that catalyze metal ion based phosphorolysis, but recognize different substrates. Results MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1. Conclusion A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.

  12. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large volume of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
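    The AUC criterion named in this abstract can be illustrated with a short sketch. This is not CDAUC itself: it only shows how a motif's AUC might be computed from scores on bound (positive) and unbound (negative) sequences. The function names `auc` and `best_window_score`, and the toy position weight matrix in the usage note, are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def best_window_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Score a sequence by its best-matching window under a position weight matrix.

    `pwm` is a list with one dict per motif position, mapping a letter to a
    weight. Letters missing from a dict contribute 0. A sequence shorter than
    the motif gets -inf.
    """
    width = len(pwm)
    return max(
        (
            sum(pwm[j].get(seq[i + j], 0.0) for j in range(width))
            for i in range(len(seq) - width + 1)
        ),
        default=float("-inf"),
    )


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half a correct pair."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

    Because this quantity depends only on how positive and negative scores are ordered, changing a single motif weight alters it only when some pair of scores swaps order, which is consistent with the piece-wise constant behaviour the abstract describes. A toy call might look like `auc([best_window_score(s, pwm) for s in bound], [best_window_score(s, pwm) for s in unbound])`, where `bound`, `unbound` and `pwm` are hypothetical inputs.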

  13. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  14. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    Science.gov (United States)

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. © 2015 Foulk et al.; Published by Cold Spring Harbor Laboratory Press.

  15. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable gaps representation, in addition to analysis of results using energy minimization methods. RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  16. Cations form sequence selective motifs within DNA grooves via a combination of cation-pi and ion-dipole/hydrogen bond interactions.

    Science.gov (United States)

    Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

    2013-01-01

    The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves.

  17. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
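    To make the (l, d)-motif (Planted Motif Search) problem concrete, here is a minimal brute-force sketch of an exact search. It is a naive baseline under the usual Hamming-distance formulation, not the speedup technique of this paper; the function names `hamming`, `occurs_within` and `planted_motif_search` are assumptions chosen for illustration.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of `seq` is within Hamming distance d of `motif`."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1)
    )


def planted_motif_search(
    seqs: List[str], l: int, d: int, alphabet: str = "ACGT"
) -> List[str]:
    """Return every length-l string that occurs in each input sequence with at
    most d mismatches. All |alphabet|**l candidates are enumerated, so the
    running time is exponential in l."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs
```

    For example, `planted_motif_search(["AACGT", "CGTAA", "TTCGT"], l=3, d=0)` returns `["CGT"]`, the only 3-mer shared exactly by all three sequences. The exhaustive candidate enumeration is what makes the approach impractical for larger l and motivates faster exact algorithms.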

  18. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has made motif finding easier, because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  19. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  20. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  1. Molecular dynamics analysis of stabilities of the telomeric Watson-Crick duplex and the associated i-motif as a function of pH and temperature.

    Science.gov (United States)

    Panczyk, Tomasz; Wolski, Pawel

    2018-06-01

    This work deals with a molecular dynamics analysis of the protonated and deprotonated states of the natural sequence d[(CCCTAA)3CCCT] of the telomeric DNA forming the intercalated i-motif or paired with the sequence d[(CCCTAA)3CCCT] and forming the Watson-Crick (WC) duplex. By utilizing the AMBER force field for nucleic acids we built the i-motif and the WC duplex either with native cytosines or using their protonated forms. We studied, by applying molecular dynamics simulations, the role of hydrogen bonds between cytosines or in cytosine-guanine pairs in the stabilization of both structures in the physiological fluid. We found that hydrogen bonds exist in the case of the protonated i-motif and in the standard form of the WC duplex. They, however, vanish in the case of the deprotonated i-motif and the protonated form of the WC duplex. By determining potentials of mean force in the enforced unwrapping of these structures we found that the protonated i-motif is thermodynamically the most stable. Its deprotonation leads to spontaneous unfolding of the i-motif to the hairpin structure at normal temperature, which was observed directly in the unbiased calculations. The WC duplex is stable in its standard form and its slight destabilization is observed at acidic pH. However, the protonated WC duplex unwraps very slowly at 310 K and its decomposition was not observed in the unbiased calculations. At higher temperatures (ca. 400 K or more) the WC duplex unwraps spontaneously. Copyright © 2018. Published by Elsevier B.V.

  2. Functional Interaction of the Adenovirus IVa2 Protein with Adenovirus Type 5 Packaging Sequences

    OpenAIRE

    Ostapchuk, Philomena; Yang, Jihong; Auffarth, Ece; Hearing, Patrick

    2005-01-01

    Adenovirus type 5 (Ad5) DNA packaging is initiated in a polar fashion from the left end of the genome. The packaging process is dependent on the cis-acting packaging domain located between nucleotides 230 and 380. Seven AT-rich repeats that direct packaging have been identified within this domain. A1, A2, A5, and A6 are the most important repeats functionally and share a bipartite sequence motif. Several lines of evidence suggest that there is a limiting trans-acting factor(s) that plays a ro...

  3. Effects of chemokine (C–C motif) ligand 1 on microglial function

    International Nuclear Information System (INIS)

    Akimoto, Nozomi; Ifuku, Masataka; Mori, Yuki; Noda, Mami

    2013-01-01

    Highlights: •CCR8, a specific receptor for CCL-1, was expressed on primary cultured microglia. •Expression of CCR-8 in microglia was upregulated in the presence of CCL-1. •CCL-1 increased motility, proliferation and phagocytosis of cultured microglia. •CCL-1 promoted BDNF and IL-6 mRNA, and the release of NO from microglia. •CCL-1 activates microglia and may contribute to the development of neuropathic pain. -- Abstract: Microglia, which constitute the resident macrophages of the central nervous system (CNS), are generally considered as the primary immune cells in the brain and spinal cord. Microglial cells respond to various factors which are produced following nerve injury of multiple aetiologies and contribute to the development of neuronal disease. Chemokine (C–C motif) ligand 1 (CCL-1), a well-characterized chemokine secreted by activated T cells, has been shown to play an important role in neuropathic pain induced by nerve injury and is also produced in various cell types in the CNS, especially in dorsal root ganglia (DRG). However, the role of CCL-1 in the CNS and its effects on microglia remain unclear. Here we showed the multiple effects of CCL-1 on microglia. We first showed that CCR-8, a specific receptor for CCL-1, was expressed on primary cultured microglia, as well as on astrocytes and neurons, and was upregulated in the presence of CCL-1. CCL-1 at a concentration of 1 ng/ml induced chemotaxis, increased motility at a higher concentration (100 ng/ml), and increased proliferation and phagocytosis of cultured microglia. CCL-1 also activated microglia morphologically, promoted mRNA levels for brain-derived neurotrophic factor (BDNF) and IL-6, and increased the release of nitrite from microglia. These results indicate that CCL-1 has a role as a mediator in neuron-glia interaction, which may contribute to the development of neurological diseases, especially neuropathic pain.

  4. Effects of chemokine (C–C motif) ligand 1 on microglial function

    Energy Technology Data Exchange (ETDEWEB)

    Akimoto, Nozomi [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Ifuku, Masataka [Laboratory of Integrative Physiology, Graduate School of Medicine, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Mori, Yuki [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Noda, Mami, E-mail: noda@phar.kyushu-u.ac.jp [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan)

    2013-07-05

    Highlights: •CCR8, a specific receptor for CCL-1, was expressed on primary cultured microglia. •Expression of CCR-8 in microglia was upregulated in the presence of CCL-1. •CCL-1 increased motility, proliferation and phagocytosis of cultured microglia. •CCL-1 promoted BDNF and IL-6 mRNA, and the release of NO from microglia. •CCL-1 activates microglia and may contribute to the development of neuropathic pain. -- Abstract: Microglia, which constitute the resident macrophages of the central nervous system (CNS), are generally considered as the primary immune cells in the brain and spinal cord. Microglial cells respond to various factors which are produced following nerve injury of multiple aetiologies and contribute to the development of neuronal disease. Chemokine (C–C motif) ligand 1 (CCL-1), a well-characterized chemokine secreted by activated T cells, has been shown to play an important role in neuropathic pain induced by nerve injury and is also produced in various cell types in the CNS, especially in dorsal root ganglia (DRG). However, the role of CCL-1 in the CNS and its effects on microglia remain unclear. Here we showed the multiple effects of CCL-1 on microglia. We first showed that CCR-8, a specific receptor for CCL-1, was expressed on primary cultured microglia, as well as on astrocytes and neurons, and was upregulated in the presence of CCL-1. CCL-1 at a concentration of 1 ng/ml induced chemotaxis, increased motility at a higher concentration (100 ng/ml), and increased proliferation and phagocytosis of cultured microglia. CCL-1 also activated microglia morphologically, promoted mRNA levels for brain-derived neurotrophic factor (BDNF) and IL-6, and increased the release of nitrite from microglia. These results indicate that CCL-1 has a role as a mediator in neuron-glia interaction, which may contribute to the development of neurological diseases, especially neuropathic pain.

  5. Spontaneous processing of functional and non-functional action sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2011-01-01

    as sub-categories of non-functional behavior (i.e., actions lacking causal coherence and a necessary integration between subparts). New insights in human action processing can help us explain how cognition might vary depending on the type of behavior processed. Using an event segmentation paradigm, we...... conducted two experiments eliciting differences in participants' response patterns to functional and non-functional actions. Participants consistently segmented non-functional action sequences into smaller units indicating either an attentional shift to the level of gesture analysis or a problem...... of representational integration. Experimental studies of non-functional behavior can strengthen explanations of recurrent features of human action processing, such as ritual and ritualized behavior, as well as indicate potential sources and effects of breakdown of the system....

  6. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... strands intact. We used genomics and biochemical approaches to determine if lambda-exo digests all parental DNA sequences equally. We report that lambda-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, lambda-exo digestion of nonreplicating genomic DNA (LexoG0) enriches...... GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent lambda-exo biases in NS-seq and validated this approach at the rDNA locus. The lambda-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s...

  7. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  8. Discovery of novel interacting partners of PSMD9, a proteasomal chaperone: Role of an Atypical and versatile PDZ-domain motif interaction and identification of putative functional modules

    Directory of Open Access Journals (Sweden)

    Nikhil Sangith

    2014-01-01

    Full Text Available PSMD9 (Proteasome Macropain non-ATPase subunit 9), a proteasomal assembly chaperone, harbors an uncharacterized PDZ-like domain. Here we report the identification of five novel interacting partners of PSMD9 and provide the first glimpse at the structure of the PDZ-domain, including the molecular details of the interaction. We based our strategy on two propositions: (a) proteins with conserved C-termini may share common functions and (b) PDZ domains interact with C-terminal residues of proteins. By screening C-terminal peptides and then testing interactions with full-length recombinant proteins, we discovered hnRNPA1 (an RNA binding protein), S14 (a ribosomal protein), CSH1 (a growth hormone), E12 (a transcription factor) and IL6 receptor as novel PSMD9-interacting partners. Through multiple techniques and structural insights, we clearly demonstrate for the first time that the human PDZ domain interacts with the predicted Short Linear Sequence Motif (SLIM) at the C-termini of the client proteins. These interactions are also recapitulated in mammalian cells. Together, these results are suggestive of the role of PSMD9 in transcriptional regulation, mRNA processing and editing, hormone and receptor activity and protein translation. Our proof-of-principle experiments endorse a novel and quick method for the identification of putative interacting partners of similar PDZ-domain proteins from the proteome and for discovering novel functions.

  9. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Background: The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transduction within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C)-terminal ends of their binding partners. However, whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins has been little explored at the genome level. Results: Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at protein C-terminal ends. Moreover, we found specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant, since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with the PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions: Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.
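
    The two motif definitions quoted in this abstract can be written as regular expressions anchored at the C-terminus. The following is a minimal Python sketch, not the authors' pipeline: the function name and the example sequences are hypothetical, and only the last five residues are tested, on the assumption that the five-position patterns describe the extreme C-terminal end.

```python
import re

# Patterns transcribed from the abstract; 'x' (any residue) becomes '.'.
# The trailing '$' restricts the match to the last five residues.
TYPE_I = re.compile(r"..[ST].[ILV]$")   # x-x-S/T-x-I/L/V
TYPE_II = re.compile(r"..V.[IV]$")      # x-x-V-x-I/V


def classify_c_terminus(protein: str) -> str:
    """Return 'type-I', 'type-II' or 'none' for the C-terminal five residues."""
    seq = protein.strip().upper()
    if TYPE_I.search(seq):
        return "type-I"
    if TYPE_II.search(seq):
        return "type-II"
    return "none"


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the function.
    for name, seq in [("p1", "MKTAYESDV"), ("p2", "AAAAKEVAI"), ("p3", "MKKKK")]:
        print(name, classify_c_terminus(seq))
```

    The two classes cannot both match the same sequence, because the third position from the pattern start must be S/T in one case and V in the other.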

  10. Functional identification of a Lippia dulcis bornyl diphosphate synthase that contains a duplicated, inhibitory arginine-rich motif.

    Science.gov (United States)

    Hurd, Matthew C; Kwon, Moonhyuk; Ro, Dae-Kyun

    2017-08-26

    Lippia dulcis (Aztec sweet herb) contains the potent natural sweetener hernandulcin, a sesquiterpene ketone found in the leaves and flowers. Utilizing the leaves for agricultural application is challenging due to the presence of the bitter-tasting and toxic monoterpene, camphor. To unlock the commercial potential of L. dulcis leaves, the first step of camphor biosynthesis by a bornyl diphosphate synthase needs to be elucidated. Two putative monoterpene synthases (LdTPS3 and LdTPS9) were isolated from L. dulcis leaf cDNA. To elucidate their catalytic functions, E. coli-produced recombinant enzymes with truncations of their chloroplast transit peptides were assayed with geranyl diphosphate (GPP). In vitro enzyme assays showed that LdTPS3 encodes bornyl diphosphate synthase (thus named LdBPPS) while LdTPS9 encodes linalool synthase. Interestingly, the N-terminus of LdBPPS possesses two arginine-rich (RRX8W) motifs, and enzyme assays showed that the presence of both RRX8W motifs completely inhibits the catalytic activity of LdBPPS. Only after the removal of the putative chloroplast transit peptide and the first RRX8W motif could LdBPPS react with GPP to produce bornyl diphosphate. LdBPPS is distantly related to the known bornyl diphosphate synthase from sage in a phylogenetic analysis, indicating convergent evolution of camphor biosynthesis in sage and L. dulcis. The discovery of LdBPPS opens up the possibility of engineering L. dulcis to remove the undesirable product, camphor. Copyright © 2017 Elsevier Inc. All rights reserved.
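
    The RRX8W motif named in this abstract (two arginines, eight arbitrary residues, then a tryptophan) is simple to locate in a sequence. Below is a small Python sketch under that reading of the notation; the helper name and the test sequence are made up for illustration and are not the LdBPPS sequence.

```python
import re

# RR, any eight residues, then W. The lookahead lets overlapping hits be reported.
RRX8W = re.compile(r"(?=(RR.{8}W))")


def find_rrx8w(seq: str):
    """Return (0-based start, matched text) for every RRX8W occurrence."""
    return [(m.start(), m.group(1)) for m in RRX8W.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical N-terminal fragment carrying two copies of the motif.
    fragment = "MA" + "RR" + "A" * 8 + "W" + "GG" + "RR" + "C" * 8 + "W"
    for start, hit in find_rrx8w(fragment):
        print(start, hit)
```

    Counting the hits in an N-terminal region is enough to flag candidates with a duplicated motif; whether a second copy is inhibitory still has to be tested experimentally, as the abstract describes.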

  11. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance
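
    The abstract does not spell out how "motif voting" scores promoters, so the following Python sketch shows only one plausible reading: each tool reports the promoters in which it found a motif, a promoter's score is the number of tools that voted for it, and promoters below a threshold are pruned. The tool names, promoter identifiers and threshold are hypothetical, and this is not the MP(3) implementation.

```python
from collections import Counter


def vote_and_prune(predictions, min_votes):
    """Count, per promoter, how many tools reported a motif; keep the well-supported ones.

    predictions: dict mapping a tool name to the set of promoter ids it flagged.
    Returns (votes per promoter, sorted list of promoters with >= min_votes).
    """
    votes = Counter()
    for promoters in predictions.values():
        votes.update(set(promoters))  # each tool votes at most once per promoter
    kept = sorted(p for p, v in votes.items() if v >= min_votes)
    return votes, kept


if __name__ == "__main__":
    example = {
        "toolA": {"p1", "p2"},
        "toolB": {"p2", "p3"},
        "toolC": {"p2"},
    }
    votes, kept = vote_and_prune(example, min_votes=2)
    print(dict(votes), kept)
```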

  12. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Background: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. Conclusion: We have developed a novel and useful approach for

  13. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is only somewhat conserved and can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs, the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stresses, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).
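
    Of the motifs mentioned in this abstract, only the GXGGRRKK linker is given explicitly enough to search for directly. The Python sketch below reports where it occurs and which residue fills the X position; the function name and example sequence are hypothetical, and the K-, Y- and S-segment definitions are not reproduced because the abstract does not state them.

```python
import re

# G, any residue (captured as X), then GGRRKK.
LINKER = re.compile(r"G(.)GGRRKK")


def linker_sites(seq: str):
    """Return (0-based start, residue at X) for each GXGGRRKK occurrence."""
    return [(m.start(), m.group(1)) for m in LINKER.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up fragment: a serine run followed by the linker, for illustration only.
    print(linker_sites("SSSSSGHGGRRKKEKKG"))
```

    Tabulating the X residue across a dataset would show how variable that position is in practice.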

  14. Measurement of creatinine in human plasma using a functional porous polymer structure sensing motif

    Science.gov (United States)

    Nanda, Sitansu Sekhar; An, Seong Soo A; Yi, Dong Kee

    2015-01-01

    In this study, a new method for detecting creatinine was developed. This novel sensor comprised two ionic liquids, poly-lactic-co-glycolic acid (PLGA) and 1-butyl-3-methylimidazolium (BMIM) chloride, in the presence of 2′,7′-dichlorofluorescein diacetate (DCFH-DA). PLGA and BMIM chloride formed a functional porous polymer structure (FPPS)-like structure. Creatinine within the FPPS rapidly hydrolyzed and released OH−, which in turn converted DCFH-DA to DCFH, developing an intense green color or green fluorescence. The conversion of DCFH to DCF+ resulted in swelling of the FPPS and increased solubility. This DCF+-based sensor could detect creatinine with a detection limit of 5 µM and could also measure creatinine in blood. This novel method could be used in diagnostic applications for monitoring individuals with renal dysfunction. PMID:26347475

  15. Laser spectroscopic and theoretical studies of the structures and encapsulation motifs of functional molecules

    Energy Technology Data Exchange (ETDEWEB)

    Ebata, Takayuki; Kusaka, Ryoji [Department of Chemistry, Graduate School of Science, Hiroshima University, Kagamiyama 1-3-1, Higashi-Hiroshima, 739-8526 (Japan); Xantheas, Sotiris S. [Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MS K1-83, Richland, WA 99352 (United States)

    2015-01-22

    Extensive laser spectroscopic and theoretical studies have been recently carried out with the aim to reveal the structure and dynamics of encapsulation complexes in the gas phase. The characteristics of the encapsulation complexes are governed by the fact that (i) most of the host molecules are flexible and (ii) the complexes form high dimensional structures by using weak non-covalent interactions. These characteristics result in the possibility of the coexistence of many conformers in close energetic proximity. The combination of supersonic jet/laser spectroscopy and high level quantum chemical calculations is essential in tackling these challenging problems. In this report we describe our recent studies on the structures and dynamics of the encapsulation complexes formed by calix[4]arene (C4A), dibenzo-18-crown-6-ether (DB18C6), and benzo-18-crown-6-ether (B18C6) 'hosts' interacting with N2, acetylene, water, and ammonia 'guest' molecules. The gaseous host-guest complexes are generated under jet-cooled conditions. We apply various laser spectroscopic methods to obtain the conformer- and isomer-specified electronic and IR spectra. The experimental results are complemented with quantum chemical calculations ranging from density functional theory to high level first principles calculations at the MP2 and CCSD(T) levels of theory. We discuss the possible conformations of the bare host molecules, the structural changes they undergo upon complexation, and the key interactions that are responsible for stabilizing the specific complexes.

  16. Karyological characterization and identification of four repetitive element groups (the 18S – 28S rRNA gene, telomeric sequences, microsatellite repeat motifs, Rex retroelements) of the Asian swamp eel (Monopterus albus)

    Science.gov (United States)

    Suntronpong, Aorarat; Thapana, Watcharaporn; Twilprawat, Panupon; Prakhongcheep, Ornjira; Somyong, Suthasinee; Muangmai, Narongrit; Surin Peyachoknagul; Srikulnath, Kornsorn

    2017-01-01

    Among teleost fishes, Asian swamp eel (Monopterus albus Zuiew, 1793) possesses the lowest chromosome number, 2n = 24. To characterize the chromosome constitution and investigate the genome organization of repetitive sequences in M. albus, karyotyping and chromosome mapping were performed with the 18S – 28S rRNA gene, telomeric repeats, microsatellite repeat motifs, and Rex retroelements. The 18S – 28S rRNA genes were observed in the pericentromeric region of chromosome 4 at the same position as large propidium iodide and C-positive bands, suggesting that the molecular structure of the pericentromeric regions of chromosome 4 has evolved in a concerted manner with amplification of the 18S – 28S rRNA genes. (TTAGGG)n sequences were found at the telomeric ends of all chromosomes. Eight of 19 microsatellite repeat motifs were dispersedly mapped on different chromosomes, suggesting the independent amplification of microsatellite repeat motifs in M. albus. Monopterus albus Rex1 (MALRex1) was observed at interstitial sites of all chromosomes and in the pericentromeric regions of most chromosomes, whereas MALRex3 was scattered and localized to all chromosomes and MALRex6 to several chromosomes. This suggests that these retroelements were independently amplified or lost in M. albus. Among MALRexs (MALRex1, MALRex3, and MALRex6), MALRex6 showed higher interspecific sequence divergences from other teleost species. This suggests that the divergence of Rex6 sequences of M. albus might have occurred a relatively long time ago. PMID:29093797

  17. Noroviruses Co-opt the Function of Host Proteins VAPA and VAPB for Replication via a Phenylalanine-Phenylalanine-Acidic-Tract-Motif Mimic in Nonstructural Viral Protein NS1/2.

    Science.gov (United States)

    McCune, Broc T; Tang, Wei; Lu, Jia; Eaglesham, James B; Thorne, Lucy; Mayer, Anne E; Condiff, Emily; Nice, Timothy J; Goodfellow, Ian; Krezel, Andrzej M; Virgin, Herbert W

    2017-07-11

    VAPA host protein. The NS1/2-VAPA interaction is conserved between murine and human noroviruses and was important for early steps in murine norovirus replication. Using structure-function analysis, we found that NS1/2 contains a short sequence that molecularly mimics the FFAT motif that is found in multiple host proteins that bind VAPA. This represents to our knowledge the first example of functionally important mimicry of a host FFAT motif by a microbial protein. Copyright © 2017 McCune et al.

  18. Adenovirus fibre shaft sequences fold into the native triple beta-spiral fold when N-terminally fused to the bacteriophage T4 fibritin foldon trimerisation motif.

    Science.gov (United States)

    Papanikolopoulou, Katerina; Teixeira, Susana; Belrhali, Hassan; Forsyth, V Trevor; Mitraki, Anna; van Raaij, Mark J

    2004-09-03

    Adenovirus fibres are trimeric proteins that consist of a globular C-terminal domain, a central fibrous shaft and an N-terminal part that attaches to the viral capsid. In the presence of the globular C-terminal domain, which is necessary for correct trimerisation, the shaft segment adopts a triple beta-spiral conformation. We have replaced the head of the fibre by the trimerisation domain of the bacteriophage T4 fibritin, the foldon. Two different fusion constructs were made and crystallised, one with an eight amino acid residue linker and one with a linker of only two residues. X-ray crystallographic studies of both fusion proteins show that residues 319-391 of the adenovirus type 2 fibre shaft fold into a triple beta-spiral fold indistinguishable from the native structure, although this is now resolved at a higher resolution of 1.9 Å. The foldon residues 458-483 also adopt their natural structure. The intervening linkers are not well ordered in the crystal structures. This work shows that the shaft sequences retain their capacity to fold into their native beta-spiral fibrous fold when fused to a foreign C-terminal trimerisation motif. It provides a structural basis to artificially trimerise longer adenovirus shaft segments and segments from other trimeric beta-structured fibre proteins. Such artificial fibrous constructs, amenable to crystallisation and solution studies, can offer tractable model systems for the study of beta-fibrous structure. They can also prove useful for gene therapy and fibre engineering applications.

  19. Lessons from a tarantula: new insights into muscle thick filament and myosin interacting-heads motif structure and function.

    Science.gov (United States)

    Alamo, Lorenzo; Koubassova, Natalia; Pinto, Antonio; Gillilan, Richard; Tsaturyan, Andrey; Padrón, Raúl

    2017-10-01

    The tarantula skeletal muscle X-ray diffraction pattern suggested that the myosin heads were helically arranged on the thick filaments. Electron microscopy (EM) of negatively stained relaxed tarantula thick filaments revealed four helices of heads, allowing a helical 3D reconstruction. Due to its low resolution (5.0 nm), the unambiguous interpretation of densities of both heads was not possible. A resolution increase up to 2.5 nm, achieved by cryo-EM of frozen-hydrated relaxed thick filaments and an iterative helical real space reconstruction, allowed both heads to be resolved. The two heads, "free" and "blocked", formed an asymmetric structure named the "interacting-heads motif" (IHM), which explained relaxation by self-inhibition of both heads' ATPases. This finding made tarantula an exemplar system for thick filament structure and function studies. Heads were shown to be released and disordered by Ca2+-activation through myosin regulatory light chain phosphorylation, leading to EM, small angle X-ray diffraction and scattering, and spectroscopic and biochemical studies of the IHM structure and function. The results from these studies have consequent implications for understanding and explaining the myosin super-relaxed state and thick filament activation and regulation. A cooperative phosphorylation mechanism for activation in tarantula skeletal muscle, involving swaying constitutively Ser35 mono-phosphorylated free heads, explains super-relaxation, force potentiation and post-tetanic potentiation through Ser45 mono-phosphorylated blocked heads. Based on this mechanism, we propose a swaying-swinging, tilting crossbridge-sliding filament for tarantula muscle contraction.

  20. Function-Based Algorithms for Biological Sequences

    Science.gov (United States)

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  1. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  2. Structural and functional analysis of VQ motif-containing proteins in Arabidopsis as interacting proteins of WRKY transcription factors.

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-06-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors.

  3. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based public sequenced genomes, CMGfunc. Functionally related groups......In November 2013, there was around 21.000 different prokaryotic genomes sequenced and publicly available, and the number is growing daily with another 20.000 or more genomes expected to be sequenced and deposited by the end of 2014. An important part of the analysis of this data is the functional...... annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...

  4. ATP-binding motifs play key roles in Krp1p, kinesin-related protein 1, function for bi-polar growth control in fission yeast

    International Nuclear Information System (INIS)

    Rhee, Dong Keun; Cho, Bon A; Kim, Hyong Bai

    2005-01-01

    Kinesin is a microtubule-based motor protein with various functions related to cell growth and division. It has been reported that Krp1p, kinesin-related protein 1, which belongs to the kinesin heavy chain superfamily, localizes on microtubules and may play an important role in cytokinesis. However, the function of Krp1p has not been fully elucidated. In this study, we overexpressed in fission yeast an intact form of Krp1p and three different mutant forms, constructed by site-directed mutagenesis in the two ATP-binding motifs or by truncation of the leucine zipper-like motif (LZiP). We observed hyper-extended microtubules and aberrant nuclear shape in Krp1p-overexpressing fission yeast. As a functional consequence, a point mutation of ATP-binding domain 1 (G89E) in Krp1p reversed the effect of Krp1p overexpression in fission yeast, whereas the specific mutation in ATP-binding domain 2 (G238E) resulted in altered cell polarity. Additionally, truncation of the leucine zipper-like domain (LZiP) at the C-terminus of Krp1p resulted in normal nuclear division. Taken together, we suggest that Krp1p is involved in regulation of cell-polarized growth through ATP-binding motifs in fission yeast.

  5. Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

    OpenAIRE

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the sta...

  6. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    Science.gov (United States)

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  7. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8

    Energy Technology Data Exchange (ETDEWEB)

    Shih, Chun-Ho; Chiang, Tin-Bin [Chang Gung University of Science and Technology, Guishan Dist., Taoyuan City, Taiwan (China); Wang, Wen-Jeng, E-mail: wjwang@mail.cgust.edu.tw [Chang Gung University of Science and Technology, Guishan Dist., Taoyuan City, Taiwan (China); Department of Neurological Surgery, Chang Gung Memorial Hospital, Guishan Dist., Taoyuan City, Taiwan (China)

    2017-03-15

    Convulxin (CVX), a C-type lectin-like protein (CLP), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the mode of action of multimeric CVX toward human coronary artery smooth muscle cells (HCASMCs) was explored further. The N-terminus of the β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a conserved motif upon sequence comparison with other CLPs. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. With specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by a series of functional analyses. CVX exhibited multiple inhibitory activities toward HCASMC proliferation, adhesion and invasion in a dose- and integrin αvβ3-dependent fashion. The WAD-octapeptide, although less potent, could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated that both the intact CVX and the synthetic peptide can specifically interact with integrin-αv on HCASMCs, and CVX was shown to have a down-regulatory effect on the gene expression of CXC-chemokines, such as growth-related oncogene and interleukin-8. According to nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via the interactions of the WAD-motif in CVX-β with integrin-αv on HCASMCs, and it is also a gene suppressor with the ability to diminish the expression of two CXC-chemokines in an NF-κB-independent manner. Indeed, more extensive investigations are needed and might create a new avenue for the development of a novel angiostatic agent. - Highlights: • The tetrameric convulxin (CVX) with WAD-motif

  8. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8

    International Nuclear Information System (INIS)

    Shih, Chun-Ho; Chiang, Tin-Bin; Wang, Wen-Jeng

    2017-01-01

    Convulxin (CVX), a C-type lectin-like protein (CLP), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the mode of action of multimeric CVX toward human coronary artery smooth muscle cells (HCASMCs) was explored further. The N-terminus of the β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a conserved motif upon sequence comparison with other CLPs. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. With specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by a series of functional analyses. CVX exhibited multiple inhibitory activities toward HCASMC proliferation, adhesion and invasion in a dose- and integrin αvβ3-dependent fashion. The WAD-octapeptide, although less potent, could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated that both the intact CVX and the synthetic peptide can specifically interact with integrin-αv on HCASMCs, and CVX was shown to have a down-regulatory effect on the gene expression of CXC-chemokines, such as growth-related oncogene and interleukin-8. According to nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via the interactions of the WAD-motif in CVX-β with integrin-αv on HCASMCs, and it is also a gene suppressor with the ability to diminish the expression of two CXC-chemokines in an NF-κB-independent manner. Indeed, more extensive investigations are needed and might create a new avenue for the development of a novel angiostatic agent. - Highlights: • The tetrameric convulxin (CVX) with WAD-motif

  9. Filling the gap between sequence and function: a bioinformatics approach

    NARCIS (Netherlands)

    Bargsten, J.W.

    2014-01-01

    The research presented in this thesis focuses on deriving function from sequence information, with the emphasis on plant sequence data. Unravelling the impact of genomic elements, in most cases genes, on the phenotype of an organism is a major challenge in biological research and modern plant

  10. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs
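
    The last step described above, using discovered motifs to score whole sets of promoters, can be illustrated with a log-odds position weight matrix (PWM) scan. This is only a minimal sketch under stated assumptions: the count matrix (a toy CATGCA-like core), the uniform background, the pseudocount, the promoter strings and all function names are hypothetical, and the scan is not the discriminative seeding algorithm the authors used for discovery.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window; expects upper-case ACGT."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        best = max(best, (score, offset))
    return best

def rank_promoters(pwm, promoters):
    """Order promoter names by their best motif score, highest first."""
    return sorted(promoters, key=lambda name: best_hit(pwm, promoters[name])[0],
                  reverse=True)

# Hypothetical counts for a six-column motif (illustrative values only).
TOY_COUNTS = [
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]

# Hypothetical promoter fragments, not real sequences.
PROMOTERS = {
    "promoter_with_site": "TTGACATGCATTAA",
    "promoter_without_site": "TTGATTTTTTTTAA",
}

PWM = log_odds_pwm(TOY_COUNTS)
RANKING = rank_promoters(PWM, PROMOTERS)
```

    Ranking by the single best window is one simple design choice; summing scores over all windows or both strands would be equally plausible alternatives.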

  11. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins.
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  12. Zero sequences of holomorphic functions, representation of meromorphic functions. II. Entire functions

    International Nuclear Information System (INIS)

    Khabibullin, Bulat N

    2009-01-01

    Let Λ = {λ_k} be a sequence of points in the complex plane C and f a non-trivial entire function of finite order ρ and finite type σ such that f = 0 on Λ. Upper bounds for functions such as the Weierstrass-Hadamard canonical product of order ρ constructed from the sequence Λ are obtained. Similar bounds for meromorphic functions are also derived. These results are used to estimate the radius of completeness of a system of exponentials in C. Bibliography: 26 titles.

  13. Fast social-like learning of complex behaviors based on motor motifs

    Science.gov (United States)

    Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

    2018-05-01

    Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
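
    To give a sense of scale for the (n-1)! count, the sketch below enumerates the orderings of six motifs with the first motif held fixed. Reading (n-1)! as the number of orderings once the starting motif is fixed is an assumption made here for illustration, and the motif labels and function name are hypothetical; the abstract does not describe how the sequences are enumerated.

```python
from itertools import permutations
from math import factorial

def orderings_with_fixed_start(motifs):
    """List every ordering of the motifs that begins with the first motif."""
    first, rest = motifs[0], tuple(motifs[1:])
    return [(first,) + tail for tail in permutations(rest)]

# Six hypothetical motor motifs, matching the n = 6 robot experiment in size only.
MOTIFS = ("m1", "m2", "m3", "m4", "m5", "m6")
ORDERINGS = orderings_with_fixed_start(MOTIFS)

# Both numbers are 120 for n = 6.
print(len(ORDERINGS), factorial(len(MOTIFS) - 1))
```

    With six motifs there are 120 candidate sequences, while the abstract's bound on learning is at most n-1 = 5 cycles.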

  14. Arithmetic convergent sequence space defined by modulus function

    Directory of Open Access Journals (Sweden)

    Taja Yaying

    2019-10-01

    The aim of this article is to introduce the sequence spaces $AC(f)$ and $AS(f)$ using arithmetic convergence and a modulus function, and to study algebraic and topological properties of these spaces, along with certain inclusion results.

  15. Structural and functional studies of a phosphatidic acid-binding antifungal plant defensin MtDef4: Identification of an RGFRRR motif governing fungal cell entry

    Energy Technology Data Exchange (ETDEWEB)

    Sagaram, Uma S.; El-Mounadi, Kaoutar; Buchko, Garry W.; Berg, Howard R.; Kaur, Jagdeep; Pandurangi, Raghoottama; Smith, Thomas J.; Shah, Dilip

    2013-12-04

    A highly conserved plant defensin, MtDef4, potently inhibits the growth of the filamentous fungus Fusarium graminearum. MtDef4 is internalized by cells of F. graminearum. To determine its mechanism of fungal cell entry and antifungal action, the NMR solution structure of MtDef4 has been determined. The analysis of its structure has revealed a positively charged patch on the surface of the protein consisting of arginine residues in its γ-core signature, a major determinant of the antifungal activity of MtDef4. Here, we report functional analysis of the RGFRRR motif of the γ-core signature of MtDef4. The replacement of RGFRRR with AAAARR or with RGFRAA not only abolishes fungal cell entry but also results in loss of the antifungal activity of MtDef4. MtDef4 binds strongly to phosphatidic acid (PA), a precursor for the biosynthesis of membrane phospholipids and a signaling lipid known to recruit cytosolic proteins to membranes. Mutations of RGFRRR which abolish fungal cell entry of MtDef4 also impair its binding to PA. Our results suggest that the RGFRRR motif is a translocation signal for entry of MtDef4 into fungal cells and that this positively charged motif likely mediates interaction of this defensin with PA as part of its antifungal action.

  16. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network

  17. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    Science.gov (United States)

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. The convergence of the order sequence and the solution function sequence on fractional partial differential equation

    Science.gov (United States)

    Rusyaman, E.; Parmikanti, K.; Chaerani, D.; Asefan; Irianingsih, I.

    2018-03-01

    One application of fractional ordinary differential equations is related to viscoelasticity, i.e., a correlation between the viscosity of fluids and the elasticity of solids. If the solution function becomes a function of two or more variables, then its differential equation must be changed into a fractional partial differential equation. As a preliminary study for the two-variable viscoelasticity problem, this paper discusses the convergence analysis of a function sequence that is the solution of the homogeneous fractional partial differential equation. The method used to solve the problem is the Homotopy Analysis Method. The results show that, given two real number sequences (α_n) and (β_n) which converge to α and β respectively, the solution function sequence of the fractional partial differential equation with order (α_n, β_n) also converges to the solution function of the fractional partial differential equation with order (α, β).

  19. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  20. Inaudible functional MRI using a truly mute gradient echo sequence

    International Nuclear Information System (INIS)

    Marcar, V.L.; Girard, F.; Rinkel, Y.; Schneider, J.F.; Martin, E.

    2002-01-01

    We performed functional MRI experiments using a mute version of a gradient echo sequence on adult volunteers using either a simple visual stimulus (flicker goggles: 4 subjects) or an auditory stimulus (music: 4 subjects). Because the mute sequence delivers fewer images per unit time than a fast echo planar imaging (EPI) sequence, we explored our data using a parametric ANOVA test and a non-parametric Wilcoxon-Mann-Whitney test in addition to performing a cross-correlation analysis. All three methods were in close agreement regarding the location of the BOLD contrast signal change. We demonstrated that, using appropriate statistical analysis, functional MRI using an MR sequence that is acoustically inaudible to the subject is feasible. Furthermore, compared with the "silent" event-related procedures involving an EPI protocol, our mGE protocol performs favourably with respect to experiment time and the BOLD signal. (orig.)

  1. Inaudible functional MRI using a truly mute gradient echo sequence

    Energy Technology Data Exchange (ETDEWEB)

    Marcar, V.L. [University of Zurich, Department of Psychology, Neuropsychology, Treichlerstrasse 10, 8032 Zurich (Switzerland); Girard, F. [GE Medical Systems SA, 283, rue de la Miniere B.P. 34, 78533 Buc Cedex (France); Rinkel, Y.; Schneider, J.F.; Martin, E. [University Children's Hospital, Neuroradiology and Magnetic Resonance, Department of Diagnostic Imaging, Steinwiesstrasse 75, 8032 Zurich (Switzerland)

    2002-11-01

    We performed functional MRI experiments using a mute version of a gradient echo sequence on adult volunteers using either a simple visual stimulus (flicker goggles: 4 subjects) or an auditory stimulus (music: 4 subjects). Because the mute sequence delivers fewer images per unit time than a fast echo planar imaging (EPI) sequence, we explored our data using a parametric ANOVA test and a non-parametric Wilcoxon-Mann-Whitney test in addition to performing a cross-correlation analysis. All three methods were in close agreement regarding the location of the BOLD contrast signal change. We demonstrated that, using appropriate statistical analysis, functional MRI using an MR sequence that is acoustically inaudible to the subject is feasible. Furthermore, compared with the "silent" event-related procedures involving an EPI protocol, our mGE protocol performs favourably with respect to experiment time and the BOLD signal. (orig.)

  2. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  3. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  4. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    International Nuclear Information System (INIS)

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-01-01

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (residues 125-134, MQELSNILNL) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and retaining only the putative NES motif (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD containing only one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. In addition, the CRM1-specific nuclear export inhibitor LMB blocked the potent hNANOG NES-mediated export, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, an NES motif is present in the hNANOG HD and may be functionally involved in the CRM1-mediated nuclear export pathway.
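
    The reported motif can be checked against a generic leucine-rich NES pattern with a short script. This is a hedged sketch: the consensus used here (Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ, with Φ one of L, I, V, F or M) is a commonly cited general pattern assumed for illustration, not the search criterion stated in the abstract, and the function names are hypothetical. Only the peptide MQELSNILNL and its coordinates 125-134 come from the record above.

```python
import re

# Assumed generic NES consensus: Phi-x(2,3)-Phi-x(2,3)-Phi-x-Phi, Phi in {L, I, V, F, M}.
NES_CONSENSUS = re.compile(r"[LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]")

# Values taken from the abstract.
MOTIF = "MQELSNILNL"
START, END = 125, 134

def span_matches_length(motif, start, end):
    """Check that the inclusive residue range has the same length as the peptide."""
    return end - start + 1 == len(motif)

def fits_consensus(peptide):
    """True when the whole peptide fits the assumed consensus pattern."""
    return NES_CONSENSUS.fullmatch(peptide) is not None

def hydrophobic_positions(peptide, start):
    """Residue numbers of L, I, V, F or M within the peptide."""
    return [start + i for i, aa in enumerate(peptide) if aa in "LIVFM"]

print(span_matches_length(MOTIF, START, END))
print(fits_consensus(MOTIF))
print(hydrophobic_positions(MOTIF, START))
```

    A pattern match of this kind only flags a candidate; the abstract's own evidence for function comes from the deletion, mutagenesis and LMB experiments.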

  5. Functional annotation from the genome sequence of the giant panda

    OpenAIRE

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-01-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided in...

  6. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and supported a common stem:loop structure as the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  7. Massively parallel interrogation of aptamer sequence, structure and function.

    Directory of Open Access Journals (Sweden)

    Nicholas O Fischer

    BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and supported a common stem:loop structure as the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  8. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Directory of Open Access Journals (Sweden)

    Sourav Roy

    2013-03-01

    Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.
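
    Over-representation of a motif in a set of stage-specific promoters is often quantified with a hypergeometric upper-tail probability. The sketch below assumes that statistic for illustration only; the abstract does not say which test the search algorithms applied, and the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(population, with_motif, sample, observed):
    """P(X >= observed) when `sample` promoters are drawn without replacement
    from `population` promoters, of which `with_motif` contain the motif."""
    upper = min(with_motif, sample)
    favourable = sum(
        comb(with_motif, k) * comb(population - with_motif, sample - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, sample)

# Hypothetical counts: 1000 promoters genome-wide, 100 carry the motif,
# 50 belong to a stage-specific gene set, and 15 of those carry the motif.
P_VALUE = hypergeom_upper_tail(population=1000, with_motif=100, sample=50, observed=15)
print(P_VALUE)
```

    Under these made-up counts about 5 motif-containing promoters would be expected by chance, so observing 15 gives a small tail probability; a real analysis of roughly 100 motifs would also need a multiple-testing correction.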

  9. The valine and lysine residues in the conserved FxVTxK motif are important for the function of phylogenetically distant plant cellulose synthases

    Energy Technology Data Exchange (ETDEWEB)

    Slabaugh, Erin; Scavuzzo-Duggan, Tess; Chaves, Arielle; Wilson, Liza; Wilson, Carmen; Davis, Jonathan K.; Cosgrove, Daniel J.; Anderson, Charles T.; Roberts, Alison W.; Haigler, Candace H.

    2015-12-08

    Cellulose synthases (CESAs) synthesize the β-1,4-glucan chains that coalesce to form cellulose microfibrils in plant cell walls. In addition to a large cytosolic (catalytic) domain, CESAs have eight predicted transmembrane helices (TMHs). However, analogous to the structure of BcsA, a bacterial CESA, predicted TMH5 in CESA may instead be an interfacial helix. This would place the conserved FxVTxK motif in the plant cell cytosol where it could function as a substrate-gating loop as occurs in BcsA. To define the functional importance of the CESA region containing FxVTxK, we tested five parallel mutations in Arabidopsis thaliana CESA1 and Physcomitrella patens CESA5 in complementation assays of the relevant cesa mutants. In both organisms, the substitution of the valine or lysine residues in FxVTxK severely affected CESA function. In Arabidopsis roots, both changes were correlated with lower cellulose anisotropy, as revealed by Pontamine Fast Scarlet. Analysis of hypocotyl inner cell wall layers by atomic force microscopy showed that two altered versions of Atcesa1 could rescue cell wall phenotypes observed in the mutant background line. Overall, the data show that the FxVTxK motif is functionally important in two phylogenetically distant plant CESAs. The results show that Physcomitrella provides an efficient model for assessing the effects of engineered CESA mutations affecting primary cell wall synthesis and that diverse testing systems can lead to nuanced insights into CESA structure–function relationships. Although CESA membrane topology needs to be experimentally determined, the results support the possibility that the FxVTxK region functions similarly in CESA and BcsA.
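
    The motif name FxVTxK encodes a simple pattern in which x stands for any residue, so it translates directly into a regular expression. The sketch below assumes that reading; the peptides are short hypothetical strings invented for illustration, not real CESA1 or CESA5 sequences, and the substitutions shown are not necessarily the five mutations tested in the study.

```python
import re

# FxVTxK with x = any single residue.
FXVTXK = re.compile(r"F.VT.K")

def find_fxvtxk(protein):
    """Return (1-based start position, matched text) for each FxVTxK occurrence."""
    return [(m.start() + 1, m.group()) for m in FXVTXK.finditer(protein.upper())]

# Hypothetical peptides for illustration only.
PEPTIDES = {
    "intact_motif": "MAGFAVTSKLLE",
    "valine_substituted": "MAGFAATSKLLE",
    "lysine_substituted": "MAGFAVTSALLE",
}

for name, peptide in PEPTIDES.items():
    print(name, find_fxvtxk(peptide))
```

    Substituting either the valine or the lysine removes the pattern match, which mirrors at the sequence level the two positions the abstract reports as functionally important.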

  10. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  11. Functional and structural analysis of the DNA sequence conferring glucocorticoid inducibility to the mouse mammary tumor virus gene

    International Nuclear Information System (INIS)

    Skroch, P.

    1987-05-01

    In the first part of my thesis I show that the DNA element (HRE) conferring glucocorticoid inducibility to the Mouse Mammary Tumor Virus has enhancer properties. It activates a heterologous promoter, that of the β-globin gene, independently of distance, position and orientation. These properties, however, have to be regarded in relation to the remaining regulatory elements of the activated gene, as the recombinants between the HRE and the TK gene have demonstrated. In the second part of my thesis I investigated the biological significance of certain sequence motifs of the HRE that are notable for their interaction with trans-acting factors or their sequence homologies with other regulatory DNA elements. I could confirm the generally postulated modular structure of enhancers for the HRE and relate the relevance of the individual subdomains to the function of the element. (orig.) [de]

  12. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  13. Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

    Science.gov (United States)

    Soufari, Heddy

    2017-01-01

    Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
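The spacing preference described in this abstract (two GCAC motifs separated by more than 6 nt) can be illustrated with a small scan. This is only a sketch under stated assumptions: the function name `find_gcac_pairs`, the definition of the gap as the number of nucleotides between the end of the first motif and the start of the second, and the upper bound `max_gap` are hypothetical choices, not taken from the study.

```python
import re

MOTIF = "GCAC"


def find_gcac_pairs(seq, min_gap=7, max_gap=30):
    """Return (start1, start2, gap) for every pair of GCAC motifs whose
    gap (nucleotides between the two motifs) lies in [min_gap, max_gap].

    min_gap=7 encodes "separated by >6 nt"; max_gap is an arbitrary,
    hypothetical cut-off that only keeps the output small.
    """
    seq = seq.upper()
    # Start positions of every GCAC occurrence.
    starts = [m.start() for m in re.finditer(MOTIF, seq)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + len(MOTIF))
            if min_gap <= gap <= max_gap:
                pairs.append((first, second, gap))
    return pairs


if __name__ == "__main__":
    # Toy RNA: two GCAC motifs with 8 nt between them.
    print(find_gcac_pairs("GCAC" + "U" * 8 + "GCAC"))
```

Because GCAC contains no U or T, the same scan works on RNA or DNA notation; real target prediction would of course need the binding data from the paper rather than a fixed window.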

  14. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Science.gov (United States)

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
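To make the phrase "calculate matching probabilities position-by-position" concrete, here is a minimal CPU sketch of scanning a sequence with a position weight matrix. It is not the HMS or GPUmotif code; the function name `window_probabilities` and the toy matrix are assumptions made for illustration.

```python
def window_probabilities(seq, pwm):
    """Probability of each sequence window under a position weight matrix.

    pwm is a list with one dict per motif position, mapping each base to
    its probability at that position (hypothetical toy values below).
    """
    width = len(pwm)
    probs = []
    for start in range(len(seq) - width + 1):
        p = 1.0
        # Multiply the per-position probabilities for this window.
        for offset, column in enumerate(pwm):
            p *= column[seq[start + offset]]
        probs.append(p)
    return probs


if __name__ == "__main__":
    toy_pwm = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    scores = window_probabilities("AAGT", toy_pwm)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best, scores)
```

Each window is scored independently of the others, which is the kind of workload that can plausibly be spread across many GPU threads; how GPUmotif actually partitions the work is not described in the abstract.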

  15. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Full Text Available Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  16. Two functional motifs define the interaction, internalization and toxicity of the cell-penetrating antifungal peptide PAF26 on fungal cells.

    Directory of Open Access Journals (Sweden)

    Alberto Muñoz

    Full Text Available The synthetic, cell-penetrating hexapeptide PAF26 (RKKWFW) is antifungal at low micromolar concentrations and has been proposed as a model for cationic, cell-penetrating antifungal peptides. Its short amino acid sequence facilitates the analysis of its structure-activity relationships using the fungal models Neurospora crassa and Saccharomyces cerevisiae, and human and plant pathogens Aspergillus fumigatus and Penicillium digitatum, respectively. Previously, PAF26 at low fungicidal concentrations was shown to be endocytically internalized, accumulated in vacuoles and then actively transported into the cytoplasm where it exerts its antifungal activity. In the present study, two PAF26 derivatives, PAF95 (AAAWFW) and PAF96 (RKKAAA), were designed to characterize the roles of the N-terminal cationic and the C-terminal hydrophobic motifs in PAF26's mode-of-action. PAF95 and PAF96 exhibited substantially reduced antifungal activity against all the fungi analyzed. PAF96 localized to fungal cell envelopes and was not internalized by the fungi. In contrast, PAF95 was taken up into vacuoles of N. crassa, wherein it accumulated and was trapped without toxic effects. Also, the PAF26-resistant Δarg1 strain of S. cerevisiae exhibited increased PAF26 accumulation in vacuoles. Live-cell imaging of GFP-labelled nuclei in A. fumigatus showed that transport of PAF26 from the vacuole to the cytoplasm was followed by nuclear breakdown and dissolution. This work demonstrates that the amphipathic PAF26 possesses two distinct motifs that allow three stages in its antifungal action to be defined: (i) its interaction with the cell envelope; (ii) its internalization and transport to vacuoles mediated by the aromatic hydrophobic domain; and (iii) its transport from vacuoles to the cytoplasm. Significantly, cationic residues in PAF26 are important not only for the electrostatic attraction and interaction with the fungal cell but also for transport from the vacuole to the

  17. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence...

  18. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
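As a point of reference for the baselines named in this abstract, the sketch below ranks functions by degree in a transition graph built from a single execution trace. It illustrates a DegreeRank-style baseline only, not the DNFM algorithm, whose inner-importance and inter-importance indicators are not specified here. The function name `degree_rank` and the use of readable function names in place of function addresses are assumptions.

```python
from collections import defaultdict


def degree_rank(trace):
    """Rank functions by how many distinct functions they neighbour in the
    execution trace (undirected transition graph), highest degree first.

    Ties are broken alphabetically so the output is deterministic.
    """
    neighbours = defaultdict(set)
    for current, following in zip(trace, trace[1:]):
        if current != following:  # ignore immediate self-repeats
            neighbours[current].add(following)
            neighbours[following].add(current)
    return sorted(neighbours, key=lambda f: (-len(neighbours[f]), f))


if __name__ == "__main__":
    # Hypothetical trace; a real one would hold function addresses.
    trace = ["main", "parse", "main", "eval", "main", "emit"]
    print(degree_rank(trace))
```

A pattern-based method such as the one described would presumably weigh how functions co-occur across simplified sequences rather than just counting neighbours, which is the contrast the abstract draws with PageRank and DegreeRank.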

  19. Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

    Science.gov (United States)

    Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

    2004-01-05

    The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.

  20. Conservation patterns in different functional sequence categories of divergent Drosophila species

    Energy Technology Data Exchange (ETDEWEB)

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried 3N+2 signature characteristic to synonymic substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
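The quantity this abstract is built on, the lengths of fully conserved ungapped blocks in a pairwise alignment, can be computed in a few lines. The sketch below assumes the two aligned sequences are equal-length strings with "-" marking gaps; the function name `conserved_block_lengths` is hypothetical and the authors' actual pipeline is not described here.

```python
def conserved_block_lengths(aligned_a, aligned_b):
    """Lengths of maximal runs of identical, ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    lengths = []
    run = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == y and x != "-":
            run += 1          # column is conserved and ungapped
        else:
            if run:
                lengths.append(run)
            run = 0           # a mismatch or gap ends the block
    if run:
        lengths.append(run)
    return lengths


if __name__ == "__main__":
    print(conserved_block_lengths("ACGT-ACGTAC", "ACGTTACCTAC"))
```

One way to read the 3N+2 signature: if substitutions fall only on third codon positions, two neighbouring substitutions are a multiple of three apart, so the conserved block between them has length 2, 5, 8 and so on.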

  1. Comprehensive and Facile Synthesis of Some Functionalized Bis-Heterocyclic Compounds Containing a Thieno[2,3-b]thiophene Motif

    Science.gov (United States)

    Mabkhot, Yahia N.; Barakat, Assem; Al-Majid, Abdullah M.; Alshahrani, Saeed A.

    2012-01-01

    A comprehensive and facile method for the synthesis of new functionalized bis-heterocyclic compounds containing a thieno[2,3-b]thiophene motif is described. The hitherto unknown bis-pyrazolothieno[2,3-b]thiophene derivatives 2a–c, bis-pyridazinothieno[2,3-b]thiophene derivatives 4, bis-pyridinothieno[2,3-b]thiophene derivatives 6a,b, and the analogous bis-pyridinothieno[2,3-b]thiophene nitrile derivatives 7 are obtained. Additionally, the novel bis-pyridazinonothieno[2,3-b]thiophene derivatives 9, and nicotinic acid derivatives 10, 11 are obtained via bis-dienamide 8. The structures of all newly synthesized compounds have been elucidated by 1H, 13C NMR, GCMS, and IR spectrometry. These compounds represent a new class of sulfur- and nitrogen-containing heterocycles that should also be of interest as new materials. PMID:22408452

  2. Sequential immunization with V3 peptides from primary human immunodeficiency virus type 1 produces cross-neutralizing antibodies against primary isolates with a matching narrow-neutralization sequence motif.

    Science.gov (United States)

    Eda, Yasuyuki; Takizawa, Mari; Murakami, Toshio; Maeda, Hiroaki; Kimachi, Kazuhiko; Yonemura, Hiroshi; Koyanagi, Satoshi; Shiosaki, Kouichi; Higuchi, Hirofumi; Makizumi, Keiichi; Nakashima, Toshihiro; Osatomi, Kiyoshi; Tokiyoshi, Sachio; Matsushita, Shuzo; Yamamoto, Naoki; Honda, Mitsuo

    2006-06-01

    An antibody response capable of neutralizing not only homologous but also heterologous forms of the CXCR4-tropic human immunodeficiency virus type 1 (HIV-1) MNp and CCR5-tropic primary isolate HIV-1 JR-CSF was achieved through sequential immunization with a combination of synthetic peptides representing HIV-1 Env V3 sequences from field and laboratory HIV-1 clade B isolates. In contrast, repeated immunization with a single V3 peptide generated antibodies that neutralized only type-specific laboratory-adapted homologous viruses. To determine whether the cross-neutralization response could be attributed to a cross-reactive antibody in the immunized animals, we isolated a monoclonal antibody, C25, which neutralized the heterologous primary viruses of HIV-1 clade B. Furthermore, we generated a humanized monoclonal antibody, KD-247, by transferring the genes of the complementarity-determining region of C25 into genes of the human V region of the antibody. KD-247 bound with high affinity to the "PGR" motif within the HIV-1 Env V3 tip region, and, among the established reference antibodies, it most effectively neutralized primary HIV-1 field isolates possessing the matching neutralization sequence motif, suggesting its promise for clinical applications involving passive immunizations. These results demonstrate that sequential immunization with B-cell epitope peptides may contribute to a humoral immune-based HIV vaccine strategy. Indeed, they help lay the groundwork for the development of HIV-1 vaccine strategies that use sequential immunization with biologically relevant peptides to overcome difficulties associated with otherwise poorly immunogenic epitopes.

  3. Proline: the distribution, frequency, positioning, and common functional roles of proline and polyproline sequences in the human proteome.

    Directory of Open Access Journals (Sweden)

    Alexander A Morgan

    Full Text Available Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta pleats as architectural motifs in protein configuration. Triproline helices are participants in protein-protein signaling interactions. Longer spans of repeat prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. Therefore we have undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend to be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex.
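The counts reported in this abstract are consistent with one another (687,434 of 10,882,808 residues is about 6.3%), and the run-based statistics it describes are straightforward to reproduce on any protein sequence. The sketch below is illustrative only: the function names and the toy sequence are assumptions, and it is not the authors' analysis code.

```python
import re


def proline_runs(protein):
    """Lengths of every maximal run of consecutive prolines (P)."""
    return [len(m.group()) for m in re.finditer(r"P+", protein)]


def fraction_in_long_runs(protein, min_len=3):
    """Fraction of proline residues that sit in runs of min_len or more
    (min_len=3 corresponds to "trimers or longer spans")."""
    runs = proline_runs(protein)
    total = sum(runs)
    if total == 0:
        return 0.0
    return sum(r for r in runs if r >= min_len) / total


if __name__ == "__main__":
    # Check of the proteome-wide percentage quoted in the abstract.
    print(round(100 * 687_434 / 10_882_808, 1))
    # Hypothetical toy sequence with runs of 3, 1 and 2 prolines.
    print(proline_runs("MPPPAPGPPK"), fraction_in_long_runs("MPPPAPGPPK"))
```

Applied across a whole proteome, the same two functions would give the run-length distribution and the share of prolines in trimers or longer spans that the abstract reports as 4.4%.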

  4. MotifMark: Finding Regulatory Motifs in DNA Sequences

    OpenAIRE

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L.; Wang, May D.

    2017-01-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity be...

  5. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    Full Text Available The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  6. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Full Text Available Abstract Background: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agriculture losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results: Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion: This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  7. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  8. Modulating p56Lck in T-Cells by a Chimeric Peptide Comprising Two Functionally Different Motifs of Tip from Herpesvirus saimiri

    Directory of Open Access Journals (Sweden)

    Jean-Paul Vernot

    2015-01-01

    Full Text Available The Lck interacting protein Tip of Herpesvirus saimiri is responsible for T-cell transformation both in vitro and in vivo. Here we designed the chimeric peptide hTip-CSKH, comprising the Lck specific interacting motif CSKH of Tip and its hydrophobic transmembrane sequence (hTip), the latter as a vector targeting lipid rafts. We found that hTip-CSKH can induce a fivefold increase in proliferation of human and Aotus sp. T-cells. Costimulation with PMA did not enhance this proliferation rate, suggesting that hTip-CSKH is sufficient and independent of further PKC stimulation. We also found that human Lck phosphorylation was increased earlier after stimulation when T-cells were incubated previously with hTip-CSKH, supporting a strong signalling and proliferative effect of the chimeric peptide. Additionally, Lck downstream signalling was evident with hTip-CSKH but not with control peptides. Importantly, hTip-CSKH could be identified in heavy lipid rafts membrane fractions, a compartment where important T-cell signalling molecules (LAT, Ras, and Lck) are present during T-cell activation. Interestingly, hTip-CSKH was inhibitory to Jurkat cells, in total agreement with the different signalling pathways and activation requirements of this leukemic cell line. These results provide the basis for the development of new compounds capable of modulating therapeutic targets present in lipid rafts.

  9. Modulating p56Lck in T-Cells by a Chimeric Peptide Comprising Two Functionally Different Motifs of Tip from Herpesvirus saimiri.

    Science.gov (United States)

    Vernot, Jean-Paul; Perdomo-Arciniegas, Ana María; Pérez-Quintero, Luis Alberto; Martínez, Diego Fernando

    2015-01-01

    The Lck interacting protein Tip of Herpesvirus saimiri is responsible for T-cell transformation both in vitro and in vivo. Here we designed the chimeric peptide hTip-CSKH, comprising the Lck specific interacting motif CSKH of Tip and its hydrophobic transmembrane sequence (hTip), the latter as a vector targeting lipid rafts. We found that hTip-CSKH can induce a fivefold increase in proliferation of human and Aotus sp. T-cells. Costimulation with PMA did not enhance this proliferation rate, suggesting that hTip-CSKH is sufficient and independent of further PKC stimulation. We also found that human Lck phosphorylation was increased earlier after stimulation when T-cells were incubated previously with hTip-CSKH, supporting a strong signalling and proliferative effect of the chimeric peptide. Additionally, Lck downstream signalling was evident with hTip-CSKH but not with control peptides. Importantly, hTip-CSKH could be identified in heavy lipid rafts membrane fractions, a compartment where important T-cell signalling molecules (LAT, Ras, and Lck) are present during T-cell activation. Interestingly, hTip-CSKH was inhibitory to Jurkat cells, in total agreement with the different signalling pathways and activation requirements of this leukemic cell line. These results provide the basis for the development of new compounds capable of modulating therapeutic targets present in lipid rafts.

  10. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    Science.gov (United States)

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles, containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with 5'-CACC overhang (substituting T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  11. The structure of Plasmodium vivax phosphatidylethanolamine-binding protein suggests a functional motif containing a left-handed helix

    International Nuclear Information System (INIS)

    Arakaki, Tracy; Neely, Helen; Boni, Erica; Mueller, Natasha; Buckner, Frederick S.; Van Voorhis, Wesley C.; Lauricella, Angela; DeTitta, George; Luft, Joseph; Hol, Wim G. J.; Merritt, Ethan A.

    2007-01-01

    The crystal structure of a phosphatidylethanolamine-binding protein from P. vivax, a homolog of Raf-kinase inhibitor protein (RKIP), has been solved to a resolution of 1.3 Å. The inferred interaction surface near the anion-binding site is found to include a distinctive left-handed α-helix. The structure of a putative Raf kinase inhibitor protein (RKIP) homolog from the eukaryotic parasite Plasmodium vivax has been studied to a resolution of 1.3 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protozoan protein is topologically similar to previously studied members of the phosphatidylethanolamine-binding protein (PEBP) sequence family, but exhibits a distinctive left-handed α-helical region at one side of the canonical phospholipid-binding site. Re-examination of previously determined PEBP structures suggests that the P. vivax protein and yeast carboxypeptidase Y inhibitor may represent a structurally distinct subfamily of the diverse PEBP-sequence family

  12. Structural and functional studies of a phosphatidic acid-binding antifungal plant defensin MtDef4: identification of an RGFRRR motif governing fungal cell entry.

    Directory of Open Access Journals (Sweden)

    Uma Shankar Sagaram

    Full Text Available MtDef4 is a 47-amino acid cysteine-rich evolutionarily conserved defensin from a model legume Medicago truncatula. It is an apoplast-localized plant defense protein that inhibits the growth of the ascomycetous fungal pathogen Fusarium graminearum in vitro at micromolar concentrations. Little is known about the mechanisms by which MtDef4 mediates its antifungal activity. In this study, we show that MtDef4 rapidly permeabilizes fungal plasma membrane and is internalized by the fungal cells where it accumulates in the cytoplasm. Furthermore, analysis of the structure of MtDef4 reveals the presence of a positively charged γ-core motif composed of β2 and β3 strands connected by a positively charged RGFRRR loop. Replacement of the RGFRRR sequence with AAAARR or RGFRAA abolishes the ability of MtDef4 to enter fungal cells, suggesting that the RGFRRR loop is a translocation signal required for the internalization of the protein. MtDef4 binds to phosphatidic acid (PA), a precursor for the biosynthesis of membrane phospholipids and a signaling lipid known to recruit cytosolic proteins to membranes. Amino acid substitutions in the RGFRRR sequence which abolish the ability of MtDef4 to enter fungal cells also impair its ability to bind PA. These findings suggest that MtDef4 is a novel antifungal plant defensin capable of entering into fungal cells and affecting intracellular targets and that these processes are mediated by the highly conserved cationic RGFRRR loop via its interaction with PA.

  13. Sequence-based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    Directory of Open Access Journals (Sweden)

    Janine Maimanakos

    2016-08-01

    Arylmalonate-Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica's prototype appeared to be limited to the classes of Alpha-, Beta- and Gammaproteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the TTT family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originating from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether, our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes.

  14. Discriminative motif discovery via simulated evolution and random under-sampling.

    Directory of Open Access Journals (Sweden)

    Tao Song

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.

  15. Discriminative motif discovery via simulated evolution and random under-sampling.

    Science.gov (United States)

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
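
    The random under-sampling step mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so the function name `random_undersample`, its parameters, and the toy sequences are assumptions made for illustration; the simulated-evolution preprocessing and the HMM training themselves are not shown.

```python
import random


def random_undersample(positives, negatives, seed=0):
    """Balance two classes by randomly discarding majority-class examples.

    Hypothetical helper (not from the paper): the larger class is sampled
    without replacement down to the size of the smaller class.
    """
    rng = random.Random(seed)
    positives, negatives = list(positives), list(negatives)
    if len(positives) <= len(negatives):
        return positives, rng.sample(negatives, len(positives))
    return rng.sample(positives, len(negatives)), negatives


# Toy data (made up): 2 positive sequences versus 5 negative sequences.
pos = ["MKRLLA", "MSRLLV"]
neg = ["AAAAAA", "GGGGGG", "CCCCCC", "TTTTTT", "DDDDDD"]
balanced_pos, balanced_neg = random_undersample(pos, neg, seed=1)
print(len(balanced_pos), len(balanced_neg))  # both classes now have 2 examples
```

    Sampling without replacement keeps every retained negative example distinct; repeating the draw with different seeds is one common way to reduce the information lost by discarding data.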

  16. Asymptotically double lacunary equivalent sequences defined by Orlicz functions

    Directory of Open Access Journals (Sweden)

    Ayhan Esi

    2014-04-01

    This paper presents the following definition, which is a natural combination of the definitions of asymptotic equivalence and of an Orlicz function. Two nonnegative double sequences $x=(x_{k,l})$ and $y=(y_{k,l})$ are said to be M-asymptotically double equivalent to multiple $L$ provided that, for every $\varepsilon>0$, $P\text{-}\lim_{k,l} M\left(\frac{1}{\rho}\left|\frac{x_{k,l}}{y_{k,l}}-L\right|\right)=0$ for some $\rho>0$ (denoted by x∽y), and simply M-asymptotically double equivalent if $L=1$. We also give some new concepts related to this definition and some inclusion theorems.

  17. Identification and role of functionally important motifs in the 970 loop of Escherichia coli 16S ribosomal RNA.

    Science.gov (United States)

    Saraiya, Ashesh A; Lamichhane, Tek N; Chow, Christine S; SantaLucia, John; Cunningham, Philip R

    2008-02-22

    The 970 loop (helix 31) of Escherichia coli 16S ribosomal RNA contains two modified nucleotides, m(2)G966 and m(5)C967. Positions A964, A969, and C970 are conserved among the Bacteria, Archaea, and Eukarya. The nucleotides present at positions 965, 966, 967, 968, and 971, however, are only conserved and unique within each domain. All organisms contain a modified nucleoside at position 966, but the type of the modification is domain specific. Biochemical and structure studies have placed this loop near the P site and have shown it to be involved in the decoding process and in binding the antibiotic tetracycline. To identify the functional components of this ribosomal RNA hairpin, the eight nucleotides of the 970 loop of helix 31 were subjected to saturation mutagenesis and 107 unique functional mutants were isolated and analyzed. Nonrandom nucleotide distributions were observed at each mutated position among the functional isolates. Nucleotide identity at positions 966 and 969 significantly affects ribosome function. Ribosomes with single mutations of m(2)G966 or m(5)C967 produce more protein in vivo than do wild-type ribosomes. Overexpression of initiation factor 3 specifically restored wild-type levels of protein synthesis to the 966 and 967 mutants, suggesting that modification of these residues is important for initiation factor 3 binding and for the proper initiation of protein synthesis.

  18. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8.

    Science.gov (United States)

    Shih, Chun-Ho; Chiang, Tin-Bin; Wang, Wen-Jeng

    2017-03-15

    Convulxin (CVX), a C-type lectin-like protein (CLPs), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the multimeric CVX were further explored on its mode of actions toward human coronary artery smooth muscle cells (HCASMCs). The N-terminus of β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a conserved motif upon the sequence comparison with other CLPs. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. In the case of specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by series of functional analyses. CVX showed to exhibit multiple inhibitory activities toward HCASMCs proliferation, adhesion and invasion with a dose- and integrin αvβ3-dependent fashion. However, the WAD-octapeptide exerting a minor potency could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated both the intact CVX and synthetic peptide can specifically interact with integrin-αv on HCASMCs and CVX was shown to have a down-regulatory effect on the gene expression of CXC-chemokines, such as growth-related oncogene and interleukin-8. According to nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, the NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via the interactions of WAD-motif in CVX-β with integrin-αv on HCASMCs and it also is a gene suppressor with the ability to diminish the expression of two CXC-chemokines in a NF-κB-independent manner. Indeed, more extensive investigations are needed and might create a new avenue for the development of a novel angiostatic agent. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. IL-4 function can be transferred to the IL-2 receptor by tyrosine containing sequences found in the IL-4 receptor alpha chain.

    Science.gov (United States)

    Wang, H Y; Paul, W E; Keegan, A D

    1996-02-01

    IL-4 binds to a cell surface receptor complex that consists of the IL-4 binding protein (IL-4R alpha) and the gamma chain of the IL-2 receptor complex (gamma c). The receptors for IL-4 and IL-2 have several features in common; both use the gamma c as a receptor component, and both activate the Janus kinases JAK-1 and JAK-3. In spite of these similarities, IL-4 evokes specific responses, including the tyrosine phosphorylation of 4PS/IRS-2 and the induction of CD23. To determine whether sequences within the cytoplasmic domain of the IL-4R alpha specify these IL-4-specific responses, we transplanted the insulin IL-4 receptor motif (I4R motif) of the huIL-4R alpha to the cytoplasmic domain of a truncated IL-2R beta. In addition, we transplanted a region that contains peptide sequences shown to block Stat6 binding to DNA. We analyzed the ability of cells expressing these IL-2R-IL-4R chimeric constructs to respond to IL-2. We found that IL-4 function could be transplanted to the IL-2 receptor by these regions and that proliferative and differentiative functions can be induced by different receptor sequences.

  20. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon; Patil, Sachin; Fhayli, Karim; Alsaiari, Shahad K.; Khashab, Niveen M.

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  1. Crystal structure of the G3BP2 NTF2-like domain in complex with a canonical FGDF motif peptide

    DEFF Research Database (Denmark)

    Kristensen, Ole

    2015-01-01

    -terminal domains of the G3BP1 and Rasputin proteins. Recently, a subset of G3BP interacting proteins was recognized to share a common sequence motif, FGDF. The most studied binding partners, USP10 and viral nsP3, interfere with essential G3BP functions related to assembly of cellular stress granules. Reported...

  2. HIV protein sequence hotspots for crosstalk with host hub proteins.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell results in out-competition of host proteins in their interaction with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome mediating interactions with host hub proteins. In this study, we used the HIV and human interactome databases to identify HIV targeted host hub proteins and their host binding partners (H2). We developed a high throughput computational procedure utilizing motif discovery algorithms on sets of protein sequences, including sequences of HIV and H2 proteins. We identified as HIV sequence hotspots those linear motifs that are highly conserved on HIV sequences and at the same time have a statistically enriched presence on the sequences of H2 proteins. The HIV protein motifs discovered in this study are expressed by subsets of H2 host proteins potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many such motifs are clustered on an HIV sequence in the form of hotspots. The sequential positions of these hotspots are consistent with the curated literature on phenotype altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.

  3. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery: searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of

  4. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence

    NARCIS (Netherlands)

    Al-Shahib, A.; Breitling, R.; Gilbert, D.

    2005-01-01

    Abstract: When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence

  5. Cellulase linkers are optimized based on domain type and function: insights from sequence analysis, biophysical measurements, and molecular simulation.

    Directory of Open Access Journals (Sweden)

    Deanne W Sammond

    Cellulase enzymes deconstruct cellulose to glucose, and are often comprised of glycosylated linkers connecting glycoside hydrolases (GHs) to carbohydrate-binding modules (CBMs). Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N to C terminal order, and similar linker lengths suggest there is no effect of domain order on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative methods for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity. This study suggests

  6. Universal sequence replication, reversible polymerization and early functional biopolymers: a model for the initiation of prebiotic sequence evolution.

    Directory of Open Access Journals (Sweden)

    Sara Imari Walker

    Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles) that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for
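
    To make the idea of a kinetic Monte Carlo simulation with a finite, recycled monomer pool concrete, here is a heavily simplified sketch using the Gillespie direct method. It is not the authors' model: it tracks polymer lengths only and leaves out sequences, template-directed replication, environmental cycles, diffusion and functional selection. The function name and the rate constants `k_form`, `k_grow` and `k_break` are hypothetical.

```python
import random


def gillespie_polymerization(monomers, k_form=0.001, k_grow=0.01,
                             k_break=0.05, t_end=50.0, seed=0):
    """Gillespie simulation of reversible polymerization (illustrative only).

    Events: two monomers form a dimer, a polymer gains a monomer, or a
    polymer loses a terminal monomer. Total monomer mass is conserved.
    """
    rng = random.Random(seed)
    polymers = []  # length of each polymer currently present
    t = 0.0
    while True:
        rates = [
            k_form * monomers * (monomers - 1) / 2,  # spontaneous dimer formation
            k_grow * monomers * len(polymers),       # elongation by one monomer
            k_break * len(polymers),                 # loss of one terminal monomer
        ]
        total = sum(rates)
        if total == 0.0:
            break
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        event = rng.choices(("form", "grow", "break"), weights=rates)[0]
        if event == "form":
            monomers -= 2
            polymers.append(2)
        elif event == "grow":
            polymers[rng.randrange(len(polymers))] += 1
            monomers -= 1
        else:
            i = rng.randrange(len(polymers))
            polymers[i] -= 1
            monomers += 1
            if polymers[i] == 1:  # a dimer that lost a unit is two free monomers
                polymers.pop(i)
                monomers += 1
    return monomers, polymers


free, lengths = gillespie_polymerization(100)
print(free + sum(lengths))  # 100: monomer mass is conserved
```

    Because every depolymerization event returns monomers to the shared pool, polymers in this sketch compete for a fixed resource, which is the feature of the abstract's scenario that the example is meant to show.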

  7. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    Energy Technology Data Exchange (ETDEWEB)

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et. al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  8. Protein chaperones Q8ZP25_SALTY from Salmonella typhimurium and HYAE_ECOLI from Escherichia coli exhibit thioredoxin-like structures despite lack of canonical thioredoxin active site sequence motif.

    Science.gov (United States)

    Parish, David; Benach, Jordi; Liu, Goahua; Singarapu, Kiran Kumar; Xiao, Rong; Acton, Thomas; Su, Min; Bansal, Sonal; Prestegard, James H; Hunt, John; Montelione, Gaetano T; Szyperski, Thomas

    2008-12-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  9. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.

    2008-01-01

    set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr(P)-specific binding partners for peptides corresponding to the extracted motifs. We confirmed numerous previously known interaction motifs and found 15 new interactions mediated by phosphosites not previously known to bind SH2 or PTB. Remarkably, a novel hydrophobic N-terminal motif ((L/V/I)(L/V/I)pY) was identified...

  10. Epitope-based vaccines with the Anaplasma marginale MSP1a functional motif induce a balanced humoral and cellular immune response in mice.

    Directory of Open Access Journals (Sweden)

    Paula S Santos

    Bovine anaplasmosis is a hemoparasitic disease that causes considerable economic loss to the dairy and beef industries. Cattle immunized with the Anaplasma marginale MSP1 outer membrane protein complex present a protective humoral immune response; however, its efficacy is variable. Immunodominant epitopes seem to be a key limiting factor for adaptive immunity. We have successfully demonstrated that critical motifs of the MSP1a functional epitope are essential for antibody recognition of infected animal sera, but its protective immunity is yet to be tested. We have evaluated two synthetic vaccine formulations against A. marginale, using an epitope-based approach in mice. Infection of mice with bovine anaplasmosis was demonstrated by qPCR analysis of erythrocytes after 15-day exposure. A proof-of-concept was obtained in this murine model, in which peptides conjugated to bovine serum albumin were used for immunization at three 15-day intervals by intraperitoneal injections before challenging with live bacteria. Blood samples were analyzed for the presence of specific IgG2a and IgG1 antibodies, as well as for rickettsemia. A panel containing the cytokines' transcriptional profile for innate and adaptive immune responses was carried out through qPCR. Immunized BALB/c mice challenged with A. marginale presented stable body weight, a reduced number of infected erythrocytes, and no mortality, whereas among control groups mortality rates ranged from 15% to 29%. Additionally, the vaccines induced a significantly higher IgG2a than IgG1 response, followed by increased expression of pro-inflammatory cytokines. This is a successful demonstration of epitope-based vaccines, and protection against anaplasmosis may be associated with elicitation of effector functions of humoral and cellular immune responses in a murine model.

  11. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif....... A special viewing feature, MHC fight, allows for display of the specificity of two different MHC molecules side by side. We show how the web server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif...

  12. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
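
    As an illustration of method 3), the significance of a product of p-values can be computed in closed form if the individual p-values are assumed to be independent and uniformly distributed under the null hypothesis. For an observed product $p$ of $n$ such values, $P(\text{product} \le p) = p \sum_{i=0}^{n-1} (-\ln p)^i / i!$. The sketch below implements only this formula; the function name is hypothetical and the code is not taken from MAST.

```python
import math


def product_pvalue(pvalues):
    """Significance of the product of independent Uniform(0, 1) p-values.

    Returns P(product <= observed product) under the null hypothesis,
    assuming the individual p-values are independent.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0
    s = -sum(math.log(p) for p in pvalues)  # s = -ln(product)
    term, series = 1.0, 0.0
    for i in range(n):
        series += term        # adds s**i / i!
        term *= s / (i + 1)
    return math.exp(-s) * series


print(product_pvalue([0.05]))      # a single p-value is returned unchanged
print(product_pvalue([0.5, 0.5]))  # about 0.597, larger than the raw product 0.25
```

    The second call shows why the raw product cannot be read as a p-value directly: multiplying several unremarkable p-values always gives a small number, and the correction accounts for that.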

  13. iFORM: Incorporating Find Occurrence of Regulatory Motifs.

    Science.gov (United States)

    Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

    2016-01-01

    Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Both performance assessment with a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source and binary codes for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The identified TF binding sites across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
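
    Scanning a DNA sequence with a position weight matrix, the basic operation described above, can be sketched as follows. This is a generic log-odds scan under a uniform background, not iFORM's implementation; the function names, the pseudocount and the toy TATA-like matrix are assumptions made for the example.

```python
import math


def log_odds_matrix(pwm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities into log2-odds scores."""
    matrix = []
    for column in pwm:
        total = sum(column.get(base, 0.0) + pseudocount for base in "ACGT")
        matrix.append({
            base: math.log2((column.get(base, 0.0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Yield (start, score) for every motif-width window of the sequence."""
    width = len(matrix)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        yield start, sum(col[base] for col, base in zip(matrix, window))


# Toy 4-column matrix (made up) that strongly prefers the word TATA.
toy_pwm = [
    {"T": 0.97, "A": 0.01, "C": 0.01, "G": 0.01},
    {"A": 0.97, "T": 0.01, "C": 0.01, "G": 0.01},
    {"T": 0.97, "A": 0.01, "C": 0.01, "G": 0.01},
    {"A": 0.97, "T": 0.01, "C": 0.01, "G": 0.01},
]
best_start, best_score = max(scan("GGTATAGG", log_odds_matrix(toy_pwm)),
                             key=lambda hit: hit[1])
print(best_start)  # 2, the position of TATA
```

    A real scanner would also score the reverse complement and convert scores to p-values before thresholding; both steps are left out to keep the sketch short.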

  14. Evolution of sequence-defined highly functionalized nucleic acid polymers

    Science.gov (United States)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  15. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    Science.gov (United States)

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence

  16. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of the art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allow us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  17. Structural and Functional Analysis of VQ Motif-Containing Proteins in Arabidopsis as Interacting Proteins of WRKY Transcription Factors

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-01-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors. PMID:22535423

  18. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein

    Science.gov (United States)

    Marsh, Elizabeth K.; Delury, Craig P.; Davies, Nicholas J.; Weston, Christopher J.; Miah, Mohammed A.L.; Banks, Lawrence; Parish, Joanna L.

    2017-01-01

    The function of a conserved PSD95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV-infected cells, ensuring stable episome persistence and vegetative amplification. PMID:28061478

  19. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein.

    Science.gov (United States)

    Marsh, Elizabeth K; Delury, Craig P; Davies, Nicholas J; Weston, Christopher J; Miah, Mohammed A L; Banks, Lawrence; Parish, Joanna L; Higgs, Martin R; Roberts, Sally

    2017-03-21

    The function of a conserved PSD95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV-infected cells, ensuring stable episome persistence and vegetative amplification.

  20. Spatiotemporal network motif reveals the biological traits of developmental gene regulatory networks in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Kim Man-Sun

    2012-05-01

    Abstract Background Network motifs provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot be used to reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif,” a spatiotemporal sequence of network motifs of sub-networks which are active only at specific time points and body parts. Results On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution pattern in time and space. As a result, we found how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns. Conclusions Taken together, the proposed concept and the simulations can be used to unravel the design principle of developmental gene regulatory networks.
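The abstract does not give the simulation model, so the following is only a minimal sketch of how mutual inhibition can produce distinct expression states. It assumes a generic two-gene toggle with Hill-type repression; the function name `simulate_toggle` and the parameters `alpha`, `hill`, `dt` and `steps` are hypothetical choices for illustration, not values from the paper.

```python
def simulate_toggle(x0, y0, alpha=4.0, hill=2, dt=0.01, steps=5000):
    """Euler integration of two mutually repressing genes (assumed model).

    dx/dt = alpha / (1 + y**hill) - x
    dy/dt = alpha / (1 + x**hill) - y
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = alpha / (1.0 + y ** hill) - x  # y represses x
        dy = alpha / (1.0 + x ** hill) - y  # x represses y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Two cells that differ only slightly in their initial state
# settle into opposite expression states.
cell_a = simulate_toggle(1.5, 1.0)
cell_b = simulate_toggle(1.0, 1.5)
print(cell_a, cell_b)
```

With these assumed parameters a small initial bias is amplified until one gene dominates, which is one simple way a mutual-inhibition loop can turn a shallow gradient into a sharp spatial boundary.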

  1. OSR1 regulates a subset of inward rectifier potassium channels via a binding motif variant.

    Science.gov (United States)

    Taylor, Clinton A; An, Sung-Wan; Kankanamalage, Sachith Gallolu; Stippec, Steve; Earnest, Svetlana; Trivedi, Ashesh T; Yang, Jonathan Zijiang; Mirzaei, Hamid; Huang, Chou-Long; Cobb, Melanie H

    2018-04-10

    The with-no-lysine (K) (WNK) signaling pathway to STE20/SPS1-related proline- and alanine-rich kinase (SPAK) and oxidative stress-responsive 1 (OSR1) kinase is an important mediator of cell volume and ion transport. SPAK and OSR1 associate with upstream kinases WNK 1-4, substrates, and other proteins through their C-terminal domains which interact with linear R-F-x-V/I sequence motifs. In this study we find that SPAK and OSR1 also interact with similar affinity with a motif variant, R-x-F-x-V/I. Eight of 16 human inward rectifier K+ channels have an R-x-F-x-V motif. We demonstrate that two of these channels, Kir2.1 and Kir2.3, are activated by OSR1, while Kir4.1, which does not contain the motif, is not sensitive to changes in OSR1 or WNK activity. Mutation of the motif prevents activation of Kir2.3 by OSR1. Both siRNA knockdown of OSR1 and chemical inhibition of WNK activity disrupt NaCl-induced plasma membrane localization of Kir2.3. Our results suggest a mechanism by which WNK-OSR1 enhance Kir2.1 and Kir2.3 channel activity by increasing their plasma membrane localization. Regulation of members of the inward rectifier K+ channel family adds functional and mechanistic insight into the physiological impact of the WNK pathway.
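The two motifs named above, R-F-x-V/I and R-x-F-x-V/I, can be written as regular expressions over one-letter amino acid codes, with `x` read as any residue. The sketch below is illustrative only: the helper `find_motifs` and the toy sequence are hypothetical, and the sequence is not taken from any real channel.

```python
import re

# Lookahead groups let overlapping hits be reported as well.
CANONICAL = re.compile(r"(?=(RF.[VI]))")   # R-F-x-V/I
VARIANT = re.compile(r"(?=(R.F.[VI]))")    # R-x-F-x-V/I


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for every hit of pattern."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence, used only to exercise the patterns.
toy = "MSRFAVLLRAFGIK"
print(find_motifs(toy, CANONICAL))  # [(3, 'RFAV')]
print(find_motifs(toy, VARIANT))    # [(9, 'RAFGI')]
```

A pattern match of this kind only flags candidate sites; whether a site actually binds SPAK or OSR1 is an experimental question, as the abstract's mutational data illustrate.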

  2. SIFTS: Structure Integration with Function, Taxonomy and Sequences resource

    Science.gov (United States)

    Velankar, Sameer; Dana, José M.; Jacobsen, Julius; van Ginkel, Glen; Gane, Paul J.; Luo, Jie; Oldfield, Thomas J.; O’Donovan, Claire; Martin, Maria-Jesus; Kleywegt, Gerard J.

    2013-01-01

    The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts) is a close collaboration between the Protein Data Bank in Europe (PDBe) and UniProt. The two teams have developed a semi-automated process for maintaining up-to-date cross-reference information to UniProt entries, for all protein chains in the PDB entries present in the UniProt database. This process is carried out for every weekly PDB release and the information is stored in the SIFTS database. The SIFTS process includes cross-references to other biological resources such as Pfam, SCOP, CATH, GO, InterPro and the NCBI taxonomy database. The information is exported in XML format, one file for each PDB entry, and is made available by FTP. Many bioinformatics resources use SIFTS data to obtain cross-references between the PDB and other biological databases so as to provide their users with up-to-date information. PMID:23203869

  3. Functional noncoding sequences derived from SINEs in the mammalian genome.

    Science.gov (United States)

    Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

    2006-07-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

  4. Identification and characterization of a selenoprotein family containing a diselenide bond in a redox motif

    OpenAIRE

    Shchedrina, Valentina A.; Novoselov, Sergey V.; Malinouski, Mikalai Yu.; Gladyshev, Vadim N.

    2007-01-01

    Selenocysteine (Sec, U) insertion into proteins is directed by translational recoding of specific UGA codons located upstream of a stem-loop structure known as Sec insertion sequence (SECIS) element. Selenoproteins with known functions are oxidoreductases containing a single redox-active Sec in their active sites. In this work, we identified a family of selenoproteins, designated SelL, containing two Sec separated by two other residues to form a UxxU motif. SelL proteins show an unusual occur...

  5. DNA motif alignment by evolving a population of Markov chains.

    Science.gov (United States)

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements, or de novo motif-finding in genomes, remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima, as EM does. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often run independently a multitude of times, but without information exchange between different chains. Hence a new algorithm design enabling such information exchange would be worthwhile. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). Each chain is progressively updated through a series of stochastically sampled local alignments. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance can be improved if pooled information is used to run a population of motif samplers. The new PMC algorithm improved convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
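For orientation, the sketch below shows the classic single-chain Gibbs site sampler that the abstract treats as the baseline. It is not the PMC algorithm and has no information exchange between chains. It assumes one motif site per sequence, DNA over the alphabet ACGT and a uniform background; all function names and parameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def profile_from(sites, width, pseudocount=1.0):
    """Column-wise base probabilities estimated from aligned sites."""
    profile = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in ALPHABET})
    return profile


def gibbs_motif_sampler(sequences, width, iterations=200, seed=0):
    """Return one sampled motif start per sequence (0-based)."""
    rng = random.Random(seed)
    # Start from a random alignment.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            # Profile built from every sequence except the held-out one.
            others = [
                s[p:p + width]
                for j, (s, p) in enumerate(zip(sequences, starts))
                if j != i
            ]
            profile = profile_from(others, width)
            # Weight each candidate start by its likelihood under the profile.
            weights = [
                math.prod(profile[c][seq[p + c]] for c in range(width))
                for p in range(len(seq) - width + 1)
            ]
            # Sample the new start in proportion to those weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Hypothetical toy input with the same 6-mer planted in each sequence.
toy_sequences = ["ACGTTGACAGT", "TTGACACCGTA", "GGATTGACATC"]
print(gibbs_motif_sampler(toy_sequences, width=6))
```

Because a single chain can stall in a local maximum, such samplers are usually restarted many times; the pooled proposal distribution described in the abstract is aimed at that weakness.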

  6. Loop 7 of E2 enzymes: an ancestral conserved functional motif involved in the E2-mediated steps of the ubiquitination cascade.

    Directory of Open Access Journals (Sweden)

    Elena Papaleo

    The ubiquitin (Ub) system controls almost every aspect of eukaryotic cell biology. Protein ubiquitination depends on the sequential action of three classes of enzymes (E1, E2 and E3). E2 Ub-conjugating enzymes have a central role in the ubiquitination pathway, interacting with both E1 and E3, and influencing the ultimate fate of the substrates. Several E2s are characterized by an extended acidic insertion in loop 7 (L7), which if mutated is known to impair the proper E2-related functions. In the present contribution, we show that the acidic loop is a conserved ancestral motif in E2s, relying on the presence of alternate hydrophobic and acidic residues. Moreover, the dynamic properties of a subset of family 3 E2s, as well as their binary and ternary complexes with Ub and the cognate E3, have been investigated. Here we provide a model of the role of L7 in the different steps of the ubiquitination cascade of family 3 E2s. The L7 hydrophobic residues turned out to be the main determinant for the stabilization of the E2 inactive conformations by a tight network of interactions in the catalytic cleft. Moreover, phosphorylation is known from previous studies to promote E2 competent conformations for Ub charging, inducing electrostatic repulsion and acting on the L7 acidic residues. Here we show that these active conformations are stabilized by a network of hydrophobic interactions between L7 and L4, the latter being a conserved interface for E3-recruitment in several E2s. In the successive steps, L7 conserved acidic residues also provide an interaction interface for both Ub and the Rbx1 RING subdomain of the cognate E3. Our data therefore suggest a crucial role for L7 of family 3 E2s in all the E2-mediated steps of the ubiquitination cascade. Its different functions are exploited thanks to its conserved hydrophobic and acidic residues in a finely orchestrated mechanism.

  7. A Conserved Metal Binding Motif in the Bacillus subtilis Competence Protein ComFA Enhances Transformation.

    Science.gov (United States)

    Chilton, Scott S; Falbel, Tanya G; Hromada, Susan; Burton, Briana M

    2017-08-01

    Genetic competence is a process in which cells are able to take up DNA from their environment, resulting in horizontal gene transfer, a major mechanism for generating diversity in bacteria. Many bacteria carry homologs of the central DNA uptake machinery that has been well characterized in Bacillus subtilis. It has been postulated that the B. subtilis competence helicase ComFA belongs to the DEAD box family of helicases/translocases. Here, we made a series of mutants to analyze conserved amino acid motifs in several regions of B. subtilis ComFA. First, we confirmed that ComFA activity requires amino acid residues conserved among the DEAD box helicases, and second, we show that a zinc finger-like motif consisting of four cysteines is required for efficient transformation. Each cysteine in the motif is important, and mutation of at least two of the cysteines dramatically reduces transformation efficiency. Further, combining multiple cysteine mutations with the helicase mutations shows an additive phenotype. Our results suggest that the helicase and metal binding functions are two distinct activities important for ComFA function during transformation. IMPORTANCE ComFA is a highly conserved protein that has a role in DNA uptake during natural competence, a mechanism for horizontal gene transfer observed in many bacteria. Investigation of the details of the DNA uptake mechanism is important for understanding the ways in which bacteria gain new traits from their environment, such as drug resistance. To dissect the role of ComFA in the DNA uptake machinery, we introduced point mutations into several motifs in the protein sequence. We demonstrate that several amino acid motifs conserved among ComFA proteins are important for efficient transformation. This report is the first to demonstrate the functional requirement of an amino-terminal cysteine motif in ComFA. Copyright © 2017 American Society for Microbiology.

  8. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.
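The distinction between functional (partial) and structural (induced) counts, and the idea of an invertible conversion between them, can be illustrated on a much smaller case than the paper's. The sketch below is a deliberately simplified undirected analogue with global 3-node counts only; the directed, per-node conversion matrix from the paper is not reproduced here, and the function names are hypothetical. In this reduced setting every triangle contains three 2-edge paths, so functional wedges = induced wedges + 3 × triangles.

```python
from itertools import combinations


def three_node_counts(adj):
    """Return ((functional wedges, triangles), (induced wedges, triangles))
    for an undirected simple graph given as a symmetric 0/1 matrix."""
    n = len(adj)
    induced_wedges = 0
    triangles = 0
    for a, b, c in combinations(range(n), 3):
        edges = adj[a][b] + adj[b][c] + adj[a][c]
        if edges == 2:
            induced_wedges += 1   # exactly two edges: an open triad
        elif edges == 3:
            triangles += 1
    # Functional wedges: any pair of edges sharing a node, closed or not.
    functional_wedges = sum(d * (d - 1) // 2 for d in map(sum, adj))
    return (functional_wedges, triangles), (induced_wedges, triangles)


def functional_to_structural(functional):
    """Invert F = C S for the conversion matrix C = [[1, 3], [0, 1]]."""
    f_wedges, f_triangles = functional
    return (f_wedges - 3 * f_triangles, f_triangles)


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
example = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
functional, structural = three_node_counts(example)
print(functional, structural, functional_to_structural(functional))
```

The conversion matrix here is upper triangular with ones on the diagonal, which is why it is invertible and why each functional count vector corresponds to exactly one structural count vector in this toy setting.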

  9. Motif enrichment tool.

    Science.gov (United States)

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
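The abstract does not say which statistical test MET applies. As a hedged illustration of what "statistically overrepresented" can mean, the sketch below uses a one-sided hypergeometric test, a common choice for gene-set enrichment; the function name and its parameters are hypothetical and are not part of MET.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_motif, set_size, hits_in_set):
    """P(X >= hits_in_set) when set_size genes are drawn without
    replacement from total_genes, of which genes_with_motif carry the motif."""
    upper = min(genes_with_motif, set_size)
    tail = sum(
        comb(genes_with_motif, k)
        * comb(total_genes - genes_with_motif, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(total_genes, set_size)


# Hypothetical numbers: 10 genes in the genome, 5 with the motif,
# a gene set of size 2 in which both genes carry the motif.
print(hypergeom_enrichment_p(10, 5, 2, 2))
```

In practice one such p-value would be computed per motif and then corrected for multiple testing, since many motifs are scanned against the same gene set.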

  10. Mutational analysis of the RecJ exonuclease of Escherichia coli: identification of phosphoesterase motifs.

    Science.gov (United States)

    Sutera, V A; Han, E S; Rajman, L A; Lovett, S T

    1999-10-01

    The recJ gene, identified in Escherichia coli, encodes a Mg(2+)-dependent 5'-to-3' exonuclease with high specificity for single-strand DNA. Genetic and biochemical experiments implicate RecJ exonuclease in homologous recombination, base excision, and methyl-directed mismatch repair. Genes encoding proteins with strong similarities to RecJ have been found in every eubacterial genome sequenced to date, with the exception of Mycoplasma and Mycobacterium tuberculosis. Multiple genes encoding proteins similar to RecJ are found in some eubacteria, including Bacillus and Helicobacter, and in the archaea. Among this divergent set of sequences, seven conserved motifs emerge. We demonstrate here that amino acids within six of these motifs are essential for both the biochemical and genetic functions of E. coli RecJ. These motifs may define interactions with Mg(2+) ions or substrate DNA. A large family of proteins more distantly related to RecJ is present in archaea, eubacteria, and eukaryotes, including a hypothetical protein in the MgPa adhesin operon of Mycoplasma, a domain of putative polyA polymerases in Synechocystis and Aquifex, PRUNE of Drosophila, and an exopolyphosphatase (PPX1) of Saccharomyces cerevisiae. Because these six RecJ motifs are shared between exonucleases and exopolyphosphatases, they may constitute an ancient phosphoesterase domain now found in all kingdoms of life.

  11. The MARVEL transmembrane motif of occludin mediates oligomerization and targeting to the basolateral surface in epithelia.

    Science.gov (United States)

    Yaffe, Yakey; Shepshelovitch, Jeanne; Nevo-Yassaf, Inbar; Yeheskel, Adva; Shmerling, Hedva; Kwiatek, Joanna M; Gaus, Katharina; Pasmanik-Chor, Metsada; Hirschberg, Koret

    2012-08-01

    Occludin (Ocln), a MARVEL-motif-containing protein, is found in all tight junctions. MARVEL motifs are comprised of four transmembrane helices associated with the localization to or formation of diverse membrane subdomains by interacting with the proximal lipid environment. The functions of the Ocln MARVEL motif are unknown. Bioinformatics sequence- and structure-based analyses demonstrated that the MARVEL domain of Ocln family proteins has distinct evolutionarily conserved sequence features that are consistent with its basolateral membrane localization. Live-cell microscopy, fluorescence resonance energy transfer (FRET) and bimolecular fluorescence complementation (BiFC) were used to analyze the intracellular distribution and self-association of fluorescent-protein-tagged full-length human Ocln or the Ocln MARVEL motif excluding the cytosolic C- and N-termini (amino acids 60-269, FP-MARVEL-Ocln). FP-MARVEL-Ocln efficiently arrived at the plasma membrane (PM) and was sorted to the basolateral PM in filter-grown polarized MDCK cells. A series of conserved aromatic amino acids within the MARVEL domain were found to be associated with Ocln dimerization using BiFC. FP-MARVEL-Ocln inhibited membrane pore growth during Triton-X-100-induced solubilization and was shown to increase the membrane-ordered state using Laurdan, a lipid dye. These data demonstrate that the Ocln MARVEL domain mediates self-association and correct sorting to the basolateral membrane.

  12. Glycomic Analysis of Life Stages of the Human Parasite Schistosoma mansoni Reveals Developmental Expression Profiles of Functional and Antigenic Glycan Motifs.

    Science.gov (United States)

    Smit, Cornelis H; van Diepen, Angela; Nguyen, D Linh; Wuhrer, Manfred; Hoffmann, Karl F; Deelder, André M; Hokke, Cornelis H

    2015-07-01

    Glycans present on glycoproteins and glycolipids of the major human parasite Schistosoma mansoni induce innate as well as adaptive immune responses in the host. To be able to study the molecular characteristics of schistosome infections it is therefore required to determine the expression profiles of glycans and antigenic glycan-motifs during a range of critical stages of the complex schistosome lifecycle. We performed a longitudinal profiling study covering schistosome glycosylation throughout worm- and egg-development using a mass spectrometry-based glycomics approach. Our study revealed that during worm development N-glycans with Galβ1-4(Fucα1-3)GlcNAc (LeX) and core-xylose motifs were rapidly lost after cercariae to schistosomula transformation, whereas GalNAcβ1-4GlcNAc (LDN)-motifs gradually became abundant and predominated in adult worms. LeX-motifs were present on glycolipids up to 2 weeks of schistosomula development, whereas glycolipids with mono- and multifucosylated LDN-motifs remained present up to the adult worm stage. In contrast, expression of complex O-glycans diminished to undetectable levels within days after transformation. During egg development, a rich diversity of N-glycans with fucosylated motifs was expressed, but with α3-core fucose and a high degree of multifucosylated antennae only in mature eggs and miracidia. N-glycan antennae were exclusively LDN-based in miracidia. O-glycans in the mature eggs were also diverse and contained LeX- and multifucosylated LDN, but none of these were associated with miracidia in which we detected only the Galβ1-3(Galβ1-6)GalNAc core glycan. Immature eggs also exhibited short O-glycan core structures only, suggesting that complex fucosylated O-glycans of schistosome eggs are derived primarily from glycoproteins produced by the subshell envelope in the developed egg. Lipid glycans with multifucosylated GlcNAc repeats were present throughout egg development, but with the longer highly fucosylated

  13. Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

    KAUST Repository

    Sayadi, Ahmed

    2011-07-20

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length of the motifs and their variable degree of conservation makes their identification hard since it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server and we believe it constitutes by itself a useful resource for the life sciences (http://www.biocomputing.it/modipath).

  14. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

    Science.gov (United States)

    Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

    2012-09-01

    Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.

  15. Some double sequence spaces of interval numbers defined by Orlicz function

    Directory of Open Access Journals (Sweden)

    Ayhan Esi

    2014-10-01

    In this paper we introduce some interval valued double sequence spaces defined by Orlicz function and study different properties of these spaces like inclusion relations, solidity, etc. We establish some inclusion relations among them. Also we introduce the concept of double statistical convergence for interval number sequences and give an inclusion relation between interval valued double sequence spaces.

  16. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase.

    Directory of Open Access Journals (Sweden)

    Hao Ding

    Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of nine amino acids (GFXXPXPIQ) with an invariant glutamine (Q) residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11) gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase.
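
    The nine-residue Q-motif pattern quoted in this abstract (GFXXPXPIQ, with X standing for any residue) maps directly onto a regular expression. The following is a minimal sketch, not the authors' method: the helper name `find_q_motif` and the toy sequence are hypothetical, and a real scan would likely need to tolerate more variation than this literal pattern allows.

```python
import re

# Pattern from the abstract: G-F-X-X-P-X-P-I-Q, where X is any amino acid.
Q_MOTIF = re.compile(r"GF..P.PIQ")

def find_q_motif(protein_seq):
    """Return (1-based start, matched text) for each Q-motif hit."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group()) for m in Q_MOTIF.finditer(seq)]

# Toy sequence with one planted match (hypothetical, not a ChlR1 fragment).
toy = "MSAEGFAKPLPIQKE"
hits = find_q_motif(toy)

# Replacing the invariant glutamine with alanine, as in a Q-to-A mutant,
# removes the match.
mutant_hits = find_q_motif("MSAEGFAKPLPIAKE")
```

    The invariant glutamine is the last residue of each match, so its position is the reported start plus 8.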

  17. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
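
    The "94% parallel efficiency on 65,536 cores relative to 256 cores" figure can be read with the usual definition of relative efficiency. The sketch below assumes that definition (baseline core-time divided by core-time on the larger run); the function names are hypothetical and no timings from the paper are used.

```python
def relative_efficiency(t_ref, p_ref, t_p, p):
    """Parallel efficiency on p cores relative to a baseline run on p_ref cores.

    t_ref and t_p are wall-clock times of the baseline and the larger run.
    """
    return (t_ref * p_ref) / (t_p * p)

def implied_speedup(efficiency, p_ref, p):
    """Speedup over the baseline run implied by a given relative efficiency."""
    return efficiency * p / p_ref

# With the figures quoted in the abstract: 256x more cores at 94% efficiency
# corresponds to roughly a 240x speedup over the 256-core run.
speedup_vs_256 = implied_speedup(0.94, 256, 65536)
```

    Note that this is the speedup relative to the 256-core run only; the larger figure quoted against the original serial code also includes the serial optimizations.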

  18. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters

  19. Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

    Science.gov (United States)

    Newell, Nicholas E

    2011-12-15

    The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.

  20. Novel anti-HIV peptides containing multiple copies of artificially designed heptad repeat motifs

    International Nuclear Information System (INIS)

    Shi Weiguo; Qi Zhi; Pan Chungen; Xue Na; Debnath, Asim K.; Qie Jiankun; Jiang Shibo; Liu Keliang

    2008-01-01

    The peptidic anti-HIV drug T20 (Fuzeon) and its analog C34 share a common heptad repeat (HR) sequence, but they have different functional domains, i.e., pocket- and lipid-binding domains (PBD and LBD, respectively). We hypothesize that novel anti-HIV peptides may be designed by using artificial sequences containing multiple copies of HR motifs plus zero, one or two functional domains. Surprisingly, we found that the peptides containing only the non-natural HR sequences could significantly inhibit HIV-1 infection, while addition of PBD and/or LBD to the peptides resulted in significant improvement of anti-HIV-1 activity. These results suggest that these artificial HR sequences, which may serve as structural domains, could be used as templates for the design of novel antiviral peptides against HIV and other viruses with class I fusion proteins

  1. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Computational approaches to the disulphide bonding state and its connectivity pattern prediction are based on various descriptors. One descriptor is the amino acid sequence motifs flanking the cysteine residue motifs. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.

  2. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    Science.gov (United States)

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    Science.gov (United States)

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.

  4. The heptanucleotide motif GAGACGC is a key component of a cis-acting promoter element that is critical for SnSAG1 expression in Sarcocystis neurona.

    Science.gov (United States)

    Gaji, Rajshekhar Y; Howe, Daniel K

    2009-07-01

    The apicomplexan parasite Sarcocystis neurona undergoes a complex process of intracellular development, during which many genes are temporally regulated. The described study was undertaken to begin identifying the basic promoter elements that control gene expression in S. neurona. Sequence analysis of the 5'-flanking region of five S. neurona genes revealed a conserved heptanucleotide motif GAGACGC that is similar to the WGAGACG motif described upstream of multiple genes in Toxoplasma gondii. The promoter region for the major surface antigen gene SnSAG1, which contains three heptanucleotide motifs within 135 bases of the transcription start site, was dissected by functional analysis using a dual luciferase reporter assay. These analyses revealed that a minimal promoter fragment containing all three motifs was sufficient to drive reporter molecule expression, with the presence and orientation of the 5'-most heptanucleotide motif being absolutely critical for promoter function. Further studies should help to identify additional sequence elements important for promoter function and for controlling gene expression during intracellular development by this apicomplexan pathogen.
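
    Because the abstract stresses both the presence and the orientation of the heptanucleotide motif, a scan of a 5'-flanking region needs to report the strand of each hit. The sketch below is an illustration only: the toy sequence is hypothetical, and only the IUPAC code needed for the WGAGACG motif (W = A or T) is included.

```python
import re

# IUPAC codes needed here: W = A or T. Other ambiguity codes are omitted.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def scan(motif, seq):
    """Return (0-based start on the forward strand, strand) for each hit."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the reverse-strand position back to forward coordinates.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return hits

# Hypothetical 20-nt region: one forward copy of GAGACGC, one inverted copy.
toy = "TTGAGACGCAAGCGTCTCTT"
s_neurona_hits = scan("GAGACGC", toy)
t_gondii_like_hits = scan("WGAGACG", toy)
```

    Reporting the strand alongside the position is what makes an orientation-dependent element distinguishable from an inverted copy of the same sequence.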

  5. Chaos game representation of functional protein sequences, and simulation and multifractal analysis of induced measures

    International Nuclear Information System (INIS)

    Zu-Guo, Yu; Qian-Jun, Xiao; Long, Shi; Jun-Wu, Yu; Anh, Vo

    2010-01-01

    Investigating the biological function of proteins is a key aspect of protein studies. Bioinformatic methods become important for studying the biological function of proteins. In this paper, we first give the chaos game representation (CGR) of randomly-linked functional protein sequences, then propose the use of the recurrent iterated function systems (RIFS) in fractal theory to simulate the measure based on their chaos game representations. This method helps to extract some features of functional protein sequences, and furthermore the biological functions of these proteins. Then multifractal analysis of the measures based on the CGRs of randomly-linked functional protein sequences are performed. We find that the CGRs have clear fractal patterns. The numerical results show that the RIFS can simulate the measure based on the CGR very well. The relative standard error and the estimated probability matrix in the RIFS do not depend on the order to link the functional protein sequences. The estimated probability matrices in the RIFS with different biological functions are evidently different. Hence the estimated probability matrices in the RIFS can be used to characterise the difference among linked functional protein sequences with different biological functions. From the values of the D_q curves, one sees that these functional protein sequences are not completely random. The D_q of all linked functional proteins studied are multifractal-like and sufficiently smooth for the C_q (analogous to specific heat) curves to be meaningful. Furthermore, the D_q curves of the measure μ based on their CGRs for different orders to link the functional protein sequences are almost identical if q ≥ 0. Finally, the C_q curves of all linked functional proteins resemble a classical phase transition at a critical point. (cross-disciplinary physics and related areas of science and technology)
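
    For reference, the generalized dimensions and the "analogous specific heat" named in this abstract are usually defined as below in multifractal analysis. This is the standard formulation and is given here as an assumption about the notation, since the abstract does not spell it out. The sums run over boxes B of side ε that carry non-zero measure μ(B).

```latex
Z_\varepsilon(q) = \sum_{\mu(B) \neq 0} \mu(B)^{q}, \qquad
\tau(q) = \lim_{\varepsilon \to 0} \frac{\ln Z_\varepsilon(q)}{\ln \varepsilon},

D_q = \frac{\tau(q)}{q - 1} \quad (q \neq 1), \qquad
D_1 = \lim_{\varepsilon \to 0}
      \frac{\sum_{\mu(B) \neq 0} \mu(B) \ln \mu(B)}{\ln \varepsilon},

C_q = -\frac{d^{2} \tau(q)}{dq^{2}}.
```

    Under these definitions a constant D_q indicates a monofractal measure, while a D_q that varies with q is the "multifractal-like" behaviour the abstract describes.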

  6. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    International Nuclear Information System (INIS)

    Westberg, Johan A.; Jiang, Ji; Andersson, Leif C.

    2011-01-01

    Highlights: → Stanniocalcin 1 (STC1) binds heme through novel heme binding motif. → Central iron atom of heme and cysteine-114 of STC1 are essential for binding. → STC1 binds Fe2+ and Fe3+ heme. → STC1 peptide prevents oxidative decay of heme. -- Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys114 as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). A STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H2O2-induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  7. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    Energy Technology Data Exchange (ETDEWEB)

    Westberg, Johan A., E-mail: johan.westberg@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland); Jiang, Ji, E-mail: ji.jiang@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland); Andersson, Leif C., E-mail: leif.andersson@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland)

    2011-06-03

    Highlights: → Stanniocalcin 1 (STC1) binds heme through novel heme binding motif. → Central iron atom of heme and cysteine-114 of STC1 are essential for binding. → STC1 binds Fe2+ and Fe3+ heme. → STC1 peptide prevents oxidative decay of heme. -- Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys114 as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). A STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H2O2-induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  8. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  9. POWRS: position-sensitive motif discovery.

    Directory of Open Access Journals (Sweden)

    Ian W Davis

    Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm "POWRS" (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.

  10. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
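
    The definition of motif pairing multiplicity given in this abstract (the number of motifs paired with a given motif on chromatin interactions) amounts to counting distinct partners per motif. The sketch below illustrates that reading; the function name, the input format (a list of motif-name pairs) and the motif labels are assumptions, not the authors' data model.

```python
from collections import defaultdict

def pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with."""
    partners = defaultdict(set)
    for a, b in pairs:
        # Pairing is treated as symmetric; repeated pairs are counted once.
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(p) for motif, p in partners.items()}

# Hypothetical coupling pairs; ("M2", "M1") repeats an earlier pair.
toy_pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M1")]
multiplicity = pairing_multiplicity(toy_pairs)
```

    Using sets means a motif that pairs with the same partner across many interactions still contributes a multiplicity of one for that partner.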

  11. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  12. Essential role of the NH2-terminal WD/EPF motif in the phosphorylation-activated protective function of mammalian Hsp27.

    Science.gov (United States)

    Thériault, Jimmy R; Lambert, Herman; Chávez-Zobel, Aura T; Charest, Gabriel; Lavigne, Pierre; Landry, Jacques

    2004-05-28

    Hsp27 is expressed at high levels after mild heat shock and contributes to making cells extremely resistant to subsequent treatments. The activity of the protein is regulated at the transcriptional level, but also by phosphorylation, which occurs rapidly during stress and is responsible for causing the dissociation of large 700-kDa Hsp27 oligomers into dimers. We investigated the mechanism by which phosphorylation and oligomerization modulate the protective activity of Chinese hamster Hsp27. In contrast to oligomer dissociation, which only required Ser90 phosphorylation, activation of Hsp27 thermoprotective activity required the phosphorylation of both Ser90 and Ser15. Replacement of Ser90 by Ala90, which prevented the dissociation of the oligomer upon stress, did cause a severe defect in the protective activity. Dissociation was, however, not a sufficient condition to activate the protein because replacement of Ser15 by Ala15, which caused little effect in the oligomeric organization of the protein, also yielded an inactive protein. Analyses of mutants with short deletions in the NH2 terminus identified the Hsp27 WD/EPF or PF-rich domain as essential for protection, maintenance of the oligomeric structure, and in vitro chaperone activity of the protein. In light of a three-dimensional model of Hsp27 based on the crystallographic structure of wheat Hsp16.9, we propose that the conserved WD/EPF motif of mammalian Hsp27 mediates important intramolecular interactions with hydrophobic surfaces of the alpha-crystallin domain of the protein. These interactions are destabilized by Ser90 phosphorylation, making the motif free to interact with heterologous molecular targets upon the additional phosphorylation of the nearby Ser15.

  13. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
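
    To make the planted motif problem concrete, the sketch below implements the naive exhaustive baseline, which is explicitly not the algorithm of this paper: it tests every string of length l over the alphabet and keeps those occurring in every input with at most d mismatches. All names are hypothetical. The candidate count grows as the alphabet size to the power l, which is why large alphabets are the difficult case the abstract refers to.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(candidate, seq, d):
    """True if some window of seq is within Hamming distance d of candidate."""
    l = len(candidate)
    return any(hamming(candidate, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Naive search over all len(alphabet)**l candidate motifs."""
    found = []
    for letters in product(alphabet, repeat=l):
        cand = "".join(letters)
        if all(occurs_within(cand, s, d) for s in seqs):
            found.append(cand)
    return found

# Toy input: ACGT is the only 4-mer shared exactly by all three sequences.
toy_seqs = ["TTACGTAA", "GGACGTCC", "ACGTTTTT"]
exact = planted_motifs(toy_seqs, l=4, d=0)
```

    Even at l = 4 this enumerates 256 DNA candidates; the same length over the 20-letter protein alphabet would require 160,000.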

  14. Advancing Functional Metagenomics using Synthetic Biology from Soil to Sequence

    DEFF Research Database (Denmark)

    van der Helm, Eric

    as ‘functional metagenomics’, the DNA of these bacteria can be recovered from the environment and used by host-bacteria which can be grown in a lab. This allows us to make use of the capabilities of the billions of bacteria that are present in the environment without actually growing them but by making use...

  15. Scoring protein relationships in functional interaction networks predicted from sequence data.

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. Availability: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.

  16. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  17. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation.

    Science.gov (United States)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-30

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy, combining sequential and modular concepts, enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  18. Applications of high-throughput sequencing to chromatin structure and function in mammals

    OpenAIRE

    Dunham, Ian

    2009-01-01

    High-throughput DNA sequencing approaches have enabled direct interrogation of chromatin samples from mammalian cells. We are beginning to develop a genome-wide description of nuclear function during development, but further data collection, refinement, and integration are needed.

  19. On paranormed Zweier ideal convergent sequence spaces defined by Orlicz function

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2014-10-01

    Full Text Available In this article we introduce paranorm ideal convergent sequence spaces using Zweier transform and Orlicz function. We study some topological and algebraic properties. Further we prove some inclusion relations related to these new spaces.

  20. An essential GT motif in the lamin A promoter mediates activation by CREB-binding protein

    International Nuclear Information System (INIS)

    Janaki Ramaiah, M.; Parnaik, Veena K.

    2006-01-01

    Lamin A is an important component of nuclear architecture in mammalian cells. Mutations in the human lamin A gene lead to highly degenerative disorders that affect specific tissues. In studies directed towards understanding the mode of regulation of the lamin A promoter, we have identified an essential GT motif at position -55 by reporter gene assays and mutational analysis. Binding of this sequence to Sp transcription factors has been observed in electrophoretic mobility shift assays and by chromatin immunoprecipitation (ChIP) studies. Further functional analysis by co-expression of recombinant proteins and ChIP assays has shown an important regulatory role for CREB-binding protein in promoter activation, which is mediated by the GT motif.

  1. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways

    OpenAIRE

    Mi, Huaiyu; Guo, Nan; Kejariwal, Anish; Thomas, Paul D.

    2006-01-01

    PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles. Since 2005, there have been three main improvements to PANTHER. First, the sequences used to create evolutionary trees are carefully selected to provide coverage of phylogenetic as well as functional information. Second, PANTHER is now a member of the InterPro Consortium, and the PANTHER hidden Markov models (HMMs) are distri...

  2. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Science.gov (United States)

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales, which comprises a monophyletic group named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for the promoters and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could have evolved gradually by nucleotide gain and loss and by point mutations. PMID:28117683
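
    As a rough illustration of how a candidate ancestral motif such as the ‘TATATAAAATTGA’ string quoted above might be located in upstream regions, the following Python sketch scans a sequence for windows within a small number of substitutions of the motif. The function name `find_approx`, the mismatch threshold and the toy upstream sequence are all hypothetical; the sketch models point mutations only, not the nucleotide gain and loss also mentioned in the abstract, and it is not the procedure used in the review.

```python
def find_approx(seq, motif, max_mismatches=2):
    """Return (position, mismatches) for each window of seq that differs
    from motif by at most max_mismatches substitutions (no indels)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], motif))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


MEGA_BOX = "TATATAAAATTGA"  # motif string quoted in the abstract

# Toy upstream region (invented): one exact copy and one copy with a
# single substitution.
upstream = "GGC" + "TATATAAAATTGA" + "CCGG" + "TATTTAAAATTGA" + "AC"
hits = find_approx(upstream, MEGA_BOX, max_mismatches=1)
print(hits)
```

    Allowing insertions and deletions would require an edit-distance scan instead of the simple Hamming comparison used here.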

  3. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Directory of Open Access Journals (Sweden)

    Graziele Pereira Oliveira

    2017-01-01

    Full Text Available For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales, which comprises a monophyletic group named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for the promoters and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could have evolved gradually by nucleotide gain and loss and by point mutations.

  4. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues, supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help to improve the prediction of SIMs and their binding properties in different organisms and to facilitate the reconstruction of the SUMO interactome.

  5. Canonical Bcl-2 motifs of the Na+/K+ pump revealed by the BH3 mimetic chelerythrine: early signal transducers of apoptosis?

    Science.gov (United States)

    Lauf, Peter K; Heiny, Judith; Meller, Jarek; Lepera, Michael A; Koikov, Leonid; Alter, Gerald M; Brown, Thomas L; Adragna, Norma C

    2013-01-01

    Chelerythrine [CET], a protein kinase C [PKC] inhibitor, is a pro-apoptotic BH3-mimetic binding to BH1-like motifs of Bcl-2 proteins. CET action was examined on PKC phosphorylation-dependent membrane transporters (Na+/K+ pump/ATPase [NKP, NKA], Na+-K+-2Cl- [NKCC] and K+-Cl- [KCC] cotransporters, and channel-supported K+ loss) in human lens epithelial cells [LECs]. K+ loss and K+ uptake, using Rb+ as congener, were measured by atomic absorption/emission spectrophotometry with NKP and NKCC inhibitors, and Cl- replacement by NO3- to determine KCC. 3H-Ouabain binding was performed on a pig renal NKA in the presence and absence of CET. Bcl-2 protein and NKA sequences were aligned and motifs identified and mapped using PROSITE in conjunction with BLAST alignments and analysis of conservation and structural similarity based on prediction of secondary and crystal structures. CET inhibited NKP and NKCC by >90% (IC50 values ~35 and ~15 μM, respectively) without significant KCC activity change, and stimulated K+ loss by ~35% at 10-30 μM. Neither ATP levels nor phosphorylation of the NKA α1 subunit changed. 3H-ouabain was displaced from pig renal NKA only at 100-fold higher CET concentrations than the ligand. Sequence alignments of NKA with BH1- and BH3-like motifs containing pro-survival Bcl-2 and BclXl proteins showed more than one BH1-like motif within NKA for interaction with CET or with BH3 motifs. One NKA BH1-like motif (ARAAEILARDGPN) was also found in all P-type ATPases. Also, NKA possessed a second motif similar to that near the BH3 region of Bcl-2. Findings support the hypothesis that CET inhibits NKP by binding to BH1-like motifs and disrupting the α1 subunit catalytic activity through conformational changes. By interacting with Bcl-2 proteins through their complementary BH1- or BH3-like-motifs, NKP proteins may be sensors of normal and pathological cell functions, becoming important yet unrecognized signal transducers in the initial phases of apoptosis. CET

  6. Canonical Bcl-2 Motifs of the Na+/K+ Pump Revealed by the BH3 Mimetic Chelerythrine: Early Signal Transducers of Apoptosis?

    Directory of Open Access Journals (Sweden)

    Peter K. Lauf

    2013-02-01

    Full Text Available Background/Aims: Chelerythrine [CET], a protein kinase C [PKC] inhibitor, is a pro-apoptotic BH3-mimetic binding to BH1-like motifs of Bcl-2 proteins. CET action was examined on PKC phosphorylation-dependent membrane transporters (Na+/K+ pump/ATPase [NKP, NKA], Na+-K+-2Cl- [NKCC] and K+-Cl- [KCC] cotransporters, and channel-supported K+ loss) in human lens epithelial cells [LECs]. Methods: K+ loss and K+ uptake, using Rb+ as congener, were measured by atomic absorption/emission spectrophotometry with NKP and NKCC inhibitors, and Cl- replacement by NO3- to determine KCC. 3H-Ouabain binding was performed on a pig renal NKA in the presence and absence of CET. Bcl-2 protein and NKA sequences were aligned and motifs identified and mapped using PROSITE in conjunction with BLAST alignments and analysis of conservation and structural similarity based on prediction of secondary and crystal structures. Results: CET inhibited NKP and NKCC by >90% (IC50 values ∼35 and ∼15 µM, respectively) without significant KCC activity change, and stimulated K+ loss by ∼35% at 10-30 µM. Neither ATP levels nor phosphorylation of the NKA α1 subunit changed. 3H-ouabain was displaced from pig renal NKA only at 100-fold higher CET concentrations than the ligand. Sequence alignments of NKA with BH1- and BH3-like motifs containing pro-survival Bcl-2 and BclXl proteins showed more than one BH1-like motif within NKA for interaction with CET or with BH3 motifs. One NKA BH1-like motif (ARAAEILARDGPN) was also found in all P-type ATPases. Also, NKA possessed a second motif similar to that near the BH3 region of Bcl-2. Conclusion: Findings support the hypothesis that CET inhibits NKP by binding to BH1-like motifs and disrupting the α1 subunit catalytic activity through conformational changes. By interacting with Bcl-2 proteins through their complementary BH1- or BH3-like-motifs, NKP proteins may be sensors of normal and pathological cell functions, becoming important yet

  7. Rtt107/Esc4 binds silent chromatin and DNA repair proteins using different BRCT motifs

    Directory of Open Access Journals (Sweden)

    Jockusch Rebecca A

    2006-11-01

    Full Text Available Abstract Background By screening a plasmid library for proteins that could cause silencing when targeted to the HMR locus in Saccharomyces cerevisiae, we previously reported the identification of Rtt107/Esc4 based on its ability to establish silent chromatin. In this study we aimed to determine the mechanism of Rtt107/Esc4 targeted silencing and also learn more about its biological functions. Results Targeted silencing by Rtt107/Esc4 was dependent on the SIR genes, which encode obligatory structural and enzymatic components of yeast silent chromatin. Based on its sequence, Rtt107/Esc4 was predicted to contain six BRCT motifs. This motif, originally identified in the human breast tumor suppressor gene BRCA1, is a protein interaction domain. The targeted silencing activity of Rtt107/Esc4 resided within the C-terminal two BRCT motifs, and this region of the protein bound to Sir3 in two-hybrid tests. Deletion of RTT107/ESC4 caused sensitivity to the DNA damaging agent MMS as well as to hydroxyurea. A two-hybrid screen showed that the N-terminal BRCT motifs of Rtt107/Esc4 bound to Slx4, a protein previously shown to be involved in DNA repair and required for viability in a strain lacking the DNA helicase Sgs1. Like SLX genes, RTT107/ESC4 interacted genetically with SGS1; esc4Δ sgs1Δ mutants were viable, but exhibited a slow-growth phenotype and also a synergistic DNA repair defect. Conclusion Rtt107/Esc4 binds to the silencing protein Sir3 and the DNA repair protein Slx4 via different BRCT motifs, thus providing a bridge linking silent chromatin to DNA repair enzymes.

  8. Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    LENUS (Irish Health Repository)

    Casey, Fergal

    2011-08-22

    Abstract Background Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
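
    The triplet motif classes named in this abstract (NNN, NNP, NPP, PPP) can be illustrated with a short Python sketch that enumerates fully connected triplets in a signed interaction network and tallies them by sign pattern. The edge representation, the function name `triplet_motif_counts` and the toy network are assumptions made for illustration; the abstract does not describe the authors' implementation or their statistical test for non-random distribution.

```python
from collections import Counter
from itertools import combinations


def triplet_motif_counts(edges):
    """Count fully connected triplets by sign class.

    edges maps frozenset({gene_a, gene_b}) to 'N' (negative epistasis)
    or 'P' (positive epistasis). Classes are reported as the sorted
    sign string: 'NNN', 'NNP', 'NPP' or 'PPP'.
    """
    nodes = sorted({node for pair in edges for node in pair})
    counts = Counter()
    # Brute-force enumeration of node triplets; fine for a toy example,
    # too slow for genome-scale networks.
    for a, b, c in combinations(nodes, 3):
        pairs = (frozenset((a, b)), frozenset((a, c)), frozenset((b, c)))
        if all(p in edges for p in pairs):
            counts["".join(sorted(edges[p] for p in pairs))] += 1
    return counts


# Invented toy network: triangle A-B-C has two negative edges and one
# positive edge; triangle B-C-D is all positive.
toy_edges = {
    frozenset(("A", "B")): "N",
    frozenset(("A", "C")): "N",
    frozenset(("B", "C")): "P",
    frozenset(("B", "D")): "P",
    frozenset(("C", "D")): "P",
}
counts = triplet_motif_counts(toy_edges)
print(dict(counts))
```

    Judging whether a class is over-represented would additionally need a null model, for example repeated shuffling of edge signs, which is left out here.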

  9. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.

  10. Evolutionary rates at codon sites may be used to align sequences and infer protein domain function

    Directory of Open Access Journals (Sweden)

    Hazelhurst Scott

    2010-03-01

    Full Text Available Abstract Background Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution. Results A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the
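
    To make the idea of aligning by evolutionary rate rather than by residue concrete, the Python sketch below runs an ordinary global dynamic-programming alignment over two lists of per-codon rate values, charging the absolute rate difference for an aligned pair and a fixed penalty for a gap. This is not the FIRE algorithm, whose scoring scheme is not given in the abstract; the function name `align_rates`, the cost model, the gap penalty and the toy dN/dS profiles are all assumptions.

```python
def align_rates(a, b, gap=1.0):
    """Globally align two numeric rate profiles.

    Cost of pairing a[i] with b[j] is |a[i] - b[j]|; each gap costs `gap`.
    Returns (total cost, list of aligned index pairs (i, j)).
    """
    m, n = len(a), len(b)
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i * gap
    for j in range(1, n + 1):
        cost[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # pair sites
                cost[i - 1][j] + gap,                           # gap in b
                cost[i][j - 1] + gap,                           # gap in a
            )
    # Trace back one optimal path, preferring paired sites on ties.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return cost[m][n], pairs[::-1]


# Invented per-codon dN/dS profiles: the second lacks one conserved site.
profile_x = [0.1, 0.1, 2.0, 0.2]
profile_y = [0.1, 2.0, 0.2]
total_cost, pairs = align_rates(profile_x, profile_y)
print(total_cost, pairs)
```

    In this toy case the rapidly evolving site (rate 2.0) is paired across the two profiles even though no residue identities are consulted, which is the intuition behind a rate-based alignment.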

  11. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    Science.gov (United States)

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  12. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-05-01

    The purpose of this dissertation is to present a methodology for modeling the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is of great importance in computational biology. It is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The most famous are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, the problem is divided into a set of overlapping subproblems and the solution of each subproblem is found. Finally, the solutions to these subproblems are combined into a final solution. In this thesis it is proved that, for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm has been simulated using the C#.Net programming language and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, at a rate that also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
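
    The effect described above, that the set of optimal alignments shrinks as further cost functions are applied in sequence, can be reproduced in a minimal Python sketch. It assumes additive cost functions and replaces the thesis's directed-acyclic-graph construction with a plain dynamic program that minimises a tuple of costs lexicographically while counting the alignments that attain the optimum; the names `align_counts`, `edit_cost` and `gap_cost` are hypothetical. The work is O(mn) table cells, each processed once per cost function.

```python
def align_counts(x, y, cost_fns):
    """Minimise several additive costs in sequence over global alignments.

    Each cost function takes (a, b), where None stands for a gap.
    Returns (optimal cost tuple, number of alignments attaining it).
    """
    m, n = len(x), len(y)

    def step(a, b):
        return tuple(f(a, b) for f in cost_fns)

    def add(u, v):
        return tuple(p + q for p, q in zip(u, v))

    best = [[None] * (n + 1) for _ in range(m + 1)]
    ways = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0], ways[0][0] = tuple(0 for _ in cost_fns), 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:   # align x[i-1] with y[j-1]
                cands.append((add(best[i - 1][j - 1], step(x[i - 1], y[j - 1])),
                              ways[i - 1][j - 1]))
            if i > 0:             # x[i-1] against a gap
                cands.append((add(best[i - 1][j], step(x[i - 1], None)),
                              ways[i - 1][j]))
            if j > 0:             # gap against y[j-1]
                cands.append((add(best[i][j - 1], step(None, y[j - 1])),
                              ways[i][j - 1]))
            # Tuples compare lexicographically, so earlier costs dominate.
            best[i][j] = min(c for c, _ in cands)
            ways[i][j] = sum(w for c, w in cands if c == best[i][j])
    return best[m][n], ways[m][n]


def edit_cost(a, b):
    """Unit cost for a mismatch or a gap, zero for a match."""
    return 0 if a == b else 1


def gap_cost(a, b):
    """Counts gaps only."""
    return 1 if a is None or b is None else 0


one_stage = align_counts("AB", "BA", [edit_cost])
two_stage = align_counts("AB", "BA", [edit_cost, gap_cost])
print(one_stage, two_stage)
```

    For this toy pair, three alignments share the minimal edit cost of 2, and only the gap-free one survives the second stage.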

  13. Utility of sequenced genomes for microsatellite marker development in non-model organisms: a case study of functionally important genes in nine-spined sticklebacks (Pungitius pungitius)

    Directory of Open Access Journals (Sweden)

    Shimada Yukinori

    2010-05-01

    Full Text Available Abstract Background Identification of genes involved in adaptation and speciation by targeting specific genes of interest has become a plausible strategy also for non-model organisms. We investigated the potential utility of available sequenced fish genomes to develop microsatellite (cf. simple sequence repeat, SSR) markers for functionally important genes in nine-spined sticklebacks (Pungitius pungitius), as well as cross-species transferability of SSR primers from three-spined (Gasterosteus aculeatus) to nine-spined sticklebacks. In addition, we examined the patterns and degree of SSR conservation between these species using their aligned sequences. Results Cross-species amplification success was lower for SSR markers located in or around functionally important genes (27 out of 158) than for those randomly derived from genomic (35 out of 101) and cDNA (35 out of 87) libraries. Polymorphism was observed at a large proportion (65%) of the cross-amplified loci independently of SSR type. To develop SSR markers for functionally important genes in nine-spined sticklebacks, SSR locations were surveyed in or around 67 target genes based on the three-spined stickleback genome and these regions were sequenced with primers designed from conserved sequences in sequenced fish genomes. Out of the 81 SSRs identified in the sequenced regions (44,084 bp), 57 exhibited the same motifs at the same locations as in the three-spined stickleback. Di- and trinucleotide SSRs appeared to be highly conserved whereas mononucleotide SSRs were less so. Species-specific primers were designed to amplify 58 SSRs using the sequences of nine-spined sticklebacks. Conclusions Our results demonstrated that a large proportion of SSRs are conserved in species that diverged more than 10 million years ago. Therefore, the three-spined stickleback genome can be used to predict SSR locations in the nine-spined stickleback genome. While cross-species utility of SSR primers is limited due
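
    Surveying SSR locations in a sequenced region, as described in this abstract, is commonly done with a pattern scan for tandem repeats. The Python sketch below uses regular-expression backreferences to report perfect mono-, di- and trinucleotide repeats. The function name `find_ssrs`, the minimum repeat counts and the test sequence are hypothetical choices for illustration and are not taken from the study.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) for perfect tandem repeats."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        # ([ACGT]{k}) captures one unit; \1{r-1,} demands r-1 or more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymers reported as di- or trinucleotide repeats.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Invented sequence with a (CA)7, an (A)11 and a (GAT)5 repeat.
region = "GGT" + "CA" * 7 + "TTG" + "A" * 11 + "C" + "GAT" * 5 + "CC"
ssrs = find_ssrs(region)
print(ssrs)
```

    Comparing the (motif, position) pairs reported for aligned regions of two species would then give a rough conservation count of the kind summarised above.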

  14. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Full Text Available Abstract Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  15. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
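
    Over- or under-representation of a motif is often summarised as the ratio of observed to expected occurrences. The Python sketch below computes that ratio for the well-known -10 consensus hexamer TATAAT, with the expectation taken from single-base frequencies of the same sequence. The abstract does not state which motif definitions or null model the authors used, so the function name `obs_exp_ratio`, the exact-match criterion and the composition-only null model are assumptions.

```python
from collections import Counter
from math import prod


def obs_exp_ratio(seq, motif="TATAAT"):
    """Observed/expected count of exact motif matches in seq.

    The expectation assumes independent positions with the base
    frequencies of seq itself (a deliberately simple null model).
    """
    seq, motif = seq.upper(), motif.upper()
    n, k = len(seq), len(motif)
    windows = n - k + 1
    if windows <= 0:
        return float("nan")
    freqs = Counter(seq)
    observed = sum(seq[i:i + k] == motif for i in range(windows))
    expected = windows * prod(freqs[base] / n for base in motif)
    return observed / expected if expected else float("nan")


ratio_hit = obs_exp_ratio("TATAAT")     # single window, exact match
ratio_none = obs_exp_ratio("AAAATTTT")  # same bases, motif absent
print(ratio_hit, ratio_none)
```

    A ratio below 1 in a class of regions would suggest depletion of the motif there, although a realistic analysis would need a higher-order background model and a significance test.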

  16. BEAM web server: a tool for structural RNA motif discovery.

    Science.gov (United States)

    Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2018-03-15

    RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of RNAs as input with a motif discovery procedure that is limited only by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.

  17. Reconciling mass functions with the star-forming main sequence via mergers

    Science.gov (United States)

    Steinhardt, Charles L.; Yurk, Dominic; Capak, Peter

    2017-06-01

    We combine star formation along the 'main sequence', quiescence and clustering and merging to produce an empirical model for the evolution of individual galaxies. Main-sequence star formation alone would significantly steepen the stellar mass function towards low redshift, in sharp conflict with observation. However, a combination of star formation and merging produces a consistent result for correct choice of the merger rate function. As a result, we are motivated to propose a model in which hierarchical merging is disconnected from environmentally independent star formation. This model can be tested via correlation functions and would produce new constraints on clustering and merging.

  18. Computational analyses of synergism in small molecular network motifs.

    Directory of Open Access Journals (Sweden)

    Yili Zhang

    2014-03-01

    Full Text Available Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, reoccurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs, and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically) to alter the responses of the motifs to stimuli. Synergism (or antagonism) was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism, and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.

  19. Distinct repeat motifs at the C-terminal region of CagA of Helicobacter pylori strains isolated from diseased patients and asymptomatic individuals in West Bengal, India

    Directory of Open Access Journals (Sweden)

    Chattopadhyay Santanu

    2012-05-01

    Full Text Available Abstract Background Infection with Helicobacter pylori strains that express CagA is associated with gastritis, peptic ulcer disease, and gastric adenocarcinoma. The biological function of CagA depends on tyrosine phosphorylation by a cellular kinase. The phosphate acceptor tyrosine moiety is present within the EPIYA motif at the C-terminal region of the protein. This region is highly polymorphic due to variations in the number of EPIYA motifs and the polymorphism found in spacer regions among EPIYA motifs. The aim of this study was to analyze the polymorphism at the C-terminal end of CagA and to evaluate its association with the clinical status of the host in West Bengal, India. Results Seventy-seven H. pylori strains isolated from patients with various clinical statuses were used to characterize the C-terminal polymorphic region of CagA. Our analysis showed that there is no correlation between the previously described CagA types and various disease outcomes in the Indian context. Further analyses of different CagA structures revealed that the repeat units in the spacer sequences within the EPIYA motifs are actually more discrete than the previously proposed models of CagA variants. Conclusion Our analyses suggest that EPIYA motifs as well as the spacer sequence units are present as distinct insertions and deletions, which possibly have arisen from extensive recombination events. Moreover, we have identified several new CagA types, which could not be typed by the existing systems and therefore, we have proposed a new typing system. We hypothesize that a cagA gene encoding a higher number of EPIYA motifs may have arisen from cagA genes that encode fewer EPIYA motifs by acquisition of DNA segments through recombination events.

  20. Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhance the stability of DNA (dC5) i-motif structure.

    Science.gov (United States)

    Gade, Chandrasekhar Reddy; Sharma, Nagendra K

    2017-12-15

    This report describes the synthesis of a C-rich sequence, the cytosine pentamer, of aep-PNA and its biophysical studies for the formation of a hybrid DNA:aep-PNA i-motif structure with the DNA cytosine pentamer (dC5) under acidic pH conditions. Herein, the CD/UV/NMR/ESI-Mass studies strongly support the formation of a stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence aep-PNA C-rich cytosine sequences could be considered as potential DNA i-motif stabilizing agents under in vivo conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Identification of group specific motifs in Beta-lactamase family of proteins

    Directory of Open Access Journals (Sweden)

    Saxena Akansha

    2009-12-01

    Full Text Available Abstract Background Beta-lactamases are one of the most serious threats to public health. In order to combat this threat we need to study the molecular and functional diversity of these enzymes and identify signatures specific to these enzymes. These signatures will enable us to develop inhibitors and diagnostic probes specific to lactamases. The existing classification of beta-lactamases was developed nearly 30 years ago when few lactamases were available. The DLact database contains more than 2000 beta-lactamases, which can be used to study the molecular diversity and to identify signatures specific to this family. Methods A set of 2020 beta-lactamase proteins available in the DLact database http://59.160.102.202/DLact were classified using graph-based clustering of Best Bi-Directional Hits. Non-redundant (> 90 percent identical) protein sequences from each group were aligned using T-Coffee and annotated using information available in literature. Motifs specific to each group were predicted using the PRATT program. Results The graph-based classification of beta-lactamase proteins resulted in the formation of six groups (four major groups containing 191, 726, 774 and 73 proteins, and two minor groups containing 50 and 8 proteins). Based on the information available in literature, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. The two minor groups were novel and do not contain molecular signatures of beta-lactamase proteins reported in literature. The group-specific motifs showed high sensitivity (> 70%) and very high specificity (> 90%). The motifs from three groups (corresponding to class A, C and D) had a high level of conservation at DNA as well as protein level whereas the motifs from the fourth group (corresponding to class B) showed conservation at only protein level. Conclusion The graph-based classification of beta-lactamase proteins corresponds with the classification proposed by Ambler, thus there is

  2. Motor sequence learning-induced neural efficiency in functional brain connectivity.

    Science.gov (United States)

    Karim, Helmet T; Huppert, Theodore J; Erickson, Kirk I; Wollam, Mariegold E; Sparto, Patrick J; Sejdić, Ervin; VanSwearingen, Jessie M

    2017-02-15

    Previous studies have shown the functional neural circuitry differences before and after an explicitly learned motor sequence task, but have not assessed these changes during the process of motor skill learning. Functional magnetic resonance imaging activity was measured while participants (n=13) were asked to tap their fingers to visually presented sequences in blocks that were either the same sequence repeated (learning block) or random sequences (control block). Motor learning was associated with a decrease in brain activity during learning compared to control. Lower brain activation was noted in the posterior parietal association area and bilateral thalamus during the later periods of learning (not during the control). Compared to the control condition, we found the task-related motor learning was associated with decreased connectivity between the putamen and left inferior frontal gyrus and left middle cingulate brain regions. Motor learning was associated with changes in network activity, spatial extent, and connectivity. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Organization of feed-forward loop motifs reveals architectural principles in natural and engineered networks.

    Science.gov (United States)

    Gorochowski, Thomas E; Grierson, Claire S; di Bernardo, Mario

    2018-03-01

    Network motifs are significantly overrepresented subgraphs that have been proposed as building blocks for natural and engineered networks. Detailed functional analysis has been performed for many types of motif in isolation, but less is known about how motifs work together to perform complex tasks. To address this issue, we measure the aggregation of network motifs via methods that extract precisely how these structures are connected. Applying this approach to a broad spectrum of networked systems and focusing on the widespread feed-forward loop motif, we uncover striking differences in motif organization. The types of connection are often highly constrained, differ between domains, and clearly capture architectural principles. We show how this information can be used to effectively predict functionally important nodes in the metabolic network of Escherichia coli. Our findings have implications for understanding how networked systems are constructed from motif parts and elucidate constraints that guide their evolution.

  4. Linear motif atlas for phosphorylation-dependent signaling

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Jensen, LJ; Diella, F

    2008-01-01

    bind to them remains a challenge. NetPhorest is an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains [Src homology 2 (SH2), phosphotyrosine binding (PTB), BRCA1 C-terminal (BRCT), WW, and 14-3-3]. The atlas reveals new aspects of signaling...

  5. High affinity recognition of a Phytophthora protein by Arabidopsis via an RGD motif

    NARCIS (Netherlands)

    Senchou, V.; Weide, R.L.; Carrasco, A.; Bouyssou, H.; Pont-Lezica, R.; Govers, F.; Canut, H.

    2004-01-01

    The RGD tripeptide sequence, a cell adhesion motif present in several extracellular matrix proteins of mammalians, is involved in numerous plant processes. In plant-pathogen interactions, the RGD motif is believed to reduce plant defence responses by disrupting adhesions between the cell wall and

  6. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    Science.gov (United States)

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  7. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Full Text Available Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists
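
    The "simple z-statistic" for C-terminal tripeptides mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the null model here is a binomial with tripeptide probabilities taken as the product of overall residue frequencies, whereas the study itself used sequence randomization, masking, and gene-family clustering that are not reproduced. The function name is hypothetical.

```python
import math
from collections import Counter


def cterm_tripeptide_z(proteins):
    """Return {tripeptide: z-score} for tripeptides seen at the C-terminus.

    Assumed null model: each protein ends in a given tripeptide with
    probability equal to the product of the residue frequencies
    measured over all supplied sequences.
    """
    proteins = [p.upper() for p in proteins if len(p) >= 3]
    n = len(proteins)
    residues = Counter("".join(proteins))
    total = sum(residues.values())
    observed = Counter(p[-3:] for p in proteins)

    scores = {}
    for tri, obs in observed.items():
        prob = 1.0
        for aa in tri:
            prob *= residues[aa] / total
        mean = n * prob                          # binomial mean
        sd = math.sqrt(n * prob * (1.0 - prob))  # binomial standard deviation
        scores[tri] = (obs - mean) / sd if sd > 0 else float("inf")
    return scores
```

    On a toy input in which three of four sequences end in SKL, that tripeptide receives a much larger z-score than the others, which is the kind of signal the abstract attributes to known motifs such as PTS1.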

  8. A method for partitioning the information contained in a protein sequence between its structure and function.

    Science.gov (United States)

    Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

    2018-05-23

    Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  9. Likelihood functions for the analysis of single-molecule binned photon sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gopich, Irina V., E-mail: irinag@niddk.nih.gov [Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892 (United States)

    2012-03-02

    Graphical abstract: Folding of a protein with attached fluorescent dyes, the underlying conformational trajectory of interest, and the observed binned photon trajectory. Highlights: ► A sequence of photon counts can be analyzed using a likelihood function. ► The exact likelihood function for a two-state kinetic model is provided. ► Several approximations are considered for an arbitrary kinetic model. ► Improved likelihood functions are obtained to treat sequences of FRET efficiencies. Abstract: We consider the analysis of a class of experiments in which the number of photons in consecutive time intervals is recorded. Sequence of photon counts or, alternatively, of FRET efficiencies can be studied using likelihood-based methods. For a kinetic model of the conformational dynamics and state-dependent Poisson photon statistics, the formalism to calculate the exact likelihood that this model describes such sequences of photons or FRET efficiencies is developed. Explicit analytic expressions for the likelihood function for a two-state kinetic model are provided. The important special case when conformational dynamics are so slow that at most a single transition occurs in a time bin is considered. By making a series of approximations, we eventually recover the likelihood function used in hidden Markov models. In this way, not only is insight gained into the range of validity of this procedure, but also an improved likelihood function can be obtained.
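
    The hidden-Markov-model likelihood that this abstract says is recovered after approximation can be sketched for a two-state model with Poisson photon statistics. This is the standard HMM forward recursion, not the exact or improved likelihood derived in the paper; it assumes the state is constant within each bin and starts from the equilibrium populations. Parameter names (k12, k21, rate1, rate2, bin_time) are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n counts given a positive mean."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def hmm_log_likelihood(counts, k12, k21, rate1, rate2, bin_time):
    """Log-likelihood of binned photon counts under a two-state HMM.

    k12, k21     : transition rates 1->2 and 2->1
    rate1, rate2 : photon count rates in states 1 and 2
    bin_time     : bin duration (same time units as the rates)
    """
    k = k12 + k21
    peq = (k21 / k, k12 / k)          # equilibrium populations
    decay = math.exp(-k * bin_time)
    # trans[i][j] = P(state j after one bin | state i), exact for two states
    trans = [
        [peq[0] + peq[1] * decay, peq[1] * (1.0 - decay)],
        [peq[0] * (1.0 - decay), peq[1] + peq[0] * decay],
    ]
    means = (rate1 * bin_time, rate2 * bin_time)

    alpha = list(peq)                 # forward variables, rescaled each step
    loglik = 0.0
    for t, n in enumerate(counts):
        if t > 0:
            alpha = [alpha[0] * trans[0][j] + alpha[1] * trans[1][j] for j in range(2)]
        alpha = [alpha[j] * poisson_pmf(n, means[j]) for j in range(2)]
        norm = alpha[0] + alpha[1]
        loglik += math.log(norm)      # accumulate log of the scaling factor
        alpha = [a / norm for a in alpha]
    return loglik
```

    Rescaling the forward variables at every bin avoids numerical underflow on long trajectories. When the two states have the same count rate the result reduces to a sum of independent Poisson log-probabilities, which is a convenient sanity check.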

  10. Recoding method that removes inhibitory sequences and improves HIV gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Rabadan, Raul; Krasnitz, Michael; Robins, Harlan; Witten, Daniela; Levine, Arnold

    2016-08-23

    The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions for affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.

  11. RECONCILING THE OBSERVED STAR-FORMING SEQUENCE WITH THE OBSERVED STELLAR MASS FUNCTION

    International Nuclear Information System (INIS)

    Leja, Joel; Van Dokkum, Pieter G.; Franx, Marijn; Whitaker, Katherine E.

    2015-01-01

    We examine the connection between the observed star-forming sequence (SFR ∝ M^α) and the observed evolution of the stellar mass function in the range 0.2 < z < 2.5. We find that the star-forming sequence cannot have a slope α ≲ 0.9 at all masses and redshifts because this would result in a much higher number density at 10 < log(M/M☉) < 11 by z = 1 than is observed. We show that a transition in the slope of the star-forming sequence, such that α = 1 at log(M/M☉) < 10.5 and α = 0.7 - 0.13z (Whitaker et al.) at log(M/M☉) > 10.5, greatly improves agreement with the evolution of the stellar mass function. We then derive a star-forming sequence that reproduces the evolution of the mass function by design. This star-forming sequence is also well described by a broken power law, with a shallow slope at high masses and a steep slope at low masses. At z = 2, it is offset by ∼0.3 dex from the observed star-forming sequence, consistent with the mild disagreement between the cosmic star formation rate (SFR) and recent observations of the growth of the stellar mass density. It is unclear whether this problem stems from errors in stellar mass estimates, errors in SFRs, or other effects. We show that a mass-dependent slope is also seen in other self-consistent models of galaxy evolution, including semianalytical, hydrodynamical, and abundance-matching models. As part of the analysis, we demonstrate that neither mergers nor hidden low-mass quiescent galaxies are likely to reconcile the evolution of the mass function and the star-forming sequence. These results are supported by observations from Whitaker et al
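
    The slope transition quoted in this abstract (α = 1 below log(M/M☉) = 10.5, α = 0.7 - 0.13z above) can be written as a small broken power law. The sketch below takes only those two slopes and the break mass from the abstract; the normalization `log_sfr_at_break` is a hypothetical free parameter, and continuity at the break is an assumption of this illustration rather than a statement of the paper's fit.

```python
def sfs_slope(log_mass, z, log_break=10.5):
    """Slope alpha of the star-forming sequence, SFR proportional to M**alpha."""
    return 1.0 if log_mass < log_break else 0.7 - 0.13 * z


def log_sfr(log_mass, z, log_sfr_at_break, log_break=10.5):
    """log10(SFR) from a broken power law that is continuous at the break.

    log_sfr_at_break is an assumed normalization: the value of log10(SFR)
    at log_mass == log_break for the given redshift.
    """
    alpha = sfs_slope(log_mass, z, log_break)
    return log_sfr_at_break + alpha * (log_mass - log_break)
```

    Anchoring both branches at the break mass keeps the relation continuous, so only the slope changes across log(M/M☉) = 10.5.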

  12. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert

    2017-01-01

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often

  13. Leucine-based receptor sorting motifs are dependent on the spacing relative to the plasma membrane

    DEFF Research Database (Denmark)

    Geisler, C; Dietrich, J; Nielsen, B L

    1998-01-01

    Many integral membrane proteins contain leucine-based motifs within their cytoplasmic domains that mediate internalization and intracellular sorting. Two types of leucine-based motifs have been identified. One type is dependent on phosphorylation, whereas the other type, which includes an acidic...... amino acid, is constitutively active. In this study, we have investigated how the spacing relative to the plasma membrane affects the function of both types of leucine-based motifs. For phosphorylation-dependent leucine-based motifs, a minimal spacing of 7 residues between the plasma membrane...... and the phospho-acceptor was required for phosphorylation and thereby activation of the motifs. For constitutively active leucine-based motifs, a minimal spacing of 6 residues between the plasma membrane and the acidic residue was required for optimal activity of the motifs. In addition, we found that the acidic...

  14. Hunting down frame shifts: Ecological analysis of diverse functional gene sequences

    Directory of Open Access Journals (Sweden)

    Michal Strejcek

    2015-11-01

    Full Text Available Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding for alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
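
    Maximum Expected Error (MEE) filtering, named in this abstract, discards reads whose summed per-base error probabilities exceed a threshold. The sketch below assumes Phred+33 quality strings and a threshold of 1.0; the abstract does not state the encoding or the threshold used in the study, and the function names are hypothetical.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10),
    for a quality string in Phred+offset encoding (offset 33 assumed)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def mee_filter(reads, max_ee: float = 1.0):
    """Keep (sequence, quality) pairs whose expected errors are <= max_ee.

    max_ee = 1.0 is an illustrative default, not the study's setting.
    """
    return [(seq, qual) for seq, qual in reads if expected_errors(qual) <= max_ee]
```

    Because the filter uses the sum rather than the mean quality, a long read with many moderately poor bases is rejected even when its average quality looks acceptable.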

  15. Crammed signaling motifs in the T-cell receptor.

    Science.gov (United States)

    Borroto, Aldo; Abia, David; Alarcón, Balbino

    2014-09-01

    Although the T cell antigen receptor (TCR) is long known to contain multiple signaling subunits (CD3γ, CD3δ, CD3ɛ and CD3ζ), their role in signal transduction is still not well understood. The presence of at least one immunoreceptor tyrosine-based activation motif (ITAM) in each CD3 subunit has led to the idea that the multiplication of such elements essentially serves to amplify signals. However, the evolutionary conservation of non-ITAM sequences suggests that each CD3 subunit is likely to have specific non-redundant roles at some stage of development or in mature T cell function. The CD3ɛ subunit is paradigmatic because in a relatively short cytoplasmic sequence (∼55 amino acids) it contains several docking sites for proteins involved in intracellular trafficking and signaling, proteins whose relevance in T cell activation is slowly starting to be revealed. In this review we will summarize our current knowledge on the signaling effectors that bind directly to the TCR and we will propose a hierarchy in their response to TCR triggering. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.; Chikalov, Igor; Moshkov, Mikhail; Jankovic, Boris R.

    2015-01-01

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches

  17. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

    2015-01-01

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8-10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
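
    To make the mono-nucleotide motif model in this abstract concrete, the sketch below scores DNA windows against a position weight matrix and reports how many distinct k-mers a PBM design must cover. It is a generic log-odds PWM under a uniform background, not the optimization methods compared in the study; the matrix format (a list of per-position dicts of base probabilities, all non-zero) and the function names are assumptions.

```python
import math


def kmer_count(k: int) -> int:
    """Number of distinct DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def pwm_log_odds(pwm, kmer: str, background: float = 0.25) -> float:
    """Log2-odds score of kmer under a mono-nucleotide PWM.

    pwm is a list of dicts mapping each base to a non-zero probability,
    one dict per motif position; len(kmer) must equal len(pwm).
    """
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, kmer))


def best_window_score(pwm, probe: str) -> float:
    """Highest PWM score over all windows of a longer probe sequence."""
    width = len(pwm)
    return max(
        pwm_log_odds(pwm, probe[i:i + width])
        for i in range(len(probe) - width + 1)
    )
```

    A di-nucleotide model, which the abstract reports as generally performing better, would replace the per-position columns with probabilities over adjacent base pairs so that dependencies between neighbouring positions contribute to the score.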

  18. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun

    2015-06-11

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8-10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  19. Identification and functional characterization of a novel bipartite nuclear localization sequence in ARID1A

    Energy Technology Data Exchange (ETDEWEB)

    Bateman, Nicholas W. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Shoji, Yutaka [Department of Obstetrics, Gynecology and Reproductive Biology, Michigan State University, Grand Rapids 49503, MI (United States); Conrads, Kelly A.; Stroop, Kevin D. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); Hamilton, Chad A. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Gynecologic Oncology Service, Department of Obstetrics and Gynecology, Walter Reed National Military Medical Center, 8901 Wisconsin Ave, MD, Bethesda, 20889 (United States); Department of Obstetrics and Gynecology, Uniformed Services University of the Health Sciences, Bethesda 20814, MD (United States); Darcy, Kathleen M. [Women's Health Integrated Research Center at Inova Health System, Gynecologic Cancer Center of Excellence, Annandale 22003, VA (United States); The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD (United States); Maxwell, George L. [Department of Obstetrics and Gynecology, Inova Fairfax Hospital, Falls Church, VA 22042 (United States); Risinger, John I. [Department of Obstetrics, Gynecology and Reproductive Biology, Michigan State University, Grand Rapids 49503, MI (United States); and others

    2016-01-01

    AT-rich interactive domain-containing protein 1A (ARID1A) is a recently identified nuclear tumor suppressor frequently altered in solid tumor malignancies. We have identified a bipartite-like nuclear localization sequence (NLS) that contributes to nuclear import of ARID1A not previously described. We functionally confirm activity using GFP constructs fused with wild-type or mutant NLS sequences. We further show that cyto-nuclear localized, bipartite NLS mutant ARID1A exhibits greater stability than nuclear-localized, wild-type ARID1A. Identification of this undescribed functional NLS within ARID1A contributes vital insights to rationalize the impact of ARID1A missense mutations observed in patient tumors. - Highlights: • We have identified a bipartite nuclear localization sequence (NLS) in ARID1A. • Confirmation of the NLS was performed using GFP constructs. • NLS mutant ARID1A exhibits greater stability than wild-type ARID1A.

  20. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  1. Novel and deviant Walker A ATP-binding motifs in bacteriophage large terminase-DNA packaging proteins

    International Nuclear Information System (INIS)

    Mitchell, Michael S.; Rao, Venigalla B.

    2004-01-01

    Bacteriophage terminases constitute a very interesting class of viral-coded multifunctional ATPase 'motors' that apparently drive directional translocation of DNA into an empty viral capsid. A common Walker A motif and other conserved signatures of a critical ATPase catalytic center are identified in the N-terminal half of numerous large terminase proteins. However, several terminases, including the well-characterized λ and SPP1 terminases, seem to lack the classic Walker A in the N-terminus. Using sequence alignment approaches, we discovered the presence of deviant Walker A motifs in these and many other phage terminases. One deviation, the presence of a lysine at the beginning of P-loop, may represent a 3D equivalent of the universally conserved lysine in the Walker A GKT/S signature. This and other novel putative Walker A motifs that first came to light through this study help define the ATPase centers of phage and viral terminases as well as elicit important insights into the molecular functioning of this fundamental motif in biological systems

  2. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2015-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  3. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2014-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  4. The ARTT motif and a unified structural understanding of substrate recognition in ADP-ribosylating bacterial toxins and eukaryotic ADP-ribosyltransferases

    Energy Technology Data Exchange (ETDEWEB)

    Han, S.; Tainer, J.A.

    2001-08-01

    ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing an NAD-binding pocket formed by the two perpendicular β-sheet core has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransferases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear a NAD binding cleft as characterized by conserved Arg and catalytic Glu residues. Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT

  5. Recurrence Relations and Generating Functions of the Sequence of Sums of Corresponding Factorials and Triangular Numbers

    Directory of Open Access Journals (Sweden)

    Romer C. Castillo

    2015-11-01

    Full Text Available This study established some recurrence relations and exponential generating functions of the sequence of factoriangular numbers. A factoriangular number is defined as the sum of the corresponding factorial and triangular numbers. The proofs utilize algebraic manipulations with some known results from calculus, particularly on power series and Maclaurin’s series. The recurrence relations were found by manipulating the formula defining a factoriangular number, while the ascertained exponential generating functions were in closed form.
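
As a small illustration of the definition in this abstract, the sketch below computes factoriangular numbers as n! + n(n+1)/2. The function names are illustrative assumptions, and the incremental loop is only one convenient way to generate the terms; it is not the recurrence relation derived in the paper.

```python
from math import factorial


def triangular(n):
    """n-th triangular number: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def factoriangular(n):
    """n-th factoriangular number, taken here as n! + T_n."""
    return factorial(n) + triangular(n)


def factoriangular_sequence(count):
    """First `count` factoriangular numbers, built incrementally."""
    terms = []
    fact, tri = 1, 0
    for n in range(1, count + 1):
        fact *= n   # n! from (n-1)!
        tri += n    # T_n from T_(n-1)
        terms.append(fact + tri)
    return terms


print(factoriangular_sequence(6))  # [2, 5, 12, 34, 135, 741]
```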

  6. The human Ago2 MC region does not contain an eIF4E-like mRNA cap binding motif

    Directory of Open Access Journals (Sweden)

    Grishin Nick V

    2009-01-01

    Full Text Available Abstract Background Argonaute (Ago) proteins interact with small regulatory RNAs to mediate gene regulatory pathways. A recent report by Kiriakidou et al. [1] describes an MC sequence region identified in Ago2 that displays similarity to the cap-binding motif in translation initiation factor 4E (eIF4E). In a cap-bound eIF4E structure, two important aromatic residues of the motif stack on either side of a 7-methylguanosine 5'-triphosphate (m7Gppp) base. The corresponding Ago2 aromatic residues (F450 and F505) were hypothesized to perform the same cap-binding function. However, the detected similarity between the MC sequence and the eIF4E cap-binding motif was questionable. Results A number of sequence-based and structure-based bioinformatics methods reveal the reported similarity between the Ago2 MC sequence region and the eIF4E cap-binding motif to be spurious. Alternatively, the MC sequence region is confidently assigned to the N-terminus of the Ago piwi module, within the mid domain of experimentally determined prokaryotic Ago structures. Confident mapping of the Ago2 MC sequence region to the piwi mid domain results in a homology-based structure model that positions the identified aromatic residues over 20 Å apart, with one of the aromatic side chains (F450) contributing instead to the hydrophobic core of the domain. Conclusion Correct functional prediction based on weak sequence similarity requires substantial evolutionary and structural support. The evolutionary context of the Ago mid domain suggested by multiple sequence alignment is limited to a conserved hydrophobicity profile required for the fold and a motif following the MC region that binds guide RNA. Mapping of the MC sequence to the mid domain structure reveals Ago2 aromatics that are incompatible with eIF4E-like mRNA cap-binding, yet display some limited local structure similarities that cause the chance sequence match to eIF4E. Reviewers This article was reviewed by Arcady Mushegian

  7. Powdery mildew fungal effector candidates share N-terminal Y/F/WxC-motif

    Directory of Open Access Journals (Sweden)

    Emmersen Jeppe

    2010-05-01

    Full Text Available Abstract Background Powdery mildew and rust fungi are widespread, serious pathogens that depend on developing haustoria in the living plant cells. Haustoria are separated from the host cytoplasm by a plant cell-derived extrahaustorial membrane. They secrete effector proteins, some of which are subsequently transferred across this membrane to the plant cell to suppress defense. Results In a cDNA library from barley epidermis containing powdery mildew haustoria, two-thirds of the sequenced ESTs were fungal and represented ~3,000 genes. Many of the most highly expressed genes encoded small proteins with N-terminal signal peptides. While these proteins are novel and poorly related, they do share a three-amino acid motif, which we named "Y/F/WxC", in the N-terminal of the mature proteins. The first amino acid of this motif is aromatic: tyrosine, phenylalanine or tryptophan, and the last is always cysteine. In total, we identified 107 such proteins, for which the ESTs represent 19% of the fungal clones in our library, suggesting fundamental roles in haustoria function. While overall sequence similarity between the powdery mildew Y/F/WxC-proteins is low, they do have a highly similar exon-intron structure, suggesting they have a common origin. Interestingly, searches of public fungal genome and EST databases revealed that haustoria-producing rust fungi also encode large numbers of novel, short proteins with signal peptides and the Y/F/WxC-motif. No significant numbers of such proteins were identified from genome and EST sequences from either fungi which do not produce haustoria or from haustoria-producing Oomycetes. Conclusion In total, we identified 107, 178 and 57 such Y/F/WxC-proteins from the barley powdery mildew, the wheat stem rust and the wheat leaf rust fungi, respectively. All together, our findings suggest the Y/F/WxC-proteins to be a new class of effectors from haustoria-producing pathogenic fungi.
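
The three-residue motif described in this abstract (an aromatic residue Y, F or W, then any residue, then cysteine) maps naturally onto a regular expression. The sketch below is a minimal illustration, not the authors' pipeline: the function name, the 30-residue N-terminal window and the example sequences are all assumptions made for the example.

```python
import re

# [YFW] = aromatic first residue (Tyr, Phe or Trp), "." = any residue,
# C = cysteine. The lookahead lets overlapping hits be reported.
YFWXC = re.compile(r"(?=([YFW].C))")


def find_yfwxc(mature_protein, window=30):
    """Return (1-based position, tripeptide) pairs for Y/F/WxC hits within
    the first `window` residues. `window` is an illustrative assumption."""
    region = mature_protein[:window].upper()
    return [(m.start() + 1, m.group(1)) for m in YFWXC.finditer(region)]


# Made-up sequences, for illustration only.
print(find_yfwxc("AYHCTNGDSLK"))  # [(2, 'YHC')]
print(find_yfwxc("FWCCAAA"))      # [(1, 'FWC'), (2, 'WCC')]
```

Signal peptides would have to be removed beforehand, since the abstract places the motif at the N-terminus of the mature protein.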

  8. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Characterizing Motif Dynamics of Electric Brain Activity Using Symbolic Analysis

    Directory of Open Access Journals (Sweden)

    Massimiliano Zanin

    2014-10-01

    Full Text Available Motifs are small recurring circuits of interactions which constitute the backbone of networked systems. Characterizing motif dynamics is therefore key to understanding the functioning of such systems. Here we propose a method to define and quantify the temporal variability and time scales of electroencephalogram (EEG) motifs of resting brain activity. Given a triplet of EEG sensors, links between them are calculated by means of linear correlation; each pattern of links (i.e., each motif) is then associated to a symbol, and its appearance frequency is analyzed by means of Shannon entropy. Our results show that each motif becomes observable with different coupling thresholds and evolves at its own time scale, with fronto-temporal sensors emerging at high thresholds and changing at fast time scales, and parietal ones at low thresholds and changing at slower rates. Finally, while motif dynamics differed across individuals, for each subject, it showed robustness across experimental conditions, indicating that it could represent an individual dynamical signature.
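
The pipeline outlined in this abstract (correlate a sensor triplet, turn each link pattern into a symbol, measure symbol frequencies with Shannon entropy) can be sketched as follows. Several details are assumptions made for illustration and are not taken from the paper: non-overlapping windows, thresholding of the absolute Pearson correlation, and the 3-bit encoding of the link pattern.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either signal is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def motif_symbols(s1, s2, s3, window, threshold):
    """One symbol (0..7) per non-overlapping window: bit 0 = link s1-s2,
    bit 1 = link s1-s3, bit 2 = link s2-s3."""
    symbols = []
    for start in range(0, len(s1) - window + 1, window):
        a, b, c = (s[start:start + window] for s in (s1, s2, s3))
        links = (
            abs(pearson(a, b)) >= threshold,
            abs(pearson(a, c)) >= threshold,
            abs(pearson(b, c)) >= threshold,
        )
        symbols.append(sum(int(bit) << i for i, bit in enumerate(links)))
    return symbols


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy signals, made up for illustration.
s1 = [0, 1, 2, 3, 0, 1, 2, 3]
s2 = [0, 1, 2, 3, 1, -1, -1, 1]
s3 = [3, 2, 1, 0, 0, 1, 2, 3]
syms = motif_symbols(s1, s2, s3, window=4, threshold=0.9)
print(syms, shannon_entropy(syms))  # [7, 2] 1.0
```

Sweeping `threshold` and `window` would be one way to probe the coupling thresholds and time scales the abstract refers to.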

  10. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  11. The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies.

    Science.gov (United States)

    Li, Guotian; Jain, Rashmi; Chern, Mawsheng; Pham, Nikki T; Martin, Joel A; Wei, Tong; Schackwitz, Wendy S; Lipzen, Anna M; Duong, Phat Q; Jones, Kyle C; Jiang, Liangrong; Ruan, Deling; Bauer, Diane; Peng, Yi; Barry, Kerrie W; Schmutz, Jeremy; Ronald, Pamela C

    2017-06-01

    The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp. japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. © 2017 American Society of Plant Biologists. All rights reserved.

  12. Structural and Functional Models of Non-Heme Iron Enzymes : A Study of the 2-His-1-Carboxylate Facial Triad Structural Motif

    NARCIS (Netherlands)

    Bruijnincx, P.C.A.

    2007-01-01

    The structural and functional modeling of a specific group of non-heme iron enzymes by the synthesis of small synthetic analogues is the topic of this thesis. The group of non-heme iron enzymes with the 2-His-1-carboxylate facial triad has recently been established as a common platform for the

  13. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    Science.gov (United States)

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  14. Refined repetitive sequence searches utilizing a fast hash function and cross species information retrievals

    Directory of Open Access Journals (Sweden)

    Reneker Jeff

    2005-05-01

    Full Text Available Abstract Background Searching for small tandem/disperse repetitive DNA sequences streamlines many biomedical research processes. For instance, whole genomic array analysis in yeast has revealed 22 PHO-regulated genes. The promoter regions of all but one of them contain at least one of the two core Pho4p binding sites, CACGTG and CACGTT. In humans, microsatellites play a role in a number of rare neurodegenerative diseases such as spinocerebellar ataxia type 1 (SCA1). SCA1 is a hereditary neurodegenerative disease caused by an expanded CAG repeat in the coding sequence of the gene. In bacterial pathogens, microsatellites are proposed to regulate expression of some virulence factors. For example, bacteria commonly generate intra-strain diversity through phase variation which is strongly associated with virulence determinants. A recent analysis of the complete sequences of the Helicobacter pylori strains 26695 and J99 has identified 46 putative phase-variable genes among the two genomes through their association with homopolymeric tracts and dinucleotide repeats. Life scientists are increasingly interested in studying the function of small sequences of DNA. However, current search algorithms often generate thousands of matches – most of which are irrelevant to the researcher. Results We present our hash function as well as our search algorithm to locate small sequences of DNA within multiple genomes. Our system applies information retrieval algorithms to discover knowledge of cross-species conservation of repeat sequences. We discuss our incorporation of the Gene Ontology (GO) database into these algorithms. We conduct an exhaustive time analysis of our system for various repetitive sequence lengths. For instance, a search for eight bases of sequence within 3.224 GBases on 49 different chromosomes takes 1.147 seconds on average. To illustrate the relevance of the search results, we conduct a search with and without added annotation terms for the
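
The abstract does not describe the hash function itself, so the sketch below substitutes a common alternative: a 2-bit-per-base encoding, which is collision-free for a fixed k, with a rolling update while the index is built. The function names and the example sequence are assumptions for illustration; only the two Pho4p core sites, CACGTG and CACGTT, come from the abstract.

```python
from collections import defaultdict

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer):
    """Pack a k-mer into an integer using 2 bits per base."""
    h = 0
    for base in kmer:
        h = (h << 2) | CODE[base]
    return h


def build_index(sequence, k):
    """Map each k-mer hash to its 0-based start positions.
    The hash is updated in O(1) per base; ambiguous bases reset the window."""
    index = defaultdict(list)
    mask = (1 << (2 * k)) - 1
    h, valid = 0, 0
    for i, base in enumerate(sequence.upper()):
        if base not in CODE:
            h, valid = 0, 0
            continue
        h = ((h << 2) | CODE[base]) & mask
        valid += 1
        if valid >= k:
            index[h].append(i - k + 1)
    return index


def find(index, query):
    """Look up a query whose length equals the k used to build the index."""
    return index.get(kmer_hash(query.upper()), [])


# Made-up sequence containing both Pho4p core sites.
promoter = "TTCACGTGAANCACGTTCACGTG"
idx = build_index(promoter, k=6)
print(find(idx, "CACGTG"))  # [2, 17]
print(find(idx, "CACGTT"))  # [11]
```

Because the index is built once per genome, each later lookup costs a single dictionary access, which is the kind of trade-off that makes repeated short-sequence queries fast.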

  15. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-01-01

    The algorithm has been simulated using the C#.Net programming language, and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, at a rate that also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
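
To make the idea of sequential optimization concrete, the sketch below counts optimal global alignments when several cost functions are applied in order. It is written in Python rather than the C#.Net mentioned in the abstract, and it is an illustrative dynamic program, not the author's algorithm: each cost function is assumed to be a pair of integer (mismatch, gap) penalties with matches costing 0, and costs are compared lexicographically so that later functions only break ties left by earlier ones.

```python
def count_optimal_alignments(a, b, cost_functions):
    """Return (optimal cost tuple, number of optimal global alignments).

    cost_functions: list of (mismatch, gap) integer pairs, applied in order.
    Tuples compare lexicographically, which implements the sequential
    optimization: minimize the first cost, then the second among the ties.
    """
    mis = tuple(m for m, _ in cost_functions)
    gap = tuple(g for _, g in cost_functions)
    zero = tuple(0 for _ in cost_functions)

    def add(u, v):
        return tuple(x + y for x, y in zip(u, v))

    n, m = len(a), len(b)
    cost = [[zero] * (m + 1) for _ in range(n + 1)]
    ways = [[1] * (m + 1) for _ in range(n + 1)]  # borders have one alignment
    for i in range(1, n + 1):
        cost[i][0] = add(cost[i - 1][0], gap)
    for j in range(1, m + 1):
        cost[0][j] = add(cost[0][j - 1], gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = zero if a[i - 1] == b[j - 1] else mis
            options = [
                (add(cost[i - 1][j - 1], step), ways[i - 1][j - 1]),  # (mis)match
                (add(cost[i - 1][j], gap), ways[i - 1][j]),           # gap in b
                (add(cost[i][j - 1], gap), ways[i][j - 1]),           # gap in a
            ]
            best = min(c for c, _ in options)
            cost[i][j] = best
            ways[i][j] = sum(w for c, w in options if c == best)
    return cost[n][m], ways[n][m]


print(count_optimal_alignments("AC", "CA", [(1, 1)]))          # ((2,), 3)
print(count_optimal_alignments("AC", "CA", [(1, 1), (1, 2)]))  # ((2, 2), 1)
```

In this toy case, adding a second cost function with a higher gap penalty reduces the optimal set from three alignments to one, which mirrors the reduction reported in the abstract.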

  16. The arginine of the DRY motif in transmembrane segment III functions as a balancing micro-switch in the activation of the β2-adrenergic receptor

    DEFF Research Database (Denmark)

    Hansen, Louise Valentin; Groenen, Marleen; Nygaard, Rie

    2012-01-01

    (s) signaling, arrestin mobilization, and internalization upon alanine substitutions. Conversely, TyrV:24 appears to play a role in stabilizing the active receptor conformation as loss of function of G(s) signaling, arrestin mobilization, and receptor internalization was observed upon alanine substitution......VI:-06 (Glu6.30) in the inactive conformation nor the interaction with TyrV:24 (Tyr5.58) in the active conformation were observed in the x-ray structures. Here we find through molecular dynamics simulations, after removal of the stabilizing T4 lysozyme, that the expected salt bridge between ArgIII:26...... and GluVI:-06 does form relatively easily in the inactive receptor conformation. Moreover, mutational analysis of GluVI:-06 in TM-VI and the neighboring AspIII:25 in TM-III demonstrated that these two residues do function as locks for the inactive receptor conformation as we observed increased G...

  17. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan

    2013-02-08

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  18. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan; Barbato, Alessandro; Tramontano, Anna

    2013-01-01

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  19. The N-terminal leucine-zipper motif in PTRF/cavin-1 is essential and sufficient for its caveolae-association

    Energy Technology Data Exchange (ETDEWEB)

    Wei, Zhuang [State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Laboratory of System Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Zou, Xinle [State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Wang, Hongzhong; Lei, Jigang; Wu, Yuan [State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Laboratory of System Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Liao, Kan, E-mail: kliao@sibs.ac.cn [State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China); Laboratory of System Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China)

    2015-01-16

    Highlight: • The N-terminal leucine-zipper motif in PTRF/cavin-1 determines caveolar association. • Different cellular localization of PTRF/cavin-1 influences its serine 389 and 391 phosphorylation state. • PTRF/cavin-1 regulates cell motility via its caveolar association. - Abstract: PTRF/cavin-1 is a protein of two lives. Its reported functions in ribosomal RNA synthesis and in caveolae formation happen in two different cellular locations: nucleus vs. plasma membrane. Here, we identified that the N-terminal leucine-zipper motif in PTRF/cavin-1 was essential for the protein to be associated with caveolae in plasma membrane. It could counteract the effect of nuclear localization sequence in the molecule (AA 235–251). Deletion of this leucine-zipper motif from PTRF/cavin-1 caused the mutant to be exclusively localized in nuclei. The fusion of this leucine-zipper motif with histone 2A, which is a nuclear protein, could induce the fusion protein to be exported from nucleus. Cell migration was greatly inhibited in PTRF/cavin-1 −/− mouse embryonic fibroblasts (MEFs). The inhibited cell motility could only be rescued by exogenous cavin-1 but not the leucine-zipper motif deleted cavin-1 mutant. Plasma membrane dynamics is an important factor in cell motility control. Our results suggested that the membrane dynamics in cell migration is affected by caveolae associated PTRF/cavin-1.

  20. The N-terminal leucine-zipper motif in PTRF/cavin-1 is essential and sufficient for its caveolae-association

    International Nuclear Information System (INIS)

    Wei, Zhuang; Zou, Xinle; Wang, Hongzhong; Lei, Jigang; Wu, Yuan; Liao, Kan

    2015-01-01

    Highlight: • The N-terminal leucine-zipper motif in PTRF/cavin-1 determines caveolar association. • Different cellular localization of PTRF/cavin-1 influences its serine 389 and 391 phosphorylation state. • PTRF/cavin-1 regulates cell motility via its caveolar association. - Abstract: PTRF/cavin-1 is a protein of two lives. Its reported functions in ribosomal RNA synthesis and in caveolae formation happen in two different cellular locations: nucleus vs. plasma membrane. Here, we identified that the N-terminal leucine-zipper motif in PTRF/cavin-1 was essential for the protein to be associated with caveolae in plasma membrane. It could counteract the effect of nuclear localization sequence in the molecule (AA 235–251). Deletion of this leucine-zipper motif from PTRF/cavin-1 caused the mutant to be exclusively localized in nuclei. The fusion of this leucine-zipper motif with histone 2A, which is a nuclear protein, could induce the fusion protein to be exported from nucleus. Cell migration was greatly inhibited in PTRF/cavin-1 −/− mouse embryonic fibroblasts (MEFs). The inhibited cell motility could only be rescued by exogenous cavin-1 but not the leucine-zipper motif deleted cavin-1 mutant. Plasma membrane dynamics is an important factor in cell motility control. Our results suggested that the membrane dynamics in cell migration is affected by caveolae associated PTRF/cavin-1

  1. SiteBinder: an improved approach for comparing multiple protein structural motifs.

    Science.gov (United States)

    Sehnal, David; Vařeková, Radka Svobodová; Huber, Heinrich J; Geidl, Stanislav; Ionescu, Crina-Maria; Wimmerová, Michaela; Koča, Jaroslav

    2012-02-27

    There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.

  2. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  3. A functional U-statistic method for association analysis of sequencing data.

    Science.gov (United States)

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  4. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    Science.gov (United States)

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  5. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    Science.gov (United States)

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/. Contact: pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    Science.gov (United States)

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and

  7. A proposed vestigial translation initiation motif in VP1 of hepatitis A virus.

    Science.gov (United States)

    Kang, Jeong-Ah; Funkhouser, Ann W

    2002-07-01

    The internal ribosome entry site (IRES) of picornaviruses has a 3' polypyrimidine tract (PPT) 16-24 bases upstream of an AUG triplet (PPT/AUG motif). This motif is critical in determining the efficiency of cap-independent translation. HAV has a conserved PPT/AUG motif consisting of a nine base sequence (AGGUUUUUC) 23 bases upstream of the preferred AUG start codon. This HAV-specific PPT/AUG motif is repeated and conserved in VP1 of HAV, but not of other picornaviruses. We proposed that the PPT/AUG motif in the open reading frame initiated translation and/or had an impact on the life cycle of the virus. In vitro translation of mutant bicistronic mRNAs and growth in cell culture of mutant viruses provided no evidence that the VP1 PPT/AUG motif had any impact on either translation or growth. HAV differs from other picornaviruses in its inefficient growth in cell culture. Since the HAV-specific PPT/AUG motif is found in only 1 in 300,000 reported viral sequences outside the hepatovirus genus, this motif may be a vestigial translation initiation element and may have played a role in determining the unusual phenotype of HAV.
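
    The PPT/AUG arrangement described in this abstract can be illustrated with a small scan. This is a minimal sketch, not the authors' procedure: the function name `find_ppt_aug` is hypothetical, and it assumes that "N bases upstream" means N bases lie between the last base of the nine-base motif and the first base of the AUG triplet. The default window of 16 to 24 bases is taken from the abstract.

```python
import re

# The nine-base HAV motif quoted in the abstract.
PPT = "AGGUUUUUC"


def find_ppt_aug(rna, min_gap=16, max_gap=24):
    """Return (ppt_start, aug_start, gap) for each PPT followed by an AUG.

    Assumption: 'gap' is the number of bases between the end of the PPT
    and the first base of the AUG triplet.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    # A zero-width lookahead reports every start position, including overlaps.
    for m in re.finditer("(?=%s)" % PPT, rna):
        ppt_end = m.start() + len(PPT)
        for gap in range(min_gap, max_gap + 1):
            aug_start = ppt_end + gap
            if rna[aug_start:aug_start + 3] == "AUG":
                hits.append((m.start(), aug_start, gap))
    return hits
```

    Called on a sequence in which the motif is followed by 23 arbitrary bases and then AUG, the function reports one hit with a gap of 23, matching the spacing quoted for the HAV start codon under the stated assumption.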

  8. Transduction motif analysis of gastric cancer based on a human signaling network

    Energy Technology Data Exchange (ETDEWEB)

    Liu, G.; Li, D.Z.; Jiang, C.S.; Wang, W. [Department of Gastroenterology, Fuzhou General Hospital of Nanjing Command, Fuzhou (China)]

    2014-04-04

    To investigate signal regulation models of gastric cancer, databases and literature were used to construct the signaling network in humans. Topological characteristics of the network were analyzed by CytoScape. After marking gastric cancer-related genes extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used for the mining of gastric cancer-related motifs in a network with three vertices. The significant motif difference method was adopted to identify significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly different motifs, including gene ontology, function annotation of genes, and model classification. A human signaling network was constructed, with 1643 nodes and 5089 regulating interactions. The network was configured to have the characteristics of other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of cell death, amino acid phosphorylation of proteins, and intracellular signaling cascades. The top five significantly different motifs were mainly cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed by inhibiting signal transductions upstream and downstream of the selected motifs.
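
    To make the idea of three-vertex motifs concrete, the sketch below counts three simple induced patterns in a small directed graph: cascades (a→b→c), feed-forward loops (a→b→c plus a→c) and three-node feedback cycles. It is only an illustration under stated assumptions: it is not FANMOD, it does no randomisation or significance testing, it ignores activation/inhibition signs, and the function name `count_three_node_motifs` is hypothetical.

```python
from itertools import permutations


def count_three_node_motifs(edges):
    """Count induced cascades, feed-forward loops and 3-cycles.

    edges: iterable of (source, target) pairs of a directed graph.
    A triple is counted only if its induced subgraph has exactly the
    edges of the pattern (no extra edges, no self-loops).
    """
    edge_set = set(edges)
    nodes = sorted({n for e in edge_set for n in e})
    counts = {"cascade": 0, "feed_forward": 0, "feedback_cycle": 0}
    for a, b, c in permutations(nodes, 3):
        trio = (a, b, c)
        present = {(u, v) for u in trio for v in trio if (u, v) in edge_set}
        if present == {(a, b), (b, c)}:
            counts["cascade"] += 1
        elif present == {(a, b), (b, c), (a, c)}:
            counts["feed_forward"] += 1
        elif present == {(a, b), (b, c), (c, a)} and a == min(trio):
            # A cycle matches three rotations; keep the one starting at the smallest node.
            counts["feedback_cycle"] += 1
    return counts
```

    The brute-force loop over ordered triples is cubic in the number of nodes, which is fine for a toy example but would need a proper subgraph-enumeration algorithm for a network of the size reported here.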

  9. Effective Feature Selection for Classification of Promoter Sequences.

    Directory of Open Access Journals (Sweden)

    Kouser K

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  10. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  11. Bidirectional gene sequences with similar homology to functional proteins of alkane degrading bacterium pseudomonas fredriksbergensis DNA

    International Nuclear Information System (INIS)

    Megeed, A.A.

    2011-01-01

    The potential for two overlapping fragments of DNA from a clone of newly isolated alkanes degrading bacterium Pseudomonas frederiksbergensis encoding sequences with similar homology to two parts of functional proteins is described. One strand contains a sequence with high homology to alkanes monooxygenase (alkB), a member of the alkanes hydroxylase family, and the other strand contains a sequence with some homology to alcohol dehydrogenase gene (alkJ). Overlapping of the genes on opposite strands has been reported in eukaryotic species, and is now reported in a bacterial species. The sequence comparisons and ORF results revealed that the regulation and the gene organization involved in alkane oxidation in Pseudomonas frederiksbergensis vary among the different known alkane degrading bacteria. The alk gene cluster containing homologues to the known alkane monooxygenase (alkB), and rubredoxin (alkG) are oriented in the same direction, whereas alcohol dehydrogenase (alkJ) is oriented in the opposite direction. Such genomes encode messages on both strands of the DNA, or in overlapping but different reading frames of the same strand of DNA. The creation of novel genes from pre-existing sequences, known as overprinting, is a widespread phenomenon in small viruses. Here, the origin and evolution of the gene overlap to bacteriophages belonging to the family Microviridae have been investigated. Such a phenomenon is most widely described in extremely small genomes such as those of viruses or small plasmids, yet here is a unique phenomenon. (author)

  12. A novel Drosophila model of TDP-43 proteinopathies: N-terminal sequences combined with the Q/N domain induce protein functional loss and locomotion defects

    Directory of Open Access Journals (Sweden)

    Simona Langellotti

    2016-06-01

    Transactive response DNA-binding protein 43 kDa (TDP-43, also known as TBPH in Drosophila melanogaster and TARDBP in mammals) is the main protein component of the pathological inclusions observed in neurons of patients affected by different neurodegenerative disorders, including amyotrophic lateral sclerosis (ALS) and fronto-temporal lobar degeneration (FTLD). The number of studies investigating the molecular mechanisms underlying neurodegeneration is constantly growing; however, the role played by TDP-43 in disease onset and progression is still unclear. A fundamental shortcoming that hampers progress is the lack of animal models showing aggregation of TDP-43 without overexpression. In this manuscript, we have extended our cellular model of aggregation to a transgenic Drosophila line. Our fly model is not based on the overexpression of a wild-type TDP-43 transgene. By contrast, we engineered a construct that includes only the specific TDP-43 amino acid sequences necessary to trigger aggregate formation and capable of trapping endogenous Drosophila TDP-43 into a non-functional insoluble form. Importantly, the resulting recombinant product lacks functional RNA recognition motifs (RRMs) and, thus, does not have specific TDP-43-physiological functions (i.e. splicing regulation ability) that might affect the animal phenotype per se. This novel Drosophila model exhibits an evident degenerative phenotype with reduced lifespan and early locomotion defects. Additionally, we show that important proteins involved in neuromuscular junction function, such as syntaxin (SYX), decrease their levels as a consequence of TDP-43 loss of function implying that the degenerative phenotype is a consequence of TDP-43 sequestration into the aggregates. Our data lend further support to the role of TDP-43 loss-of-function in the pathogenesis of neurodegenerative disorders. The novel transgenic Drosophila model presented in this study will help to gain further insight into the

  13. Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

    Directory of Open Access Journals (Sweden)

    James B Howard

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification

  14. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
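
    The step of pulling SRA identifiers out of article text can be sketched with a regular expression. This is an illustrative assumption, not the pipeline used in the study: the pattern below assumes accessions have the shape of a repository letter (S, E or D), the letter R, a type letter (A, P, R, S or X) and at least six digits, and the helper name `sra_ids_in_text` is hypothetical.

```python
import re

# Assumed accession shape, e.g. SRP012345 or ERR123456 (see the lead-in).
ACCESSION = re.compile(r"\b[SED]R[APRSX]\d{6,}\b")


def sra_ids_in_text(text):
    """Return distinct accession-like tokens in order of first appearance."""
    seen = []
    for acc in ACCESSION.findall(text):
        if acc not in seen:
            seen.append(acc)
    return seen
```

    Pairing each identifier found this way with the PMID of the article it came from would give ID-PMID pairs of the kind counted in the abstract; the sketch makes no attempt to validate that a matched token is a real accession.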

  15. Structural motifs of pre-nucleation clusters.

    Science.gov (United States)

    Zhang, Y; Türkmen, I R; Wassermann, B; Erko, A; Rühl, E

    2013-10-07

    Structural motifs of pre-nucleation clusters prepared in single, optically levitated supersaturated aqueous aerosol microparticles containing CaBr2 as a model system are reported. Cluster formation is identified by means of X-ray absorption in the Br K-edge regime. The salt concentration beyond the saturation point is varied by controlling the humidity in the ambient atmosphere surrounding the 15-30 μm microdroplets. This leads to the formation of metastable supersaturated liquid particles. Distinct spectral shifts in near-edge spectra as a function of salt concentration are observed, in which the energy position of the Br K-edge is red-shifted by up to 7.1 ± 0.4 eV if the dilute solution is compared to the solid. The K-edge positions of supersaturated solutions are found between these limits. The changes in electronic structure are rationalized in terms of the formation of pre-nucleation clusters. This assumption is verified by spectral simulations using first-principle density functional theory and molecular dynamics calculations, in which structural motifs are considered, explaining the experimental results. These consist of solvated CaBr2 moieties, rather than building blocks forming calcium bromide hexahydrates, the crystal system that is formed by drying aqueous CaBr2 solutions.

  16. Novel Strategy for Discrimination of Transcription Factor Binding Motifs Employing Mathematical Neural Network

    Science.gov (United States)

    Sugimoto, Asuka; Sumi, Takuya; Kang, Jiyoung; Tateno, Masaru

    2017-07-01

    Recognition in biological macromolecular systems, such as DNA-protein recognition, is one of the most crucial problems to solve toward understanding the fundamental mechanisms of various biological processes. Since specific base sequences of genome DNA are discriminated by proteins, such as transcription factors (TFs), finding TF binding motifs (TFBMs) in whole genome DNA sequences is currently a central issue in interdisciplinary biophysical and information sciences. In the present study, a novel strategy to create a discriminant function for discrimination of TFBMs by constituting mathematical neural networks (NNs) is proposed, together with a method to determine the boundary of signals (TFBMs) and noise in the NN-score (output) space. This analysis also leads to the mathematical limitation of discrimination in the recognition of features representing TFBMs, in an information geometrical manifold. Thus, the present strategy enables the identification of the whole space of TFBMs, right up to the noise boundary.

  17. Structure-based design synthesis of functionalized 3-(5-(s-phenyl)-4H-pyrazol-3-yl)-2H-chromen-2-one motifs and indigenous plant extracts and their antimalarial potential

    Science.gov (United States)

    Olayinka, Ajani; Grace, Olasehinde; Titilope, Dokunmu; Ruth, Diji-Geske; Olabode, Onileere; John, Openibo; Oreoluwa, Oluseye; Tochukwu, Chileke; Ezekiel, Adebiyi

    2018-04-01

    Resistance of the malaria parasite to conventional therapeutic agents calls for increased efforts in antimalarial drug discovery. Current efforts should be targeted at developing safe and affordable new agents to counter the spread of malaria parasites that are resistant to existing therapy. In this study, toxicological and in vivo antiplasmodial properties of 3-(5-(s-phenyl)-4H-pyrazol-3-yl)-2H-chromen-2-one, Mangifera indica and Tithonia diversifolia in Swiss albino mice (Mus musculus) were investigated. 2H-Chromen-2-one, also known as coumarin, is a highly privileged oxygen-containing heterocyclic entity that occurs in the plant kingdom as a secondary metabolite. The maceration technique of crude drug extraction was employed using cold water extraction. Toxicological analysis was carried out using Lorke's method for acute toxicity testing while the chemosuppressive activity was carried out using Peter's four day test on early infection. We also report the synthesis of functionalized 3-(5-(s-phenyl)-4H-pyrazol-3-yl)-2H-chromen-2-one motifs via microwave assisted synthetic approach and isolation of indigenous plant extract in order to investigate their antimalarial efficacy. The condensation reaction of 3-acetylcoumarin with various benzaldehyde derivatives resulted in the formation of 3-[3-acryloyl]-2H-chromen-2-one which was subsequently reacted with hydrazine hydrate via microwave assisted hydrazinolysis to afford the targeted 3-(5-(s-phenyl)-4H-pyrazol-3-yl)-2H-chromen-2-one motifs. The chemical structures were confirmed by analytical data and spectroscopic means such as FT-IR, UV, 1H NMR, 13C NMR and DEPT-135. The microwave assisted reaction was remarkably successful and gave targeted 3-(5-(s-phenyl)-4H-pyrazol-3-yl)-2H-chromen-2-one motifs in higher yields and shorter reaction times than the conventional heating method. The LD50 of the aqueous extracts of the leaves and stem bark of Mangifera indica was established to be ± 707.11 mg/kg b.w., p.o. (body weight

  18. Mapping genomic features to functional traits through microbial whole genome sequences.

    Science.gov (United States)

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologous Groups (COGs) and used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to reflect the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
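
    The TF-IDF step described in this record can be illustrated with a minimal sketch. It treats each genome as a "document" and each COG as a "term". The input layout, the function name `tfidf`, the toy genomes and the plain `tf * log(N / df)` weighting are all assumptions made for illustration; the abstract does not state which TF-IDF variant the authors used.

```python
import math


def tfidf(counts):
    """Compute TF-IDF weights for COG counts.

    counts: dict mapping genome name -> dict mapping COG id -> gene count.
    Returns a dict of the same shape holding TF-IDF weights.
    (Hypothetical helper; the weighting variant is an assumption.)
    """
    n_genomes = len(counts)

    # Document frequency: number of genomes in which each COG occurs.
    df = {}
    for cog_counts in counts.values():
        for cog, c in cog_counts.items():
            if c > 0:
                df[cog] = df.get(cog, 0) + 1

    weights = {}
    for genome, cog_counts in counts.items():
        total = sum(cog_counts.values())
        # Term frequency (share of the genome's COG-assigned genes)
        # multiplied by inverse document frequency.
        weights[genome] = {
            cog: (c / total) * math.log(n_genomes / df[cog])
            for cog, c in cog_counts.items()
            if c > 0
        }
    return weights


# Toy data (made up): COG0001 occurs in every genome, so its weight is zero.
toy = {
    "genome_A": {"COG0001": 3, "COG0002": 1},
    "genome_B": {"COG0001": 2, "COG0003": 2},
    "genome_C": {"COG0001": 1, "COG0002": 4},
}
w = tfidf(toy)
```

    A COG present in every genome receives zero weight, which is the intended effect of the transformation: ubiquitous COGs carry little information about any particular trait.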

  19. Conserved amino acid motifs from the novel Piv/MooV family of transposases and site-specific recombinases are required for catalysis of DNA inversion by Piv.

    Science.gov (United States)

    Tobiason, D M; Buchner, J M; Thiel, W H; Gernert, K M; Karls, A C

    2001-02-01

    Piv, a site-specific invertase from Moraxella lacunata, exhibits amino acid homology with the transposases of the IS110/IS492 family of insertion elements. The functions of conserved amino acid motifs that define this novel family of both transposases and site-specific recombinases (Piv/MooV family) were examined by mutagenesis of fully conserved amino acids within each motif in Piv. All Piv mutants altered in conserved residues were defective for in vivo inversion of the M. lacunata invertible DNA segment, but competent for in vivo binding to Piv DNA recognition sequences. Although the primary amino acid sequences of the Piv/MooV recombinases do not contain a conserved DDE motif, which defines the retroviral integrase/transposase (IN/Tnps) family, the predicted secondary structural elements of Piv align well with those of the IN/Tnps for which crystal structures have been determined. Molecular modelling of Piv based on these alignments predicts that E59, conserved as either E or D in the Piv/MooV family, forms a catalytic pocket with the conserved D9 and D101 residues. Analysis of Piv E59G confirms a role for E59 in catalysis of inversion. These results suggest that Piv and the related IS110/IS492 transposases mediate DNA recombination by a common mechanism involving a catalytic DED or DDD motif.

  20. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set size increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs which correspond to known transcription factor binding sites.
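
    To make the notion of a degenerate motif in the 15-letter IUPAC code concrete, the following sketch tests whether such a motif occurs in a DNA sequence. It is a plain CPU illustration and not the Argo_CUDA algorithm; the function names and the example motif and sequence are hypothetical.

```python
# The 15-letter IUPAC nucleotide code: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, window):
    """True if every base in window is allowed by the motif symbol at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def find_motif(motif, seq):
    """Return the 0-based start positions at which the degenerate motif occurs."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if matches(motif, seq[i:i + k])]


def fraction_with_motif(motif, seqs):
    """Fraction of sequences containing at least one occurrence of the motif."""
    return sum(1 for s in seqs if find_motif(motif, s)) / len(seqs)


# Made-up example: N in the motif accepts any base.
positions = find_motif("TGANTCA", "AATGACTCATTTGAGTCA")
```

    An exhaustive search in this spirit would score every candidate motif of the chosen length against the whole dataset, which is why the abstract stresses GPU acceleration.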

  1. Positive Selection or Free to Vary? Assessing the Functional Significance of Sequence Change Using Molecular Dynamics.

    Directory of Open Access Journals (Sweden)

    Jane R Allison

    Evolutionary arms races between pathogens and their hosts may be manifested as selection for rapid evolutionary change of key genes, and are sometimes detectable through sequence-level analyses. In the case of protein-coding genes, such analyses frequently predict that specific codons are under positive selection. However, detecting positive selection can be non-trivial, and false positive predictions are a common concern in such analyses. It is therefore helpful to place such predictions within a structural and functional context. Here, we focus on the p19 protein from tombusviruses. P19 is a homodimer that sequesters siRNAs, thereby preventing the host RNAi machinery from shutting down viral infection. Sequence analysis of the p19 gene is complicated by the fact that it is constrained at the sequence level by overprinting of a viral movement protein gene. Using homology modeling, in silico mutation and molecular dynamics simulations, we assess how non-synonymous changes to two residues involved in forming the dimer interface (one invariant, and one predicted to be under positive selection) impact molecular function. Interestingly, we find that both observed variation and potential variation (where a non-synonymous change to p19 would be synonymous for the overprinted movement protein) do not significantly impact protein structure or RNA binding. Consequently, while several methods identify residues at the dimer interface as being under positive selection, MD results suggest they are functionally indistinguishable from a site that is free to vary. Our analyses serve as a caveat to using sequence-level analyses in isolation to detect and assess positive selection, and emphasize the importance of also accounting for how non-synonymous changes impact structure and function.

  2. In Silico Characterization of Pectate Lyase Protein Sequences from Different Source Organisms

    Directory of Open Access Journals (Sweden)

    Amit Kumar Dubey

    2010-01-01

    A total of 121 protein sequences of pectate lyases were subjected to homology search, multiple sequence alignment, phylogenetic tree construction, and motif analysis. The phylogenetic tree constructed revealed different clusters based on different source organisms representing bacterial, fungal, plant, and nematode pectate lyases. The multiple accessions of bacterial, fungal, nematode, and plant pectate lyase protein sequences were placed closely revealing a sequence level similarity. The multiple sequence alignment of these pectate lyase protein sequences from different source organisms showed conserved regions at different stretches with maximum homology from amino acid residues 439–467, 715–816, and 829–910 which could be used for designing degenerate primers or probes specific for pectate lyases. The motif analysis revealed a conserved Pec_Lyase_C domain uniformly observed in all pectate lyases irrespective of variable sources suggesting its possible role in structural and enzymatic functions.

  3. Intercellular signalling in Vibrio harveyi: sequence and function of genes regulating expression of luminescence.

    Science.gov (United States)

    Bassler, B L; Wright, M; Showalter, R E; Silverman, M R

    1993-08-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of an extracellular signal molecule (autoinducer) in the culture medium. A recombinant clone that restored function to one class of spontaneous dim mutants was found to encode functions necessary for the synthesis of, and response to, a signal molecule. Sequence analysis of the region encoding these functions revealed three open reading frames, two (luxL and luxM) that are required for production of an autoinducer substance and a third (luxN) that is required for response to this signal substance. The LuxL and LuxM proteins are not similar in amino acid sequence to other proteins in the database, but the LuxN protein contains regions of sequence resembling both the histidine protein kinase and the response regulator domains of the family of two-component, signal transduction proteins. The phenotypes of mutants with luxL, luxM and luxN defects indicated that an additional signal-response system controlling density-dependent expression of luminescence remains to be identified.

  4. A Parvovirus B19 synthetic genome: sequence features and functional competence.

    Science.gov (United States)

    Manaresi, Elisabetta; Conti, Ilaria; Bua, Gloria; Bonvicini, Francesca; Gallinella, Giorgio

    2017-08-01

    Central to genetic studies for Parvovirus B19 (B19V) is the availability of genomic clones that may possess functional competence and ability to generate infectious virus. In our study, we established a new model genetic system for Parvovirus B19. A synthetic approach was followed, by design of a reference genome sequence, by generation of a corresponding artificial construct and its molecular cloning in a complete and functional form, and by setup of an efficient strategy to generate infectious virus, via transfection in UT7/EpoS1 cells and amplification in erythroid progenitor cells. The synthetic genome was able to generate virus with biological properties paralleling those of native virus, its infectious activity being dependent on the preservation of self-complementarity and sequence heterogeneity within the terminal regions. A virus of defined genome sequence, obtained from controlled cell culture conditions, can constitute a reference tool for investigation of the structural and functional characteristics of the virus. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Analysis of breast cancer metastasis candidate genes from next generation-sequencing via systematic functional genomics

    DEFF Research Database (Denmark)

    Blomstrøm, Monica Marie

    2016-01-01

    several growth modulators and invasion modulators were identified and independently validated. These candidates revealed a group of genes with metastasis-related functions in vitro that are involved in RNA-related processes, such as RNA-processing. Moreover, a general feature was that proliferation......) and non-CSCs. The main goal of this project was to functionally characterize a set of candidate genes recovered from next-generation sequencing analysis for their role in breast cancer metastasis formation. The starting gene set comprised 104 gene variants; i.e. 57 wildtype and 47 mutated variants. During...

  6. Identification of functional SNPs in the 5-prime flanking sequences of human genes

    Directory of Open Access Journals (Sweden)

    Lenhard Boris

    2005-02-01

    Background: Over 4 million single nucleotide polymorphisms (SNPs) are currently reported to exist within the human genome. Only a small fraction of these SNPs alter gene function or expression, and therefore might be associated with a cell phenotype. These functional SNPs are consequently important in understanding human health. Information related to functional SNPs in candidate disease genes is critical for cost-effective genetic association studies, which attempt to understand the genetics of complex diseases like diabetes, Alzheimer's, etc. Robust methods for the identification of functional SNPs are therefore crucial. We report one such experimental approach. Results: Sequences conserved between mouse and human genomes, within 5 kilobases of the 5-prime end of 176 GPCR genes, were screened for SNPs. Sequences flanking these SNPs were scored for transcription factor binding sites. Allelic pairs resulting in a significant score difference were predicted to influence the binding of transcription factors (TFs). Ten such SNPs were selected for mobility shift assays (EMSA), and 7 of them exhibited a reproducible shift. The full-length promoter regions with 4 of the 7 SNPs were cloned in a luciferase-based plasmid reporter system. Two out of the 4 SNPs exhibited differential promoter activity in several human cell lines. Conclusions: We propose a method for effective selection of functional, regulatory SNPs that are located in evolutionarily conserved 5-prime flanking regions (5'-FRs) of human genes and influence the activity of the transcriptional regulatory region. Some SNPs behave differently in different cell types.
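
    The idea of scoring both alleles of a SNP for a transcription factor binding site and comparing the scores can be sketched as follows. The abstract does not say which scoring scheme was used, so the position weight matrix (PWM) log-odds score, the uniform background of 0.25, the function names and the toy matrix are all assumptions made for illustration.

```python
import math


def log_odds(pwm, site, background=0.25):
    """Log2-odds score of one site against a PWM (list of base -> probability dicts)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))


def best_score(pwm, seq, snp_pos):
    """Best score over all PWM-length windows that cover the SNP position.

    Assumes seq is at least as long as the PWM; otherwise max() raises ValueError.
    """
    k = len(pwm)
    starts = range(max(0, snp_pos - k + 1), min(snp_pos, len(seq) - k) + 1)
    return max(log_odds(pwm, seq[s:s + k]) for s in starts)


def allele_score_difference(pwm, flank5, ref, alt, flank3):
    """Reference-allele score minus alternative-allele score."""
    pos = len(flank5)
    ref_score = best_score(pwm, flank5 + ref + flank3, pos)
    alt_score = best_score(pwm, flank5 + alt + flank3, pos)
    return ref_score - alt_score


# Toy PWM (made up) with consensus ACG; no zero probabilities, so log2 is defined.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
diff = allele_score_difference(toy_pwm, "TTA", "C", "T", "GTT")
```

    A large positive or negative difference would flag the allelic pair as a candidate for altered transcription factor binding; what counts as "significant" would need a threshold that the abstract does not give.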

  7. Kopi dan Kakao dalam Kreasi Motif Batik Khas Jember (Coffee and Cocoa in the Creation of Distinctive Jember Batik Motifs)

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2015-06-01

    ABSTRACT Jember batik has long been synonymous with the tobacco leaf motif. The depiction of the tobacco leaf in Jember batik motifs is rather weak: it lacks character, because the motif that emerges looks like a picture of leaves in general. It is therefore necessary to create distinctive Jember batik motif designs inspired by other natural resources of Jember whose specific shapes and characteristics give the motifs a stronger identity. These typical natural products of Jember are coffee and cocoa. The purpose of this art creation is to produce new batik motifs that have the specific characteristics of Jember. The methods used are data collection, in-depth observation of the object of creation, review of sources of inspiration, creation of motif designs, and realization as batik. This art creation produced six batik motifs: (1) Motif Uwoh Kopi; (2) Motif Godong Kopi; (3) Motif Ceplok Kakao; (4) Motif Kakao Raja; (5) Motif Kakao Biru; and (6) Motif Wiji Mukti. An "Aesthetic Taste" assessment showed that the most widely liked motifs were Motif Uwoh Kopi and Motif Kakao Raja. Keywords: Motif Woh Kopi, Motif Godong Kopi, Motif Ceplok Kakao, Motif Kakao Raja, Motif Kakao Biru, Motif Wiji Mukti

  8. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
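
    The (l,d)-motif search problem stated in this record can be written down directly as a brute-force reference implementation. This is only a sketch of the problem definition, enumerating all candidate l-mers, and is not the PMS8 algorithm; its running time grows as 4^l for DNA, so it is usable only for very small l. Function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Return every l-mer that appears in each input sequence with at most d mismatches."""
    result = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            result.append(candidate)
    return result


# Toy input (made up): CGT is the only 3-mer shared exactly by all three sequences.
toy_seqs = ["AACGT", "CGTTT", "GGCGT"]
exact = planted_motif_search(toy_seqs, 3, 0)
```

    Note that the returned motifs need not occur exactly in any input sequence once d is greater than zero, which is what makes the problem hard and motivates specialised algorithms.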

  9. The combinatorial PP1-binding consensus Motif (R/K)x(0,1)V/IxFxx(R/K)x(R/K) is a new apoptotic signature.

    Directory of Open Access Journals (Sweden)

    Angélique N Godet

    BACKGROUND: Previous studies established that PP1 is a target for Bcl-2 proteins and an important regulator of apoptosis. The two distinct functional PP1 consensus docking motifs, R/Kx(0,1)V/IxF and FxxR/KxR/K, involved in PP1 binding and cell death were previously characterized in the BH1 and BH3 domains of some Bcl-2 proteins. PRINCIPAL FINDINGS: In this study, we demonstrate that DPT-AIF(1), a peptide containing the AIF(562-571) sequence located in a C-terminal domain of AIF, is a new PP1-interacting and cell-penetrating molecule. We also showed that DPT-AIF(1) provoked apoptosis in several human cell lines. Furthermore, DPT-APAF(1), a bipartite cell-penetrating peptide containing APAF-1(122-131), a non-penetrating sequence from the APAF-1 protein, linked to our previously described DPT-sh1 peptide shuttle, is also a PP1-interacting death molecule. Both AIF(562-571) and APAF-1(122-131) sequences contain a common R/Kx(0,1)V/IxFxxR/KxR/K motif, shared by several proteins involved in control of cell survival pathways. This motif combines the two distinct PP1c consensus docking motifs initially identified in some Bcl-2 proteins. Interestingly, DPT-AIF(2) and DPT-APAF(2), which carry an F to A mutation within this combinatorial motif, no longer exhibited any PP1c binding or apoptotic effects. Moreover, the F to A mutation in DPT-AIF(2) also suppressed cell penetration. CONCLUSION: These results indicate that the combinatorial PP1c docking motif R/Kx(0,1)V/IxFxxR/KxR/K, deduced from AIF(562-571) and APAF-1(122-131) sequences, is a new PP1c-dependent Apoptotic Signature. This motif is also a new tool for drug design that could be used to characterize potential anti-tumour molecules.
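
    The combinatorial motif in this record maps naturally onto a regular expression, reading R/K as "R or K", x as any residue and x(0,1) as zero or one residue. The sketch below scans a protein sequence for it. The pattern is an interpretation of the motif as written in the abstract, and the test peptides are invented for illustration; they are not the AIF or APAF-1 sequences.

```python
import re

# [RK] x{0,1} [VI] x F x x [RK] x [RK], with x = any one-letter residue code.
# The lookahead lets overlapping occurrences be reported as well.
PP1_MOTIF = re.compile(r"(?=([RK][A-Z]?[VI][A-Z]F[A-Z]{2}[RK][A-Z][RK]))")


def find_pp1_motifs(protein):
    """Return (0-based start, matched text) for each occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in PP1_MOTIF.finditer(protein)]


# Invented peptides, not real protein sequences.
with_spacer = find_pp1_motifs("MMRAVSFGGKARLL")     # x(0,1) present
without_spacer = find_pp1_motifs("KVAFTTRSK")       # x(0,1) absent
f_to_a_mutant = find_pp1_motifs("MMRAVSAGGKARLL")   # central F replaced by A
```

    Replacing the central phenylalanine with alanine removes the match, mirroring the F to A mutants described in the abstract, although a regex hit says nothing about actual PP1 binding.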

  10. An artificial functional family filter in homolog searching in next-generation sequencing metagenomics.

    Directory of Open Access Journals (Sweden)

    Ruofei Du

    In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem, and we intended to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust against read coverage and consistently outperforms the other E-value cutoff methods currently used in the literature.
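
    The score cut-off filtering described in this record can be sketched as a small function that keeps each read's best hit only if its score passes a threshold and then counts reads per COG. The tuple layout, the function name, the toy hits and the cut-off of 40 are assumptions for illustration; the abstract does not give the cut-off the authors chose, and the RPS-BLAST step is not modelled here.

```python
from collections import Counter


def functional_profile(hits, min_score):
    """Build a COG abundance profile from similarity search hits.

    hits: iterable of (read_id, cog_id, score) tuples.
    A read contributes to the COG of its best-scoring hit, and only if that
    score is at least min_score. (Hypothetical helper and input layout.)
    """
    best = {}
    for read_id, cog, score in hits:
        if score >= min_score and (read_id not in best or score > best[read_id][1]):
            best[read_id] = (cog, score)
    return Counter(cog for cog, _ in best.values())


# Made-up hits: COG0002 is supported only by low-scoring hits.
toy_hits = [
    ("r1", "COG0001", 52.0),
    ("r1", "COG0002", 31.0),
    ("r2", "COG0002", 28.5),
    ("r3", "COG0001", 47.3),
    ("r4", "COG0003", 45.0),
]
filtered = functional_profile(toy_hits, min_score=40.0)
unfiltered = functional_profile(toy_hits, min_score=0.0)
```

    In the toy data the family supported only by weak hits disappears after filtering, while the well-supported families keep their counts.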

  11. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    Science.gov (United States)

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  12. Functional brain activation differences in stuttering identified with a rapid fMRI sequence

    Science.gov (United States)

    Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three-minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated that the AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming and higher activity in the right IFG during phoneme monitoring. The right lateralized pattern of BOLD activity together with higher activity in sensorimotor cortices is consistent with previous reports, which indicates rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409

  13. Regulation of PCNA Function by Tyrosine Phosphorylation in Prostate Cancer

    Science.gov (United States)

    2012-10-01

    ylated wild-type sequence did not bind to any of the functional domains. In contrast, incubation with the phosphorylated peptide identified the SH2 domain...Recently, He et al. reported that c-Abl interacted with PCNA through a putative PCNA-binding motif in the SH2 domain of c- Abl [22]. This proposed motif...motif of c-Abl may play a role in anti-apoptosis, interaction between Abl/ SH2 with PCNA/phospho-Y211 can confer a signaling for growth advantage in

  14. A functional test of Neandertal and modern human mitochondrial targeting sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gralle, Matthias, E-mail: gralle@bioqmed.ufrj.br [Instituto de Bioquimica Medica, Universidade Federal do Rio de Janeiro, CCS, Ilha do Fundao, 21941-590 Rio de Janeiro (Brazil); Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig (Germany); Schaefer, Ingo; Seibel, Peter [Department of Molecular Cell Therapy, Leipzig University, Deutscher Platz 5, 04103 Leipzig (Germany); Paeaebo, Svante [Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig (Germany)

    2010-11-26

    Research highlights: → Two mutations in mitochondrial targeting peptides occurred during human evolution, possibly after Neandertals split off from the modern human lineage. → The ancestral and modern human versions of these two targeting peptides were tested functionally for their effects on localization and cleavage rate. → In spite of recent evolution, and contrary to other mutations in targeting peptides, these mutations had no visible effects. -- Abstract: Targeting of nuclear-encoded proteins to different organelles, such as mitochondria, is a process that can result in the redeployment of proteins to new intracellular destinations during evolution. With the sequencing of the Neandertal genome, it has become possible to identify amino acid substitutions that occurred on the modern human lineage since its separation from the Neandertal lineage. Here we analyze the function of two substitutions in mitochondrial targeting sequences that occurred and rose to high frequency recently during human evolution. The ancestral and modern versions of the two targeting sequences do not differ in the efficiency with which they direct a protein to the mitochondria, an observation compatible with the neutral theory of molecular evolution.

  15. A functional test of Neandertal and modern human mitochondrial targeting sequences

    International Nuclear Information System (INIS)

    Gralle, Matthias; Schaefer, Ingo; Seibel, Peter; Paeaebo, Svante

    2010-01-01

    Research highlights: → Two mutations in mitochondrial targeting peptides occurred during human evolution, possibly after Neandertals split off from the modern human lineage. → The ancestral and modern human versions of these two targeting peptides were tested functionally for their effects on localization and cleavage rate. → In spite of recent evolution, and contrary to other mutations in targeting peptides, these mutations had no visible effects. -- Abstract: Targeting of nuclear-encoded proteins to different organelles, such as mitochondria, is a process that can result in the redeployment of proteins to new intracellular destinations during evolution. With the sequencing of the Neandertal genome, it has become possible to identify amino acid substitutions that occurred on the modern human lineage since its separation from the Neandertal lineage. Here we analyze the function of two substitutions in mitochondrial targeting sequences that occurred and rose to high frequency recently during human evolution. The ancestral and modern versions of the two targeting sequences do not differ in the efficiency with which they direct a protein to the mitochondria, an observation compatible with the neutral theory of molecular evolution.

  16. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    Science.gov (United States)

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, insufficient transcriptomic or genomic information is available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. A sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  17. SSFSE sequence functional MRI of the human cervical spinal cord with complex finger tapping

    International Nuclear Information System (INIS)

    Xie Chuhai; Kong Kangmei; Guan Jitian; Chen Yexi; He Jiankang; Qi Weili; Wang Xinjia; Shen Zhiwei; Wu Renhua

    2009-01-01

    Purpose: Functional MR imaging of the human cervical spinal cord was carried out on volunteers during alternated rest and a complex finger tapping task, in order to detect image intensity changes arising from neuronal activity. Methods: Functional MR imaging data using single-shot fast spin-echo sequence (SSFSE) with echo time 42.4 ms on a 1.5 T GE Clinical System were acquired in eight subjects performing a complex finger tapping task. Cervical spinal cord activation was measured both in the sagittal and transverse imaging planes. Postprocessing was performed by AFNI (Analysis of Functional Neuroimages) software system. Results: Intensity changes (5.5-7.6%) were correlated with the time course of stimulation and were consistently detected in both sagittal and transverse imaging planes of the cervical spinal cord. The activated regions localized to the ipsilateral side of the spinal cord in agreement with the neural anatomy. Conclusion: Functional MR imaging signals can be reliably detected with finger tapping activity in the human cervical spinal cord using a SSFSE sequence with 42.4 ms echo time. The anatomic location of neural activity correlates with the muscles used in the finger tapping task.

  18. How pathogens use linear motifs to perturb host cell networks

    KAUST Repository

    Via, Allegra; Uyar, Bora; Brun, Christine; Zanzoni, Andreas

    2015-01-01

    Molecular mimicry is one of the powerful stratagems that pathogens employ to colonise their hosts and take advantage of host cell functions to guarantee their replication and dissemination. In particular, several viruses have evolved the ability to interact with host cell components through protein short linear motifs (SLiMs) that mimic host SLiMs, thus facilitating their internalisation and the manipulation of a wide range of cellular networks. Here we present convincing evidence from the literature that motif mimicry also represents an effective, widespread hijacking strategy in prokaryotic and eukaryotic parasites. Further insights into host motif mimicry would be of great help in the elucidation of the molecular mechanisms behind host cell invasion and the development of anti-infective therapeutic strategies.

  19. Genetic analysis of beta1 integrin "activation motifs" in mice

    DEFF Research Database (Denmark)

    Czuchra, Aleksandra; Meyer, Hannelore; Legate, Kyle R

    2006-01-01

    -null phenotype in vivo. Surprisingly, neither the substitution of the tyrosines with phenylalanine nor the aspartic acid with alanine resulted in an obvious defect. These data suggest that the NPXY motifs of the beta1 integrin tail are essential for beta1 integrin function, whereas tyrosine phosphorylation...

  20. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    Science.gov (United States)

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.
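
    The region counts above depend on turning per-residue predictions into contiguous ID regions. A minimal Python sketch of that step follows, assuming one boolean disorder flag per residue. The function name, the toy prediction string and the half-open span convention are illustrative assumptions and are not taken from MobiDB.

```python
def disorder_regions(flags, min_len=5):
    """Return half-open (start, end) spans of consecutive True flags
    that are at least min_len residues long."""
    regions = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a disordered run begins
        elif not flag and start is not None:
            if i - start >= min_len:       # keep only sufficiently long runs
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(flags) - start >= min_len:
        regions.append((start, len(flags)))
    return regions


# Made-up prediction: 'D' = disordered, 'S' = structured.
pred = "SS" + "D" * 6 + "SS" + "D" * 2 + "SSS" + "D" * 22 + "S"
flags = [c == "D" for c in pred]

regions = disorder_regions(flags)                        # regions of >= 5 residues
long_regions = [r for r in regions if r[1] - r[0] > 20]  # "long" ID, over 20 residues
print(regions, long_regions)
```

    The two thresholds (at least five residues, over 20 residues) mirror the ones quoted in the abstract; everything else is a placeholder.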

  1. Generation, analysis and functional annotation of expressed sequence tags from the ectoparasitic mite Psoroptes ovis

    Directory of Open Access Journals (Sweden)

    Kenyon Fiona

    2011-07-01

    Full Text Available Abstract Background Sheep scab is caused by Psoroptes ovis and is arguably the most important ectoparasitic disease affecting sheep in the UK. The disease is highly contagious and causes considerable pruritus and irritation and is therefore a major welfare concern. Current methods of treatment are unsustainable and in order to elucidate novel methods of disease control a more comprehensive understanding of the parasite is required. To date, no full genomic DNA sequence or large scale transcript datasets are available and prior to this study only 484 P. ovis expressed sequence tags (ESTs) were accessible in public databases. Results In order to further expand upon the transcriptomic coverage of P. ovis thus facilitating novel insights into the mite biology we undertook a larger scale EST approach, incorporating newly generated and previously described P. ovis transcript data and representing the largest collection of P. ovis ESTs to date. We sequenced 1,574 ESTs and assembled these along with 484 previously generated P. ovis ESTs, which resulted in the identification of 1,545 unique P. ovis sequences. BLASTX searches identified 961 ESTs with significant hits (E-value P. ovis ESTs. Gene Ontology (GO) analysis allowed the functional annotation of 880 ESTs and included predictions of signal peptide and transmembrane domains; allowing the identification of potential P. ovis excreted/secreted factors, and mapping of metabolic pathways. Conclusions This dataset currently represents the largest collection of P. ovis ESTs, all of which are publicly available in the GenBank EST database (dbEST) (accession numbers FR748230 - FR749648). Functional analysis of this dataset identified important homologues, including house dust mite allergens and tick salivary factors. These findings offer new insights into the underlying biology of P. ovis, facilitating further investigations into mite biology and the identification of novel methods of intervention.

  2. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed to link this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…
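
    Designing degenerate primers against a conserved protein motif amounts to writing each codon with IUPAC ambiguity codes and checking how many concrete oligonucleotides the primer stands for. The Python sketch below illustrates that bookkeeping. The primer `GGNAARACN` (a back-translation of the tripeptide Gly-Lys-Thr) is a hypothetical example and is not taken from the exercise described above.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


primer = "GGNAARACN"          # hypothetical: GGN = Gly, AAR = Lys, ACN = Thr
variants = expand_primer(primer)
print(degeneracy(primer), variants[:3])
```

    Keeping the degeneracy low is usually the practical concern, since every extra ambiguous position multiplies the number of oligonucleotides in the primer pool.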

  3. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Full Text Available Abstract Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  4. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    Science.gov (United States)

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
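
    The first step of such a survey, locating candidate matches in a sequence, can be sketched in a few lines of Python. This sketch assumes the bare tripeptide K-E-N as the pattern, which is a simplification; the toy sequence and the function name are made up. As the abstract stresses, a match on its own is not significant, so the output is only a list of positions to filter further by disorder and conservation.

```python
import re

# Simplified pattern: the literal tripeptide Lys-Glu-Asn (an assumption;
# curated motif definitions may add flanking constraints).
KEN_PATTERN = re.compile(r"KEN")


def find_ken_candidates(sequence):
    """Return 1-based start positions of KEN tripeptides in a protein sequence."""
    return [m.start() + 1 for m in KEN_PATTERN.finditer(sequence.upper())]


toy_protein = "MSKENLPAAKENQ"   # made-up sequence, not a real protein
positions = find_ken_candidates(toy_protein)
print(positions)
```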

  5. The HIVToolbox 2 web system integrates sequence, structure, function and mutation analysis.

    Directory of Open Access Journals (Sweden)

    David P Sargeant

    Full Text Available There is enormous interest in studying HIV pathogenesis for improving the treatment of patients with HIV infection. HIV infection has become one of the best-studied systems for understanding how a virus can hijack a cell. To help facilitate discovery, we previously built HIVToolbox, a web system for visual data mining. The original HIVToolbox integrated information for HIV protein sequence, structure, functional sites, and sequence conservation. This web system has been used for almost 40,000 searches. We report improvements to HIVToolbox including new functions and workflows, data updates, and updates for ease of use. HIVToolbox2 is an improvement over HIVToolbox with new functions. HIVToolbox2 has new functionalities focused on HIV pathogenesis including drug-binding sites, drug-resistance mutations, and immune epitopes. The integrated, interactive view enables visual mining to generate hypotheses that are not readily revealed by other approaches. Most HIV proteins form multimers, and there are posttranslational modification and protein-protein interaction sites at many of these multimerization interfaces. Analysis of protease drug binding sites reveals an anatomy of drug resistance with different types of drug-resistance mutations regionally localized on the surface of protease. Some of these drug-resistance mutations have a high prevalence in specific HIV-1 M subtypes. Finally, consolidation of Tat functional sites reveals a hotspot region where there appear to be 30 interactions or posttranslational modifications. A cursory analysis with HIVToolbox2 has helped to identify several global patterns for HIV proteins. An initial analysis with this tool identifies homomultimerization of almost all HIV proteins, functional sites that overlap with multimerization sites, a global drug resistance anatomy for HIV protease, and specific distributions of some DRMs in specific HIV M subtypes. HIVToolbox2 is an open-access web application available at

  6. Identification and characterization of a selenoprotein family containing a diselenide bond in a redox motif

    Science.gov (United States)

    Shchedrina, Valentina A.; Novoselov, Sergey V.; Malinouski, Mikalai Yu.; Gladyshev, Vadim N.

    2007-01-01

    Selenocysteine (Sec, U) insertion into proteins is directed by translational recoding of specific UGA codons located upstream of a stem-loop structure known as Sec insertion sequence (SECIS) element. Selenoproteins with known functions are oxidoreductases containing a single redox-active Sec in their active sites. In this work, we identified a family of selenoproteins, designated SelL, containing two Sec separated by two other residues to form a UxxU motif. SelL proteins show an unusual occurrence, being present in diverse aquatic organisms, including fish, invertebrates, and marine bacteria. Both eukaryotic and bacterial SelL genes use single SECIS elements for insertion of two Sec. In eukaryotes, the SECIS is located in the 3′ UTR, whereas the bacterial SelL SECIS is within a coding region and positioned at a distance that supports the insertion of either of the two Sec or both of these residues. SelL proteins possess a thioredoxin-like fold wherein the UxxU motif corresponds to the catalytic CxxC motif in thioredoxins, suggesting a redox function of SelL proteins. Distantly related SelL-like proteins were also identified in a variety of organisms that had either one or both Sec replaced with Cys. Danio rerio SelL, transiently expressed in mammalian cells, incorporated two Sec and localized to the cytosol. In these cells, it occurred in an oxidized form and was not reducible by DTT. In a bacterial expression system, we directly demonstrated the formation of a diselenide bond between the two Sec, establishing it as the first diselenide bond found in a natural protein. PMID:17715293

  7. Cytochromes P450 for natural product biosynthesis in Streptomyces: sequence, structure, and function.

    Science.gov (United States)

    Rudolf, Jeffrey D; Chang, Chin-Yuan; Ma, Ming; Shen, Ben

    2017-08-30

    Covering: up to January 2017. Cytochrome P450 enzymes (P450s) are some of the most exquisite and versatile biocatalysts found in nature. In addition to their well-known roles in steroid biosynthesis and drug metabolism in humans, P450s are key players in natural product biosynthetic pathways. Natural products, the most chemically and structurally diverse small molecules known, require an extensive collection of P450s to accept and functionalize their unique scaffolds. In this review, we survey the current catalytic landscape of P450s within the Streptomyces genus, one of the most prolific producers of natural products, and comprehensively summarize the functionally characterized P450s from Streptomyces. A sequence similarity network of >8500 P450s revealed insights into the sequence-function relationships of these oxygen-dependent metalloenzymes. Although only ∼2.4% and structurally characterized, respectively, the study of streptomycete P450s involved in the biosynthesis of natural products has revealed their diverse roles in nature, expanded their catalytic repertoire, created structural and mechanistic paradigms, and exposed their potential for biomedical and biotechnological applications. Continued study of these remarkable enzymes will undoubtedly expose their true complement of chemical and biological capabilities.

  8. Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics: Transcript Profiling and Reverse Genetics Applications

    Science.gov (United States)

    Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian

    2004-01-01

    Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341

  9. Low-dimensional morphospace of topological motifs in human fMRI brain networks

    Directory of Open Access Journals (Sweden)

    Sarah E. Morgan

    2018-06-01

    Full Text Available We present a low-dimensional morphospace of fMRI brain networks, where axes are defined in a data-driven manner based on the network motifs. The morphospace allows us to identify the key variations in healthy fMRI networks in terms of their underlying motifs, and we observe that two principal components (PCs) can account for 97% of the motif variability. The first PC of the motif distribution is correlated with efficiency and inversely correlated with transitivity. Hence this axis approximately conforms to the well-known economical small-world trade-off between integration and segregation in brain networks. Finally, we show that the economical clustering generative model proposed by Vértes et al. (2012) can approximately reproduce the motif morphospace of the real fMRI brain networks, in contrast to other generative models. Overall, the motif morphospace provides a powerful way to visualize the relationships between network properties and to investigate generative or constraining factors in the formation of complex human brain functional networks. Motifs have been described as the building blocks of complex networks. Meanwhile, a morphospace allows networks to be placed in a common space and can reveal the relationships between different network properties and elucidate the driving forces behind network topology. We combine the concepts of motifs and morphospaces to create the first motif morphospace of fMRI brain networks. Crucially, the morphospace axes are defined by the motifs, in a data-driven manner. We observe strong correlations between the networks’ positions in morphospace and their global topological properties, suggesting that motif morphospaces are a powerful way to capture the topology of networks in a low-dimensional space and to compare generative models of brain networks. Motif morphospaces could also be used to study other complex networks’ topologies.

  10. Motifs in triadic random graphs based on Steiner triple systems

    Science.gov (United States)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
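
    The defining property used here, that no two triads of a Steiner triple system share a node pair while every pair is covered, is easy to check directly. The Python sketch below verifies it for the smallest non-trivial case, the seven-node system known as the Fano plane. The function name and node labelling are illustrative assumptions, and the sketch only checks the combinatorial property; it does not implement the authors' random graph models.

```python
from itertools import combinations


def is_steiner_triple_system(n, triples):
    """True if every pair of the n nodes lies in exactly one of the triples."""
    seen = set()
    for triple in triples:
        if len(set(triple)) != 3:
            return False                   # a triad needs three distinct nodes
        for pair in combinations(sorted(triple), 2):
            if pair in seen:
                return False               # two triads share a node pair
            seen.add(pair)
    return len(seen) == n * (n - 1) // 2   # every pair is covered


# The Fano plane: 7 nodes, 7 triads, each of the 21 node pairs used once.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6),
        (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]

print(is_steiner_triple_system(7, fano))
```

    Because the triads are pair-disjoint, the state of each one can be chosen without constraining any other, which is the independence the abstract relies on.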

  11. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network

    Directory of Open Access Journals (Sweden)

    Barabási Albert-László

    2004-01-01

    Full Text Available Abstract Background Transcriptional regulation of cellular functions is carried out through a complex network of interactions among transcription factors and the promoter regions of genes and operons regulated by them. To better understand the system-level function of such networks, simplification of their architecture was previously achieved by identifying the motifs present in the network, which are small, overrepresented, topologically distinct regulatory interaction patterns (subgraphs). However, the interaction of such motifs with each other, and their form of integration into the full network has not been previously examined. Results By studying the transcriptional regulatory network of the bacterium, Escherichia coli, we demonstrate that the two previously identified motif types in the network (i.e., feed-forward loops and bi-fan motifs) do not exist in isolation, but rather aggregate into homologous motif clusters that largely overlap with known biological functions. Moreover, these clusters further coalesce into a supercluster, thus establishing distinct topological hierarchies that show global statistical properties similar to the whole network. Targeted removal of motif links disintegrates the network into small, isolated clusters, while random disruptions of equal number of links do not cause such an effect. Conclusion Individual motifs aggregate into homologous motif clusters and a supercluster forming the backbone of the E. coli transcriptional regulatory network and play a central role in defining its global topological organization.
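
    A feed-forward loop is a three-node pattern in which a regulator X controls Y, and X and Y both control a target Z. The Python sketch below enumerates such patterns in a small directed graph by brute force. The edge list is a made-up toy network, and the sketch counts any triple containing the three required edges, which is an assumption; stricter definitions that forbid additional edges among the three nodes would need an extra check.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return (x, y, z) triples with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    ]


# Toy regulatory network (hypothetical node names).
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
loops = feed_forward_loops(toy_edges)
print(loops)
```

    Brute-force enumeration over all ordered triples is only practical for small graphs; it is used here because it keeps the definition visible.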

  12. Fitness for synchronization of network motifs

    DEFF Research Database (Denmark)

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.

    2004-01-01

    We study the synchronization of Kuramoto's oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi...... that the fitness for synchronization correlates well with motifs interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed....
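
    For readers unfamiliar with the model, the Kuramoto dynamics on a motif can be written as dθ_i/dt = ω_i + K Σ_j A_ij sin(θ_j − θ_i), with synchronization measured by the order parameter r = |mean of exp(iθ_j)|. The Python sketch below integrates this with a plain Euler step on a three-node triangle. The coupling value, step size, initial phases and the use of identical natural frequencies are all illustrative assumptions, not settings from the study.

```python
import math


def simulate_kuramoto(adj, omega, theta0, coupling, dt=0.01, steps=5000):
    """Euler integration of Kuramoto oscillators coupled through adjacency matrix adj."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        rates = [
            omega[i] + coupling * sum(
                adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n)
            )
            for i in range(n)
        ]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    return theta


def order_parameter(theta):
    """Magnitude of the mean phase vector: 1 = full synchrony, near 0 = incoherence."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


# Fully connected three-node motif with identical natural frequencies.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
initial = [0.0, 1.0, 2.0]
final = simulate_kuramoto(triangle, [0.0, 0.0, 0.0], initial, coupling=1.0)
print(order_parameter(initial), order_parameter(final))
```

    Repeating such runs over random initial phases and frequencies would give an empirical synchronization probability per motif, which is the kind of quantity the abstract refers to.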

  13. I-motif DNA structures are formed in the nuclei of human cells

    Science.gov (United States)

    Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

    2018-06-01

    Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.

  14. Hidden Markov event sequence models: toward unsupervised functional MRI brain mapping.

    Science.gov (United States)

    Faisan, Sylvain; Thoraval, Laurent; Armspach, Jean-Paul; Foucher, Jack R; Metz-Lutz, Marie-Noëlle; Heitz, Fabrice

    2005-01-01

    Most methods used in functional MRI (fMRI) brain mapping require restrictive assumptions about the shape and timing of the fMRI signal in activated voxels. Consequently, fMRI data may be partially and misleadingly characterized, leading to suboptimal or invalid inference. To limit these assumptions and to capture the broad range of possible activation patterns, a novel statistical fMRI brain mapping method is proposed. It relies on hidden semi-Markov event sequence models (HSMESMs), a special class of hidden Markov models (HMMs) dedicated to the modeling and analysis of event-based random processes. Activation detection is formulated in terms of time coupling between (1) the observed sequence of hemodynamic response onset (HRO) events detected in the voxel's fMRI signal and (2) the "hidden" sequence of task-induced neural activation onset (NAO) events underlying the HROs. Both event sequences are modeled within a single HSMESM. The resulting brain activation model is trained to automatically detect neural activity embedded in the input fMRI data set under analysis. The data sets considered in this article are threefold: synthetic epoch-related, real epoch-related (auditory lexical processing task), and real event-related (oddball detection task) fMRI data sets. Synthetic data: Activation detection results demonstrate the superiority of the HSMESM mapping method with respect to a standard implementation of the statistical parametric mapping (SPM) approach. They are also very close, sometimes equivalent, to those obtained with an "ideal" implementation of SPM in which the activation patterns synthesized are reused for analysis. The HSMESM method appears clearly insensitive to timing variations of the hemodynamic response and exhibits low sensitivity to fluctuations of its shape (unsustained activation during task). 
    Real epoch-related data: HSMESM activation detection results compete with those obtained with SPM, without requiring any prior definition of the expected

  15. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-set length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees a good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
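
    The core operation described above, drawing a sequence of a given length with a given nucleotide composition, can be sketched in Python as follows. This is not RANDNA's code: it uses Python's standard `random` module rather than the Borland Delphi generator named in the abstract, and the function names, the composition values and the seed are illustrative assumptions.

```python
import random


def random_sequence(length, composition, seed=None):
    """Draw a random sequence whose letters follow the given composition.

    composition maps each letter to a relative weight (for example percentages).
    """
    rng = random.Random(seed)              # private generator, reproducible with a seed
    letters = list(composition)
    weights = [composition[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


def observed_composition(sequence):
    """Fraction of each letter actually present in a sequence."""
    return {b: sequence.count(b) / len(sequence) for b in sorted(set(sequence))}


# Hypothetical AT-rich composition, in percent.
target = {"A": 30, "C": 20, "G": 20, "T": 30}
seq = random_sequence(1000, target, seed=1)
print(seq[:40], observed_composition(seq))
```

    Each position is drawn independently, so the observed composition only approximates the target for finite lengths; a Monte Carlo test would generate many such sequences and compare the statistic of interest against their distribution.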

  16. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Christian J. Michel

    2017-12-01

    , represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.

  17. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

    2017-12-03

    evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.

  18. Spectral Velocity Estimation using the Autocorrelation Function and Sparse data Sequences

    DEFF Research Database (Denmark)

    Jensen, Jørgen Arendt

    2005-01-01

    Ultrasound scanners can be used for displaying the distribution of velocities in blood vessels by finding the power spectrum of the received signal. It is desired to show a B-mode image for orientation and data for this has to be acquired interleaved with the flow data. Techniques for maintaining...... both the B-mode frame rate, and at the same time have the highest possible $f_{prf}$ only limited by the depth of investigation, are, thus, of great interest. The power spectrum can be calculated from the Fourier transform of the autocorrelation function $R_r(k)$. The lag $k$ corresponds...... of the sequence. The audio signal has also been synthesized from the autocorrelation data by passing white, Gaussian noise through a filter designed from the power spectrum of the autocorrelation function. The results show that both the full velocity range can be maintained at the same time as a B-mode image...
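
    The relation the method builds on, that the power spectrum is the Fourier transform of the autocorrelation function, can be checked numerically. The Python sketch below does so for a fully sampled complex signal using a circular autocorrelation and a direct discrete Fourier transform. It illustrates only that identity, not the sparse-sequence estimator of the paper; the test signal, its length and all function names are assumptions.

```python
import cmath


def dft(values):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * f * k / n) for k, v in enumerate(values))
        for f in range(n)
    ]


def circular_autocorrelation(x):
    """R(k) = sum over m of x[(m + k) mod N] * conj(x[m])."""
    n = len(x)
    return [
        sum(x[(m + k) % n] * x[m].conjugate() for m in range(n))
        for k in range(n)
    ]


# Complex test signal: a single tone at frequency bin 3 of a 16-sample record.
n = 16
x = [cmath.exp(2j * cmath.pi * 3 * m / n) for m in range(n)]

spectrum_from_acf = [v.real for v in dft(circular_autocorrelation(x))]
periodogram = [abs(v) ** 2 for v in dft(x)]
peak_bin = max(range(n), key=lambda f: spectrum_from_acf[f])
print(peak_bin)
```

    Both routes give the same spectrum up to rounding error, with the peak at the bin of the tone. In a velocity display that peak position corresponds to the dominant Doppler shift.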

  19. The myoglobin of Emperor penguin (Aptenodytes forsteri): amino acid sequence and functional adaptation to extreme conditions.

    Science.gov (United States)

    Tamburrini, M; Romano, M; Giardina, B; di Prisco, G

    1999-02-01

    In the framework of a study on molecular adaptations of the oxygen-transport and storage systems to extreme conditions in Antarctic marine organisms, we have investigated the structure/function relationship in Emperor penguin (Aptenodytes forsteri) myoglobin, in search of correlation with the bird life style. In contrast with previous reports, the revised amino acid sequence contains one additional residue and 15 differences. The oxygen-binding parameters seem well adapted to the diving behaviour of the penguin and to the environmental conditions of the Antarctic habitat. Addition of lactate has no major effect on myoglobin oxygenation over a large temperature range. Therefore, metabolic acidosis does not impair myoglobin function under conditions of prolonged physical effort, such as diving.

  20. Designing sequence to control protein function in an EF-hand protein.

    Science.gov (United States)

    Bunick, Christopher G; Nelson, Melanie R; Mangahas, Sheryll; Hunter, Michael J; Sheehan, Jonathan H; Mizoue, Laura S; Bunick, Gerard J; Chazin, Walter J

    2004-05-19

    The extent of conformational change that calcium binding induces in EF-hand proteins is a key biochemical property specifying Ca(2+) sensor versus signal modulator function. To understand how differences in amino acid sequence lead to differences in the response to Ca(2+) binding, comparative analyses of sequence and structures, combined with model building, were used to develop hypotheses about which amino acid residues control Ca(2+)-induced conformational changes. These results were used to generate a first design of calbindomodulin (CBM-1), a calbindin D(9k) re-engineered with 15 mutations to respond to Ca(2+) binding with a conformational change similar to that of calmodulin. The gene for CBM-1 was synthesized, and the protein was expressed and purified. Remarkably, this protein did not exhibit any non-native-like molten globule properties despite the large number of mutations and the nonconservative nature of some of them. Ca(2+)-induced changes in CD intensity and in the binding of the hydrophobic probe, ANS, implied that CBM-1 does undergo Ca(2+) sensorlike conformational changes. The X-ray crystal structure of Ca(2+)-CBM-1 determined at 1.44 Å resolution reveals the anticipated increase in hydrophobic surface area relative to the wild-type protein. A nascent calmodulin-like hydrophobic docking surface was also found, though it is occluded by the inter-EF-hand loop. The results from this first calbindomodulin design are discussed in terms of progress toward understanding the relationships between amino acid sequence, protein structure, and protein function for EF-hand CaBPs, as well as the additional mutations for the next CBM design.

  1. Prominence vs. aboutness in sequencing: a functional distinction within the left inferior frontal gyrus.

    Science.gov (United States)

    Bornkessel-Schlesewsky, Ina; Grewe, Tanja; Schlesewsky, Matthias

    2012-02-01

    Prior research on the neural bases of syntactic comprehension suggests that activation in the left inferior frontal gyrus (lIFG) correlates with the processing of word order variations. However, there are inconsistencies with respect to the specific subregion within the IFG that is implicated by these findings: the pars opercularis or the pars triangularis. Here, we examined the hypothesis that the dissociation between pars opercularis and pars triangularis activation may reflect functional differences between clause-medial and clause-initial word order permutations, respectively. To this end, we directly compared clause-medial and clause-initial object-before-subject orders in German in a within-participants, event-related fMRI design. Our results showed increased activation for object-initial sentences in a bilateral network of frontal, temporal and subcortical regions. Within the lIFG, posterior and inferior subregions showed only a main effect of word order, whereas more anterior and superior subregions showed effects of word order and sentence type, with higher activation for sentences with an argument in the clause-initial position. These findings are interpreted as evidence for a functional gradation of sequence processing within the left IFG: posterior subportions correlate with argument prominence-based (local) aspects of sequencing, while anterior subportions correlate with aboutness-based aspects of sequencing, which are crucial in linking the current sentence to the wider discourse. This proposal appears compatible with more general hypotheses about information processing gradients in prefrontal cortex (Koechlin & Summerfield, 2007). Copyright © 2010 Elsevier Inc. All rights reserved.

  2. A functional analysis of the spacer of V(D)J recombination signal sequences.

    Directory of Open Access Journals (Sweden)

    Alfred Ian Lee

    2003-10-01

    Full Text Available During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems.

  3. A functional analysis of the spacer of V(D)J recombination signal sequences.

    Science.gov (United States)

    Lee, Alfred Ian; Fugmann, Sebastian D; Cowell, Lindsay G; Ptaszek, Leon M; Kelsoe, Garnett; Schatz, David G

    2003-10-01

    During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems.

  4. Targeted Sequencing of Lung Function Loci in Chronic Obstructive Pulmonary Disease Cases and Controls.

    Directory of Open Access Journals (Sweden)

    María Soler Artigas

    Full Text Available Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide; smoking is the main risk factor for COPD, but genetic factors are also relevant contributors. Genome-wide association studies (GWAS) of the lung function measures used in the diagnosis of COPD have identified a number of loci; however, association signals are often broad and collectively these loci only explain a small proportion of the heritability. In order to examine the association with COPD risk of genetic variants down to low allele frequencies, to aid fine-mapping of association signals and to explain more of the missing heritability, we undertook a targeted sequencing study in 300 COPD cases and 300 smoking controls for 26 loci previously reported to be associated with lung function. We used a pooled sequencing approach, with 12 pools of 25 individuals each, enabling high depth (30x) coverage per sample to be achieved. This pooled design maximised sample size and therefore power, but led to challenges during variant-calling since sequencing error rates and minor allele frequencies for rare variants can be very similar. For this reason we employed a rigorous quality control pipeline for variant detection which included the use of 3 independent calling algorithms. In order to avoid false positive associations we also developed tests to detect variants with potential batch effects and removed them before undertaking association testing. We tested for the effects of single variants and the combined effect of rare variants within a locus. We followed up the top signals with data available (only 67% of collapsing methods signals) in 4,249 COPD cases and 11,916 smoking controls from UK Biobank. We provide suggestive evidence for the combined effect of rare variants on COPD risk in TNXB and in sliding windows within MECOM and upstream of HHIP. These findings can lead to an improved understanding of the molecular pathways involved in the development of COPD.
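
    The variant-calling difficulty described in this abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes diploid individuals, equal contribution of every individual to the pool, and a per-base error rate of 1%, which is a placeholder value and not a figure reported by the study. The function name and its parameters are hypothetical.

```python
def pool_singleton_expectation(individuals, depth_per_sample, error_rate):
    """Compare expected read support for a singleton variant with expected errors.

    Assumes diploid individuals contributing equally to the pool.
    error_rate is an illustrative per-base error rate, not a study value.
    """
    chromosomes = 2 * individuals                 # diploid: two copies each
    pool_depth = individuals * depth_per_sample   # total reads covering one site
    singleton_freq = 1 / chromosomes              # one variant copy in the pool
    return {
        "pool_depth": pool_depth,
        "singleton_freq": singleton_freq,
        "expected_variant_reads": pool_depth * singleton_freq,
        "expected_error_reads": pool_depth * error_rate,
    }


if __name__ == "__main__":
    # Pool size and per-sample depth as stated in the abstract; 1% error is assumed.
    for key, value in pool_singleton_expectation(25, 30, 0.01).items():
        print(f"{key}: {value}")
```

    Under these assumptions a variant carried by a single chromosome in a pool sits at an allele frequency of 2%, the same order of magnitude as the assumed error rate, which is consistent with the abstract's motivation for using several independent callers.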

  5. Sequence and function of LuxO, a negative regulator of luminescence in Vibrio harveyi.

    Science.gov (United States)

    Bassler, B L; Wright, M; Silverman, M R

    1994-05-01

    Density-dependent expression of luminescence in Vibrio harveyi is regulated by the concentration of extracellular signal molecules (autoinducers) in the culture medium. A recombinant clone that restored function to one class of spontaneous dim mutants was found to encode a function required for the density-dependent response. Transposon Tn5 insertions in the recombinant clone were isolated, and the mutations were transferred to the genome of V. harveyi for examination of mutant phenotypes. Expression of luminescence in V. harveyi strains with transposon insertions in one locus, luxO, was independent of the density of the culture and was similar in intensity to the maximal level observed in wild-type bacteria. Sequence analysis of luxO revealed one open reading frame that encoded a protein, LuxO, similar in amino acid sequence to the response regulator domain of the family of two-component, signal transduction proteins. The constitutive phenotype of LuxO- mutants indicates that LuxO acts negatively to control expression of luminescence, and relief of repression by LuxO in the wild type could result from interactions with other components in the Lux signalling system.

  6. Identification of cis-regulatory sequences that activate transcription in the suspensor of plant embryos.

    Science.gov (United States)

    Kawashima, Tomokazu; Wang, Xingjun; Henry, Kelli F; Bi, Yuping; Weterings, Koen; Goldberg, Robert B

    2009-03-03

    Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the scarlet runner bean (Phaseolus coccineus) G564 gene to understand how genes are activated specifically within the suspensor during early embryo development. Previously, we showed that the G564 upstream region has a block of tandem repeats, which contain a conserved 10-bp motif (GAAAAG(C)/(T)GAA), and that deletion of these repeats results in a loss of suspensor transcription. Here, we use gain-of-function (GOF) experiments with transgenic globular-stage tobacco embryos to show that only 1 of the 5 tandem repeats is required to drive suspensor-specific transcription. Fine-scale deletion and scanning mutagenesis experiments with 1 tandem repeat uncovered a 54-bp region that contains all of the sequences required to activate transcription in the suspensor, including the 10-bp motif (GAAAAGCGAA) and a similar 10-bp-like motif (GAAAAACGAA). Site-directed mutagenesis and GOF experiments indicated that both the 10-bp and 10-bp-like motifs are necessary, but not sufficient to activate transcription in the suspensor, and that a sequence (TTGGT) between the 10-bp and the 10-bp-like motifs is also necessary for suspensor transcription. Together, these data identify sequences that are required to activate transcription in the suspensor of a plant embryo after fertilization.
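
    The motif arrangement described in this abstract lends itself to a simple pattern scan. The following is a minimal sketch, not the authors' method: it encodes the 10-bp motif (with the C/T position written as a character class), the 10-bp-like motif, and the TTGGT element as regular expressions, then checks whether a TTGGT lies between the two motifs. The helper names and the toy sequence are hypothetical, and the order of the two motifs is not assumed.

```python
import re

# Motifs quoted in the abstract; the degenerate (C)/(T) position becomes [CT].
MOTIF_10BP = re.compile(r"GAAAAG[CT]GAA")
MOTIF_10BP_LIKE = re.compile(r"GAAAAACGAA")
SPACER_ELEMENT = re.compile(r"TTGGT")


def find_sites(pattern, sequence):
    """Return 0-based start positions of every match, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


def has_suspensor_arrangement(sequence):
    """True if some TTGGT lies between a 10-bp motif and a 10-bp-like motif."""
    tens = find_sites(MOTIF_10BP, sequence)
    likes = find_sites(MOTIF_10BP_LIKE, sequence)
    spacers = find_sites(SPACER_ELEMENT, sequence)
    for a in tens:
        for b in likes:
            left_end = min(a, b) + 10   # first base after the upstream motif
            right_start = max(a, b)     # first base of the downstream motif
            if any(left_end <= t and t + 5 <= right_start for t in spacers):
                return True
    return False


if __name__ == "__main__":
    # Toy sequence built for illustration; it is not the G564 upstream region.
    toy = "CC" + "GAAAAGCGAA" + "AC" + "TTGGT" + "AC" + "GAAAAACGAA" + "TT"
    print(find_sites(MOTIF_10BP, toy), has_suspensor_arrangement(toy))
```

    A scan like this only reports where the sequence elements occur; the abstract stresses that both motifs are necessary but not sufficient, so a positive hit should not be read as a prediction of suspensor transcription.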

  7. Identification of multiple distinct Snf2 subfamilies with conserved structural motifs.

    Science.gov (United States)

    Flaus, Andrew; Martin, David M A; Barton, Geoffrey J; Owen-Hughes, Tom

    2006-01-01

    The Snf2 family of helicase-related proteins includes the catalytic subunits of ATP-dependent chromatin remodelling complexes found in all eukaryotes. These act to regulate the structure and dynamic properties of chromatin and so influence a broad range of nuclear processes. We have exploited progress in genome sequencing to assemble a comprehensive catalogue of over 1300 Snf2 family members. Multiple sequence alignment of the helicase-related regions enables 24 distinct subfamilies to be identified, a considerable expansion over earlier surveys. Where information is known, there is a good correlation between biological or biochemical function and these assignments, suggesting Snf2 family motor domains are tuned for specific tasks. Scanning of complete genomes reveals all eukaryotes contain members of multiple subfamilies, whereas they are less common and not ubiquitous in eubacteria or archaea. The large sample of Snf2 proteins enables additional distinguishing conserved sequence blocks within the helicase-like motor to be identified. The establishment of a phylogeny for Snf2 proteins provides an opportunity to make informed assignments of function, and the identification of conserved motifs provides a framework for understanding the mechanisms by which these proteins function.

  8. Avian reovirus L2 genome segment sequences and predicted structure/function of the encoded RNA-dependent RNA polymerase protein

    Directory of Open Access Journals (Sweden)

    Xu Wanhong

    2008-12-01

    Full Text Available Abstract Background The orthoreoviruses are infectious agents that possess a genome comprised of 10 double-stranded RNA segments encased in two concentric protein capsids. Like virtually all RNA viruses, an RNA-dependent RNA polymerase (RdRp) enzyme is required for viral propagation. RdRp sequences have been determined for the prototype mammalian orthoreoviruses and for several other closely-related reoviruses, including aquareoviruses, but have not yet been reported for any avian orthoreoviruses. Results We determined the L2 genome segment nucleotide sequences, which encode the RdRp proteins, of two different avian reoviruses, strains ARV138 and ARV176 in order to define conserved and variable regions within reovirus RdRp proteins and to better delineate structure/function of this important enzyme. The ARV138 L2 genome segment was 3829 base pairs long, whereas the ARV176 L2 segment was 3830 nucleotides long. Both segments were predicted to encode λB RdRp proteins 1259 amino acids in length. Alignments of these newly-determined ARV genome segments, and their corresponding proteins, were performed with all currently available homologous mammalian reovirus (MRV) and aquareovirus (AqRV) genome segment and protein sequences. There was ~55% amino acid identity between ARV λB and MRV λ3 proteins, making the RdRp protein the most highly conserved of currently known orthoreovirus proteins, and there was ~28% identity between ARV λB and homologous MRV and AqRV RdRp proteins. Predictive structure/function mapping of identical and conserved residues within the known MRV λ3 atomic structure indicated most identical amino acids and conservative substitutions were located near and within predicted catalytic domains and lining RdRp channels, whereas non-identical amino acids were generally located on the molecule's surfaces. Conclusion The ARV λB and MRV λ3 proteins showed the highest ARV:MRV identity values (~55%) amongst all currently known ARV and MRV

  9. Functional comparison of the nematode Hox gene lin-39 in C. elegans and P. pacificus reveals evolutionary conservation of protein function despite divergence of primary sequences.

    Science.gov (United States)

    Grandien, K; Sommer, R J

    2001-08-15

    Hox transcription factors have been implicated in playing a central role in the evolution of animal morphology. Many studies indicate the evolutionary importance of regulatory changes in Hox genes, but little is known about the role of functional changes in Hox proteins. In the nematodes Pristionchus pacificus and Caenorhabditis elegans, developmental processes can be compared at the cellular, genetic, and molecular levels and differences in gene function can be identified. The Hox gene lin-39 is involved in the regulation of nematode vulva development. Comparison of known lin-39 mutations in P. pacificus and C. elegans revealed both conservation and changes of gene function. Here, we study evolutionary changes of lin-39 function using hybrid transgenes and site-directed mutagenesis in an in vivo assay using C. elegans lin-39 mutants. Our data show that despite the functional differences of LIN-39 between the two species, Ppa-LIN-39, when driven by Cel-lin-39 regulatory elements, can functionally replace Cel-lin-39. Furthermore, we show that the MAPK docking and phosphorylation motifs unique for Cel-LIN-39 are dispensable for Cel-lin-39 function. Therefore, the evolution of lin-39 function is driven by changes in regulatory elements rather than changes in the protein itself.

  10. Interaction of MYC with host cell factor-1 is mediated by the evolutionarily conserved Myc box IV motif.

    Science.gov (United States)

    Thomas, L R; Foshage, A M; Weissmiller, A M; Popay, T M; Grieb, B C; Qualls, S J; Ng, V; Carboneau, B; Lorey, S; Eischen, C M; Tansey, W P

    2016-07-07

    The MYC family of oncogenes encodes a set of three related transcription factors that are overexpressed in many human tumors and contribute to the cancer-related deaths of more than 70,000 Americans every year. MYC proteins drive tumorigenesis by interacting with co-factors that enable them to regulate the expression of thousands of genes linked to cell growth, proliferation, metabolism and genome stability. One effective way to identify critical co-factors required for MYC function has been to focus on sequence motifs within MYC that are conserved throughout evolution, on the assumption that their conservation is driven by protein-protein interactions that are vital for MYC activity. In addition to their DNA-binding domains, MYC proteins carry five regions of high sequence conservation known as Myc boxes (Mb). To date, four of the Mb motifs (MbI, MbII, MbIIIa and MbIIIb) have had a molecular function assigned to them, but the precise role of the remaining Mb, MbIV, and the reason for its preservation in vertebrate Myc proteins, is unknown. Here, we show that MbIV is required for the association of MYC with the abundant transcriptional coregulator host cell factor-1 (HCF-1). We show that the invariant core of MbIV resembles the tetrapeptide HCF-binding motif (HBM) found in many HCF-interaction partners, and demonstrate that MYC interacts with HCF-1 in a manner indistinguishable from the prototypical HBM-containing protein VP16. Finally, we show that rationalized point mutations in MYC that disrupt interaction with HCF-1 attenuate the ability of MYC to drive tumorigenesis in mice. Together, these data expose a molecular function for MbIV and indicate that HCF-1 is an important co-factor for MYC.

  11. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination.

    Science.gov (United States)

    Wiese, Claudia; Hinz, John M; Tebbs, Robert S; Nham, Peter B; Urbin, Salustra S; Collins, David W; Thompson, Larry H; Schild, David

    2006-01-01

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks (ICLs). Ectopic expression of wild-type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  12. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination

    Energy Technology Data Exchange (ETDEWEB)

    Wiese, Claudia; Hinz, John M.; Tebbs, Robert S.; Nham, Peter B.; Urbin, Salustra S.; Collins, David W.; Thompson, Larry H.; Schild, David

    2006-04-21

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C, and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks. Ectopic expression of wild type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  13. Evolutionary relationships in the ilarviruses: nucleotide sequence of prunus necrotic ringspot virus RNA 3.

    Science.gov (United States)

    Sánchez-Navarro, J A; Pallás, V

    1997-01-01

    The complete nucleotide sequence of an isolate of prunus necrotic ringspot virus (PNRSV) RNA 3 has been determined. Elucidation of the amino acid sequence of the proteins encoded by the two large open reading frames (ORFs) allowed us to carry out comparative and phylogenetic studies on the movement (MP) and coat (CP) proteins in the ilarvirus group. Amino acid sequence comparison of the MP revealed a highly conserved basic sequence motif with an amphipathic alpha-helical structure preceding the conserved motif of the '30K superfamily' proposed by Mushegian and Koonin [26] for MP's. Within this '30K' motif a strictly conserved transmembrane domain is present in all ilarviruses sequenced so far. At the amino-terminal end, prune dwarf virus (PDV) has an extension not present in other ilarviruses but which is observed in all bromo- and cucumoviruses, suggesting a common ancestor or a recombinational event in the Bromoviridae family. Examination of the N-terminus of the CP's of all ilarviruses revealed a highly basic region, part of which resembles the Arg-rich motif that has been characterized in the RNA-binding protein family. This motif has also been found in the other members of the Bromoviridae family, suggesting its involvement in a structural function. Furthermore this region is required for infectivity in ilarviruses. The similarities found in this Arg-rich motif are discussed in terms of this process known as genome activation. Finally, phylogenetic analysis of both the MP and CP proteins revealed a higher relationship of AlMV to PNRSV, apple mosaic virus (ApMV) and PDV than any other member of the ilarvirus group. In that sense, AlMV should be considered as a true ilarvirus instead of forming a distinct group of viruses.

  14. The use of orthologous sequences to predict the impact of amino acid substitutions on protein function.

    Directory of Open Access Journals (Sweden)

    Nicholas J Marini

    2010-05-01

    Full Text Available Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well-conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Quite surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages such that inclusion of distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and infer amino acid conservation among only direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR, but also in an exhaustive set of E. coli LacI mutants.

  15. Metatranscriptome Sequencing Reveals Insights into the Gene Expression and Functional Potential of Rumen Wall Bacteria

    Directory of Open Access Journals (Sweden)

    Evelyne Mann

    2018-01-01

    Full Text Available Microbiota of the rumen wall constitute an important niche of rumen microbial ecology and their composition has been elucidated in different ruminants during the last years. However, the knowledge about the function of rumen wall microbes is still limited. Rumen wall biopsies were taken from three fistulated dairy cows under a standard forage-based diet and after 4 weeks of high concentrate feeding inducing a subacute rumen acidosis (SARA). Extracted RNA was used for metatranscriptome sequencing using Illumina HiSeq sequencing technology. The gene expression of the rumen wall microbial community was analyzed by mapping 35 million sequences against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and determining differentially expressed genes. A total of 1,607 functional features were assigned with high expression of genes involved in central metabolism, galactose, starch and sucrose metabolism. The glycogen phosphorylase (EC:2.4.1.1), which degrades (1->4)-alpha-D-glucans, was among the highest expressed genes, being transcribed by 115 bacterial genera. Energy metabolism genes were also highly expressed, including the pyruvate orthophosphate dikinase (EC:2.7.9.1) involved in pyruvate metabolism, which was covered by 177 genera. Nitrogen metabolism genes, in particular glutamate dehydrogenase (EC:1.4.1.4), glutamine synthetase (EC:6.3.1.2) and glutamate synthase (EC:1.4.1.13, EC:1.4.1.14), were also found to be highly expressed and prove rumen wall microbiota to be actively involved in providing host-relevant metabolites for exchange across the rumen wall. In addition, we found all four urease subunits (EC:3.5.1.5) transcribed by members of the genera Flavobacterium, Corynebacterium, Helicobacter, Clostridium, and Bacillus, and the dissimilatory sulfate reductase (EC 1.8.99.5) dsrABC, which is responsible for the reduction of sulfite to sulfide. We also provide in situ evidence for cellulose and cellobiose degradation, a key step in fiber-rich feed

  16. Novel peptide-based platform for the dual presentation of biologically active peptide motifs on biomaterials.

    Science.gov (United States)

    Mas-Moruno, Carlos; Fraioli, Roberta; Albericio, Fernando; Manero, José María; Gil, F Javier

    2014-05-14

    Biofunctionalization of metallic materials with cell adhesive molecules derived from the extracellular matrix is a feasible approach to improve cell-material interactions and enhance the biointegration of implant materials (e.g., osseointegration of bone implants). However, classical biomimetic strategies may prove insufficient to elicit complex and multiple biological signals required in the processes of tissue regeneration. Thus, newer strategies are focusing on installing multifunctionality on biomaterials. In this work, we introduce a novel peptide-based divalent platform with the capacity to simultaneously present distinct bioactive peptide motifs in a chemically controlled fashion. As a proof of concept, the integrin-binding sequences RGD and PHSRN were selected and introduced in the platform. The biofunctionalization of titanium with this platform showed a positive trend towards increased numbers of cell attachment, and statistically higher values of spreading and proliferation of osteoblast-like cells compared to control noncoated samples. Moreover, it displayed statistically comparable or improved cell responses compared to samples coated with the single peptides or with an equimolar mixture of the two motifs. Osteoblast-like cells produced higher levels of alkaline phosphatase on surfaces functionalized with the platform than on control titanium; however, these values were not statistically significant. This study demonstrates that these peptidic structures are versatile tools to convey multiple biofunctionality to biomaterials in a chemically defined manner.

  17. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    Full Text Available An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS; http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequences and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorant, pheromone, etc. significantly differs among species. The latter receptors tend to be single exon type or a few exon type and show a high ratio in the numbers of GPCRs, whereas some families, such as Class B and Class C receptors, have long lengths due to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM) helices, while residues characteristic to each subfamily are found on the extracellular half regions. A total of 69 Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped on the TM/loop regions of Class A GPCRs covering 14 subfamilies.

  18. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    Science.gov (United States)

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.
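
    Simple sequence repeat (SSR) detection of the kind summarised in this abstract can be sketched with a backreference regular expression. The example below is a minimal illustration, not the tool or thresholds used in the study: the function name, the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides), and the toy sequence are all assumptions.

```python
import re


def find_ssrs(sequence, unit_len, min_repeats):
    """Find tandem repeats of a unit of length unit_len.

    Returns a list of (start, unit, repeat_count) tuples with 0-based starts.
    Homopolymer runs (e.g. AAAAAA read as AA repeats) are skipped.
    """
    # ([ACGT]{n}) captures one unit; \1{k,} requires at least k further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    # Toy EST-like sequence for illustration only.
    est = "TTG" + "AC" * 6 + "CC" + "ATG" * 5 + "CC"
    print("dinucleotide SSRs:", find_ssrs(est, 2, 6))
    print("trinucleotide SSRs:", find_ssrs(est, 3, 5))
```

    Because the scan reports the leftmost match, the reported unit may be a rotation of the repeat a reader would expect (for instance GAT rather than ATG when a G precedes the run); dedicated SSR tools normalise such cases.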

  19. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored because of great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, as widely used in microarray-based expression analysis, is highly unlikely to account sufficiently for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all the accessible information to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error and has higher power to detect interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genome Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  20. Extensive Mutagenesis of the Conserved Box E Motif in Duck Hepatitis B Virus P Protein Reveals Multiple Functions in Replication and a Common Structure with the Primer Grip in HIV-1 Reverse Transcriptase

    OpenAIRE

    Wang, Yong-Xiang; Luo, Cheng; Zhao, Dan; Beck, Jürgen; Nassal, Michael

    2012-01-01

    Hepadnaviruses, including the pathogenic hepatitis B virus (HBV), replicate their small DNA genomes through protein-primed reverse transcription, mediated by the terminal protein (TP) domain in their P proteins and an RNA stem-loop, ϵ, on the pregenomic RNA (pgRNA). No direct structural data are available for P proteins, but their reverse transcriptase (RT) domains contain motifs that are conserved in all RTs (box A to box G), implying a similar architecture; however, experimental support for...

  1. MotifNet: a web-server for network motif analysis.

    Science.gov (United States)

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. Contact: estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
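
    To make the notion of a network motif concrete, the sketch below counts instances of one well-known three-node pattern, the feed-forward loop, in a small directed graph. It is not MotifNet's algorithm or API; the function name and the toy edge list are assumptions for illustration. Deciding whether a pattern is a motif would additionally require comparing this count with counts in randomized networks, which is not shown here.

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {n for e in edge_set for n in e}
    # Brute force over all ordered node triples; fine for toy graphs only.
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )

# Hypothetical directed network: X regulates Y and Z, Y regulates Z, Z regulates W.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffl_count = count_feed_forward_loops(toy_edges)
print(ffl_count)
```

    The brute-force enumeration grows cubically with the number of nodes, which is one reason dedicated tools are used for motifs of larger size.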

  2. Functional MRI of the pharynx in obstructive sleep apnea (OSA) with rapid 2-D flash sequences

    International Nuclear Information System (INIS)

    Jaeger, L.; Guenther, E.; Gauger, J.; Nitz, W.; Kastenbauer, E.; Reiser, M.

    1996-01-01

    Functional imaging of the pharynx used to be the domain of cineradiography, CT and ultrafast CT. The development of modern MRI techniques has opened new access to functional disorders of the pharynx. The aim of this study was to implement a new MRI technique to examine oropharyngeal obstructive mechanisms in patients with obstructive sleep apnea (OSA). Sixteen patients suffering from OSA and 6 healthy volunteers were examined on a 1.5 T whole-body imager ('Vision', Siemens, Erlangen Medical Engineering, Germany) using a circularly polarized head coil. Imaging was performed with 2D flash sequences in midsagittal and axial planes. Patients and volunteers were asked to breathe normally through the nose and to simulate snoring and the Mueller maneuver during magnetic resonance imaging (MRI). Prior to MRI, all patients underwent an ear, nose and throat (ENT) examination, functional fiberoptic nasopharyngoscopy and polysomnography. A temporal resolution of 6 images/s and an in-plane resolution of 2.67 × 1.8 mm were achieved. The mobility of the tongue, soft palate and pharyngeal surface could be clearly delineated. The MRI findings correlated well with the clinical examinations. We propose ultrafast MRI as a reliable and non-invasive method of evaluating pharyngeal obstructions and their levels.

  3. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2014-05-29

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  4. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Gao, Xin; Arold, Stefan T.

    2014-01-01

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  5. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

    Science.gov (United States)

    Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth

    2018-04-13

    The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations. © 2018 Guo et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    Science.gov (United States)

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
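
    The idea of ordered-dipeptide frequency and bias can be sketched in a few lines. The abstract does not give the exact statistic computed by AAPAIR.TAB, so the bias measure used here (observed count divided by the count expected from single-residue frequencies) is an assumption, as are the function name and the two made-up toy sequences.

```python
from collections import Counter

def dipeptide_bias(sequences):
    """Return {dipeptide: (count, observed/expected ratio)} over all sequences."""
    residues = Counter()
    pairs = Counter()
    for seq in sequences:
        residues.update(seq)
        # Ordered, overlapping dipeptides: positions (0,1), (1,2), ...
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total_res = sum(residues.values())
    total_pairs = sum(pairs.values())
    result = {}
    for pair, count in pairs.items():
        # Expected count if the two residues occurred independently (assumed null model).
        expected = (residues[pair[0]] / total_res) * (residues[pair[1]] / total_res) * total_pairs
        result[pair] = (count, count / expected)
    return result

# Hypothetical cysteine-rich fragments, invented for this example only.
toy = ["CGYCNGC", "CGFCNGC"]
table = dipeptide_bias(toy)
for pair, (count, ratio) in sorted(table.items()):
    print(pair, count, round(ratio, 2))
```

    A ratio above 1 marks a dipeptide that occurs more often than its constituent residues alone would predict, which is the kind of preference the abstract reports for GY, GF, NG and NT.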

  7. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
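
    What a maximal contiguous frequent pattern is can be shown with a naive baseline. This is not the efficient approach proposed in the paper, whose details are not given in the abstract; it is a brute-force, level-wise enumeration, and the function name, the support definition (number of sequences containing the pattern) and the toy database are assumptions.

```python
def maximal_contiguous_frequent(sequences, min_support):
    """Return contiguous patterns present in >= min_support sequences
    that are not substrings of any longer frequent pattern."""
    frequent = set()
    length = 1
    while True:
        support = {}
        for seq in sequences:
            # Count each substring at most once per sequence.
            seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for sub in seen:
                support[sub] = support.get(sub, 0) + 1
        level = {s for s, n in support.items() if n >= min_support}
        if not level:
            break  # no frequent pattern of this length, so none can be longer
        frequent |= level
        length += 1
    # Keep only patterns that no other frequent pattern contains.
    return sorted(p for p in frequent
                  if not any(p != q and p in q for q in frequent))

# Hypothetical three-sequence database; no pattern length is specified in advance.
db = ["ACGTAC", "TACGTG", "GACGTA"]
maximal = maximal_contiguous_frequent(db, min_support=3)
print(maximal)
```

    The loop stops at the first length with no frequent pattern because every substring of a frequent contiguous pattern is itself frequent. Memory and time grow quickly with sequence length, which is the cost an efficient miner has to avoid.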

  8. Aspects of the generation of finite-difference Green's function sequences for arbitrary 3-D cubic lattice points

    NARCIS (Netherlands)

    de Hon, B.P.; Arnold, J.M.

    2015-01-01

    The robust and speedy evaluation of lattice Green's functions (LGFs) is crucial to the effectiveness of finite-difference Green's function diakoptics schemes. We have recently determined a generic recurrence scheme for the construction of scalar LGF sequences at arbitrary points on a 3-D cubic

  9. BrEPS: a flexible and automatic protocol to compute enzyme-specific sequence profiles for functional annotation

    Directory of Open Access Journals (Sweden)

    Schomburg D

    2010-12-01

    Background: Models for the simulation of metabolic networks require the accurate prediction of enzyme function. Based on a genomic sequence, enzymatic functions of gene products are today mainly predicted by sequence database searching and operon analysis. Other methods can support these techniques: we have developed an automatic method, "BrEPS", that creates highly specific sequence patterns for the functional annotation of enzymes. Results: The enzymes in UniProtKB are identified and their sequences compared against each other with BLAST. The enzymes are then clustered into a number of trees, where each tree node is associated with a set of EC numbers. The enzyme sequences in the tree nodes are aligned with ClustalW. The conserved columns of the resulting multiple alignments are used to construct sequence patterns. In the last step, we verify the quality of the patterns by computing their specificity. Patterns with low specificity are omitted and recomputed further down in the tree. The final high-quality patterns can be used for functional annotation. We ran our protocol on a recent Swiss-Prot release and show statistics, as well as a comparison to PRIAM, a probabilistic method that is also specialized in the functional annotation of enzymes. We determine the number of true positive annotations for five common microorganisms, with data from BRENDA and AMENDA serving as the standard of truth. BrEPS is almost on par with PRIAM, a fact which we discuss in the context of five manually investigated cases. Conclusions: Our protocol computes highly specific sequence patterns that can be used to support the functional annotation of enzymes. The main advantages of our method are that it is automatic and unsupervised, and quite fast once the patterns are evaluated. The results show that BrEPS can be a valuable addition to the reconstruction of metabolic networks.
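
    Two of the steps named in this abstract, building a pattern from the conserved columns of an alignment and checking its specificity, can be sketched as follows. This is not the BrEPS implementation: the wildcard handling, the definition of specificity as TP / (TP + FP), the function names and all sequences are assumptions chosen to keep the example small.

```python
import re

def pattern_from_alignment(aligned):
    """Build a regex from an alignment: fully conserved columns keep their
    residue, every other column becomes a single-residue wildcard."""
    parts = []
    for column in zip(*aligned):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            parts.append(column[0])
        else:
            # Simplification: variable or gapped columns match exactly one residue.
            parts.append(".")
    return "".join(parts).strip(".")

def specificity(pattern, positives, negatives):
    """Fraction of pattern hits that fall on known positives, TP / (TP + FP)."""
    regex = re.compile(pattern)
    tp = sum(1 for s in positives if regex.search(s))
    fp = sum(1 for s in negatives if regex.search(s))
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical aligned block and labelled test sequences.
aligned = ["GHSAGK", "GHTLGK", "GHSVGK"]
pattern = pattern_from_alignment(aligned)
score = specificity(pattern,
                    positives=["MAGHSAGKT", "LLGHTLGKA"],
                    negatives=["MAGHSAGRT", "GHQQGKLL"])
print(pattern, round(score, 3))
```

    In this toy case one negative sequence also matches the pattern, so the score drops below 1. In a protocol like the one described, such a low-scoring pattern would be discarded and recomputed on a narrower set of sequences.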

  10. The C-terminal sequence of several human serine proteases encodes host defense functions.

    Science.gov (United States)

    Kasetty, Gopinath; Papareddy, Praveen; Kalle, Martina; Rydengård, Victoria; Walse, Björn; Svensson, Bo; Mörgelin, Matthias; Malmsten, Martin; Schmidtchen, Artur

    2011-01-01

    Serine proteases of the S1 family have maintained a common structure over an evolutionary span of more than one billion years, and evolved a variety of substrate specificities and diverse biological roles, involving digestion and degradation, blood clotting, fibrinolysis and epithelial homeostasis. We here show that a wide range of C-terminal peptide sequences of serine proteases, particularly from the coagulation and kallikrein systems, share characteristics common with classical antimicrobial peptides of innate immunity. Under physiological conditions, these peptides exert antimicrobial effects as well as immunomodulatory functions by inhibiting macrophage responses to bacterial lipopolysaccharide. In mice, selected peptides are protective against lipopolysaccharide-induced shock. Moreover, these S1-derived host defense peptides exhibit helical structures upon binding to lipopolysaccharide and also permeabilize liposomes. The results uncover new and fundamental aspects on host defense