WorldWideScience

Sample records for adaptor-associated clathrin-box motifs

  1. Motif Statistics

    Nicodème, Pierre; Salvy, Bruno; Flajolet, Philippe

    1999-01-01

    We present a complete analysis of the statistics of the number of occurrences of a regular expression pattern in a random text. This covers «motifs» widely used in computational biology. Our approach is based on: (i) a constructive approach to classical results in theoretical computer science (automata and formal language theory), in particular, the rationality of generating functions of regular languages; (ii) analytic combinatorics that is used for deriving asymptotic properties from generating...

  2. Mining Conditional Phosphorylation Motifs.

    Liu, Xiaoqing; Wu, Jun; Gong, Haipeng; Deng, Shengchun; He, Zengyou

    2014-01-01

    Phosphorylation motifs represent position-specific amino acid patterns around the phosphorylation sites in a set of phosphopeptides. Several algorithms have been proposed to uncover phosphorylation motifs, but the problem of efficiently discovering a set of significant motifs with sufficiently high coverage and non-redundancy remains unsolved. Here we present a novel notion called conditional phosphorylation motifs. Through this new concept, motifs whose over-expressiveness mainly benefits from their constituent parts can be filtered out effectively. To discover conditional phosphorylation motifs, we propose an algorithm called C-Motif for a non-redundant identification of significant phosphorylation motifs. C-Motif is implemented under the Apriori framework, and it tests the statistical significance together with the frequency of candidate motifs in a single stage. Experiments demonstrate that C-Motif outperforms some current algorithms such as MMFPh and Motif-All in terms of coverage and non-redundancy of the results and efficiency of the execution. The source code of C-Motif is available at: https://sourceforge.net/projects/cmotif/. PMID:26356863

  3. The Motif Tracking Algorithm

    2008-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper, we introduce the motif tracking algorithm (MTA), a novel immune inspired (IS) pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases, the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilization of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other applications such as forecasting and algorithm seeding.

  4. The Motif Tracking Algorithm

    Wilson, William; Aickelin, Uwe; 10.1007/s11633.008.0032.0

    2010-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper we introduce the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilisation of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other ap...

  5. Visibility graph motifs

    Iacovacci, Jacopo

    2015-01-01

    Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of visibility graph motifs, smaller substructures that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification a...

  6. [Personal motif in art].

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily, at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication with the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy. PMID:26202617

  7. The MHC motif viewer

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole;

    2010-01-01

    In vertebrates, the onset of cellular immune reactions is controlled by presentation of peptides in complex with major histocompatibility complex (MHC) molecules to T cell receptors. In humans, MHCs are called human leukocyte antigens (HLAs). Different MHC molecules present different subsets of...... peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms is...... binding motif for each MHC molecule is predicted using state-of-the-art, pan-specific peptide-MHC binding-prediction methods, and is visualized as a sequence logo, in a format that allows for a comprehensive interpretation of binding motif anchor positions and amino acid preferences....

  8. MHC motif viewer

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2008-01-01

    In vertebrates the major histocompatibility complex (MHC) presents peptides to the immune system. In humans MHCs are called human leukocyte antigens (HLAs), and some of the loci encoding them are the most polymorphic in the human genome. Different MHC molecules present different subsets of peptides, and knowledge of their binding specificities is important for understanding the differences in the immune response between individuals. Knowledge of motifs may be used to identify epitopes, unders...

  9. Mining protein sequences for motifs.

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence. PMID:12487759

  10. MHC motif viewer

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole;

    2008-01-01

    In vertebrates, the major histocompatibility complex (MHC) presents peptides to the immune system. In humans, MHCs are called human leukocyte antigens (HLAs), and some of the loci encoding them are the most polymorphic in the human genome. Different MHC molecules present different subsets of....... Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif...

  11. Structural alphabet motif discovery and a structural motif database.

    Ku, Shih-Yen; Hu, Yuh-Jyh

    2012-01-01

    This study proposes a general framework for structural motif discovery. The framework is based on a modular design in which the system components can be modified or replaced independently to increase its applicability to various studies. It is a two-stage approach that first converts protein 3D structures into structural alphabet sequences, and then applies a sequence motif-finding tool to these sequences to detect conserved motifs. We named the structural motif database we built the SA-Motifbase, which provides the structural information conserved at different hierarchical levels in SCOP. For each motif, SA-Motifbase presents its 3D view; alphabet letter preference; alphabet letter frequency distribution; and the significance. SA-Motifbase is available at http://bioinfo.cis.nctu.edu.tw/samotifbase/. PMID:22099701

  12. The Annotation of RNA Motifs

    Eric Westhof

    2006-04-01

    The recent deluge of new RNA structures, including complete atomic-resolution views of both subunits of the ribosome, has on the one hand literally overwhelmed our individual abilities to comprehend the diversity of RNA structure, and on the other hand presented us with new opportunities for comprehensive use of RNA sequences for comparative genetic, evolutionary and phylogenetic studies. Two concepts are key to understanding RNA structure: hierarchical organization of global structure and isostericity of local interactions. Global structure changes extremely slowly, as it relies on conserved long-range tertiary interactions. Tertiary RNA–RNA and quaternary RNA–protein interactions are mediated by RNA motifs, defined as recurrent and ordered arrays of non-Watson–Crick base-pairs. A single RNA motif comprises a family of sequences, all of which can fold into the same three-dimensional structure and can mediate the same interaction(s). The chemistry and geometry of base pairing constrain the evolution of motifs in such a way that random mutations that occur within motifs are accepted or rejected insofar as they can mediate a similar ordered array of interactions. The steps involved in the analysis and annotation of RNA motifs in 3D structures are: (a) decomposition of each motif into non-Watson–Crick base-pairs; (b) geometric classification of each basepair; (c) identification of isosteric substitutions for each basepair by comparison to isostericity matrices; (d) alignment of homologous sequences using the isostericity matrices to identify corresponding positions in the crystal structure; (e) acceptance or rejection of the null hypothesis that the motif is conserved.

  13. Sequential visibility-graph motifs

    Iacovacci, Jacopo; Lacasa, Lucas

    2016-04-01

    Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
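
    The motif-profile idea in this abstract can be illustrated with a minimal sketch. It assumes the horizontal visibility criterion (two points see each other when every value between them is lower than both); the abstract does not say which visibility algorithm is used, and the function names below are hypothetical, not taken from the authors' code. The profile is simply the relative frequency of each distinct subgraph induced on n consecutive nodes.

```python
from collections import Counter

def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, of the horizontal visibility graph of `series`.

    Assumption: i and j are linked when every value strictly between them
    is lower than both series[i] and series[j].
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # tallest value strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if series[j] >= series[i]:
                break  # point j blocks the view from i to anything further right
    return edges

def sequential_motif_profile(series, size=4):
    """Relative frequency of each subgraph induced on `size` consecutive nodes."""
    if size > len(series):
        raise ValueError("window size exceeds series length")
    edges = horizontal_visibility_edges(series)
    windows = len(series) - size + 1
    counts = Counter()
    for start in range(windows):
        # Relabel nodes relative to the window start so equal shapes compare equal.
        motif = tuple(sorted(
            (a - start, b - start)
            for (a, b) in edges
            if start <= a and b < start + size
        ))
        counts[motif] += 1
    return {motif: count / windows for motif, count in counts.items()}
```

    For a strictly increasing series every window contains only the chain of nearest-neighbour links, so the profile collapses to a single motif with frequency 1; noisier series spread the weight over more motifs.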

  14. Main: SEF1MOTIF [PLACE

    ...inding motif; sequence found in 5'-upstream region (-640; -765) of soybean beta-conglicinin (7S globulin) ge...ne; W=A/T; SOYBEAN; STORAGE PROTEIN; 7S; GLOBULIN; BETA-CONGLICININ; seed; soybean (Glycine max) ATATTTAWW ...

  15. MODIS: an audio motif discovery software

    Catanese, Laurence; Souviraà-Labastie, Nathan; Qu, Bingqing; Campion, Sébastien; Gravier, Guillaume; Vincent, Emmanuel; Bimbot, Frédéric

    2013-01-01

    MODIS is a free speech and audio motif discovery software developed at IRISA Rennes. Motif discovery is the task of discovering and collecting occurrences of repeating patterns in the absence of prior knowledge, or training material. MODIS is based on a generic approach to mine repeating audio sequences, with tolerance to motif variability. The algorithm implementation allows large audio streams to be processed at a reasonable speed where motif discovery often requires hu...

  16. rMotifGen: random motif generator for DNA and protein sequences

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
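
    A minimal sketch of the PSSM-based route described in this abstract: draw one letter per motif position from that position's probability column, then plant the instance in a random background sequence. This is not rMotifGen's code; the function names, the uniform background, and the dictionary layout of the PSSM are assumptions made for illustration, and the insertion and substitution-matrix mutation steps are not modelled.

```python
import random

def sample_motif(pssm, rng):
    """Draw one motif instance from a PSSM.

    `pssm` is assumed to be a list of columns, each a dict mapping a
    letter to its probability at that position.
    """
    return "".join(
        rng.choices(list(column.keys()), weights=list(column.values()), k=1)[0]
        for column in pssm
    )

def plant_motif(length, pssm, rng, alphabet="ACGT"):
    """Return (sequence, start, motif): a uniform random background of
    `length` letters with one sampled motif instance written over it."""
    motif = sample_motif(pssm, rng)
    if len(motif) > length:
        raise ValueError("motif is longer than the requested sequence")
    background = [rng.choice(alphabet) for _ in range(length)]
    start = rng.randrange(length - len(motif) + 1)
    background[start:start + len(motif)] = motif  # overwrite in place
    return "".join(background), start, motif

# Usage: a 3-column PSSM with a strongly conserved middle position.
example_pssm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
sequence, start, motif = plant_motif(30, example_pssm, random.Random(42))
```

    Passing an explicit `random.Random` instance keeps generated benchmark sets reproducible, which matters when the sequences are used to compare motif detection tools.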

  17. Identifying motifs in folktales using topic models

    Karsdorp, F.; Bosch, A.P.J. van den

    2013-01-01

    With various folktale digitization initiatives under way, the need for computational aids to explore these collections is increasing. In this paper we compare Labeled LDA (L-LDA) to a simple retrieval model on the task of identifying motifs in folktales. We show that both methods can successfully discriminate between relevant and irrelevant motifs. L-LDA represents motifs as distributions over words. In a second experiment we compare the quality of these distributions to...

  18. Bridge and brick motifs in complex networks

    Huang, Chung-Yuan; Sun, Chuen-Tsai; Cheng, Chia-Ying; Hsieh, Ji-Lung

    2007-04-01

    Acknowledging the expanding role of complex networks in numerous scientific contexts, we examine significant functional and topological differences between bridge and brick motifs for predicting network behaviors and functions. After observing similarities between social networks and their genetic, ecological, and engineering counterparts, we identify a larger number of brick motifs in social networks and bridge motifs in the other three types. We conclude that bridge and brick motif content analysis can assist researchers in understanding the small-world and clustering properties of network structures when investigating network functions and behaviors.

  19. Assessment of composite motif discovery methods

    Johansen Jostein

    2008-02-01

    Background: Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery – discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. Results: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Conclusion: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual

  20. Temporal motifs in time-dependent networks

    Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as networks of telecommunication, neural signal processing, biochemical reactions and human social interactions. We introduce the general framework of temporal motifs to study the mesoscale spatio-temporal structure of these networks. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to colored directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.

  1. Temporal motifs in time-dependent networks

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.

  2. Sampling Motif-Constrained Ensembles of Networks

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  3. Sampling motif-constrained ensembles of networks

    Fischer, Rico; Peixoto, Tiago P; Altmann, Eduardo G

    2015-01-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency, or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this paper we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  4. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis

    Klepper Kjetil

    2013-01-01

    Background: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. Results: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. Conclusions: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. MotifLab is freely

  5. Detecting Motifs in System Call Sequences

    Wilson, William O; Aickelin, Uwe

    2010-01-01

    The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A...

  6. Automated motif discovery from glycan array data.

    Cholleti, Sharath R; Agravat, Sanjay; Morris, Tim; Saltz, Joel H; Song, Xuezheng; Cummings, Richard D; Smith, David F

    2012-10-01

    Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner, that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface (http://glycanmotifminer.emory.edu). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs. PMID:22877213

  7. Fitness for synchronization of network motifs

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.; Vazquez-Prada Baillet, Miguel

    We study the synchronization of Kuramoto's oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi...... that the fitness for synchronization correlates well with motifs interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed. © 2004 Elsevier B.V. All rights reserved....

  8. Detecting seeded motifs in DNA sequences

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, compo...

  9. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    Ngoc Tam L. Tran; Huang, Chun-Hsi

    2014-01-01

    ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for motif finding, as ChIP-Seq experiments narrow down the search to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed each motif finding Web tool has its own advantages for detecting motifs tha...

  10. Detecting seeded motifs in DNA sequences.

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different yeast and human real datasets proved the methodology to be a promising solution for finding seeded motifs. The MOST web tool is freely available at http://telethon.bio.unipd.it/bioinfo/MOST. PMID:16141193

  11. Detecting seeded motifs in DNA sequences

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different yeast and human real datasets proved the methodology to be a promising solution for finding seeded motifs. The MOST web tool is freely available at http://telethon.bio.unipd.it/bioinfo/MOST. PMID:16141193

  12. The MHC motif viewer: a visualization tool for MHC binding motifs

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2010-01-01

    hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences. The...

  13. MOTIFATOR : detection and characterization of regulatory motifs using prokaryote transcriptome data

    Blom, Evert-Jan; Roerdink, Jos B.T.M.; Kuipers, Oscar P.; Hijum, Sacha A.F.T. van

    2009-01-01

    Unraveling regulatory mechanisms (e.g. identification of motifs in cis-regulatory regions) remains a major challenge in the analysis of transcriptome experiments. Existing applications identify putative motifs from gene lists obtained at rather arbitrary cutoff and require additional manual processi

  14. Sublinear Time Motif Discovery from Multiple Sequences

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. In each background sequence is implanted a probabilistically-generated approximate copy of G. For a probabilistically-generated approximate copy b1b2 ... bm of G, every character, bi, is probabilistically generated, such that the probability for bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
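
The probabilistic model described in this record can be illustrated with a short generator. This is only a sketch of the data model, not of the paper's discovery algorithms. The function name `implant_motif`, its parameters, the uniform background, and the choice to mutate each position with probability exactly α (which satisfies the "at most α" condition) are assumptions made for illustration.

```python
import random


def implant_motif(motif, k, seq_len, alpha, alphabet="ACGT", seed=0):
    """Build k random background sequences, each holding one noisy copy of motif.

    Hypothetical helper: each motif character is replaced by a different
    character with probability alpha, independently of the others.
    """
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        # Background: every character drawn uniformly from the alphabet.
        background = [rng.choice(alphabet) for _ in range(seq_len)]
        # Approximate copy b1..bm of the motif g1..gm.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        # Implant the copy at a random position that fits in the sequence.
        start = rng.randrange(seq_len - len(motif) + 1)
        background[start:start + len(motif)] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions


if __name__ == "__main__":
    seqs, starts = implant_motif("ACGTAC", k=3, seq_len=20, alpha=0.1)
    for s, p in zip(seqs, starts):
        print(p, s)
```

Returning the implant positions alongside the sequences makes it possible to score a motif finder against known ground truth, which is how such a model would typically be used for benchmarking.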

  15. Functional characterization of variations on regulatory motifs.

    Michal Lapidot

    2008-03-01

    Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. We have developed computational means for estimating the functional outcome of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. We predict the effects of nucleotide variations within motifs on gene expression patterns. In cases where such predictions could be compared to suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties that characterize nucleotide substitutions that are more likely to alter expression. We found that substitutions involving Adenine are more likely to retain the expression pattern and that substitutions involving Guanine are more likely to alter expression compared to the rest of the substitutions. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication is expected to be the ability to predict the phenotypic effect of variation in regulatory motifs in promoters.

  16. Sequential motif profile of natural visibility graphs

    Iacovacci, Jacopo

    2016-01-01

    The concept of sequential visibility graph motifs -subgraphs appearing with characteristic frequencies in the visibility graphs associated to time series- has been advanced recently along with a theoretical framework to compute analytically the motif profiles associated to Horizontal Visibility Graphs (HVGs). Here we develop a theory to compute the profile of sequential visibility graph motifs in the context of Natural Visibility Graphs (VGs). This theory gives exact results for deterministic aperiodic processes with a smooth invariant density or stochastic processes that fulfil the Markov property and have a continuous marginal distribution. The framework also allows for a linear time numerical estimation in the case of empirical time series. A comparison between the HVG and the VG case (including evaluation of their robustness for short series polluted with measurement noise) is also presented.
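
The two graph constructions compared in this record can be sketched directly from their standard definitions: in a natural visibility graph two samples are linked when the straight line between them passes above every intermediate sample, and in a horizontal visibility graph when every intermediate sample lies below both. The function names below are illustrative, and this brute-force construction is not the linear-time estimation mentioned in the abstract.

```python
def natural_visibility_edges(series):
    """Edges (a, b), a < b, of the natural visibility graph of a time series."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Every intermediate point must lie strictly below the line a-b.
            if all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            ):
                edges.add((a, b))
    return edges


def horizontal_visibility_edges(series):
    """Edges (a, b), a < b, of the horizontal visibility graph of a time series."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # Every intermediate point must lie strictly below both endpoints.
            if all(series[c] < min(series[a], series[b]) for c in range(a + 1, b)):
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0]
    print(sorted(natural_visibility_edges(data)))
    print(sorted(horizontal_visibility_edges(data)))
```

For the series 1, 2, 4 the first and last samples see each other along the sloped line but not horizontally, so the natural graph has one edge more than the horizontal one.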

  17. Highly scalable Ab initio genomic motif identification

    Marchand, Benoît

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.

  18. The Motif of Meeting in Digital Education

    Sheail, Philippa

    2015-01-01

    This article draws on theoretical work which considers the composition of meetings, in order to think about the form of the meeting in digital environments for higher education. To explore the motif of meeting, I undertake a "compositional interpretation" (Rose, 2012) of the default interface offered by "Collaborate", an…

  19. DNA motif elucidation using belief propagation

    Wong, Ka-Chun

    2013-06-29

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8–10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. 2013 The Author(s).

  20. Parallel motif extraction from very long sequences

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).

  1. Identifying discriminative classification-based motifs in biological sequences

    Vens, Celine; Rosso, Marie-Noëlle; Danchin, Etienne

    2011-01-01

    Motivation: Identification of conserved motifs in biological sequences is crucial to unveil common shared functions. Many tools exist for motif identification, including some that allow degenerate positions with multiple possible nucleotides or amino acids. Most efficient methods available today search conserved motifs in a set of sequences, but do not check for their specificity regarding to a set of negative sequences. Results: We present a tool to identify degenerate motifs, based on a giv...

  2. Multilayer motif analysis of brain networks

    Battiston, Federico; Nicosia, Vincenzo; Chavez, Mario; Latora, Vito

    2016-01-01

    In the last decade network science has shed new light on the anatomical connectivity and on correlations in the activity of different areas of the human brain. The study of brain networks has made possible in fact to detect the central areas of a neural system, and to identify its building blocks by looking at overabundant small subgraphs, known as motifs. However, network analysis of the brain has so far mainly focused on structural and functional networks as separate entities. The recently ...

  3. Motif-specific sampling of phosphoproteomes

    Ruse, Cristian I.; McClatchy, Daniel B.; Lu, Bingwen; Cociorva, Daniel; Motoyama, Akira; Kyu Park, Sung; Yates, John R.

    2008-01-01

    Phosphoproteomics, the targeted study of a subfraction of the proteome which is modified by phosphorylation, has become an indispensable tool to study cell signaling dynamics. We described a methodology that linked phosphoproteome and proteome analysis based on Ba2+ binding properties of amino acids. This technology selected motif-specific phosphopeptides independent of the system under analysis. MudPIT (Multidimensional Identification Technology) identified 1037 precipitated phosphopeptides ...

  4. Social Network Analysis Based on Network Motifs

    Xu Hong-lin; Yan Han-bing; Gao Cui-fang; Zhu Ping

    2014-01-01

    Based on the community structure characteristics, theory, and methods of frequent subgraph mining, network motifs findings are firstly introduced into social network analysis; the tendentiousness evaluation function and the importance evaluation function are proposed for effectiveness assessment. Compared with the traditional way based on nodes centrality degree, the new approach can be used to analyze the properties of social network more fully and judge the roles of the nodes effectively. I...

  5. Multilayer motif analysis of brain networks

    Battiston, Federico; Chavez, Mario; Latora, Vito

    2016-01-01

    In the last decade network science has shed new light on the anatomical connectivity and on correlations in the activity of different areas of the human brain. The study of brain networks has made possible in fact to detect the central areas of a neural system, and to identify its building blocks by looking at overabundant small subgraphs, known as motifs. However, network analysis of the brain has so far mainly focused on structural and functional networks as separate entities. The recently developed mathematical framework of multi-layer networks allows to perform a multiplex analysis of the human brain where the structural and functional layers are considered at the same time. In this work we describe how to classify subgraphs in multiplex networks, and we extend motif analysis to networks with many layers. We then extract multi-layer motifs in brain networks of healthy subjects by considering networks with two layers, respectively obtained from diffusion and functional magnetic resonance imaging. Results i...

  6. Dynamic motifs in socio-economic networks

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bi-partite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently only has business relationship with one of these two companies, will become customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  7. Dynamics of network motifs in genetic regulatory networks

    Li Ying; Liu Zeng-Rong; Zhang Jian-Bao

    2007-01-01

    Network motifs hold a very important status in genetic regulatory networks. This paper aims to analyse the dynamical property of the network motifs in genetic regulatory networks. The main result we obtained is that the dynamical property of a single motif is very simple with only an asymptotically stable equilibrium point, but the combination of several motifs can make more complicated dynamical properties emerge such as limit cycles. The above-mentioned result shows that network motif is a stable substructure in genetic regulatory networks while their combinations make the genetic regulatory network more complicated.

  8. ET-Motif: Solving the Exact (l, d)-Planted Motif Problem Using Error Tree Structure.

    Al-Okaily, Anas; Huang, Chun-Hsi

    2016-07-01

    Motif finding is an important and challenging problem in many biological applications such as discovering promoters, enhancers, locus control regions, transcription factors, and more. The (l, d)-planted motif search, PMS, is one of several variations of the problem. In this problem, there are n given sequences over alphabets of size [Formula: see text], each of length m, and two given integers l and d. The problem is to find a motif m of length l, where in each sequence there is at least an l-mer at a Hamming distance of [Formula: see text] of m. In this article, we propose ET-Motif, an algorithm that can solve the PMS problem in [Formula: see text] time and [Formula: see text] space. The time bound can be further reduced by a factor of m with [Formula: see text] space. In case the suffix tree that is built for the input sequences is balanced, the problem can be solved in [Formula: see text] time and [Formula: see text] space. Similarly, the time bound can be reduced by a factor of m using [Formula: see text] space. Moreover, the variations of the problem, namely the edit distance PMS and edited PMS (Quorum), can be solved using ET-Motif with simple modifications but upper bounds of space and time. For edit distance PMS, the time and space bounds will be increased by [Formula: see text], while for edited PMS the increase will be of [Formula: see text] in the time bound. PMID:27152692
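
To make the problem statement concrete, the sketch below solves small instances of the (l, d)-planted motif search by exhaustive enumeration. It is not ET-Motif and uses no error tree or suffix tree. It assumes the common formulation in which every sequence must contain an l-mer within Hamming distance at most d of the motif (the exact bound is elided in the abstract above), and the names `hamming` and `planted_motifs` are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """Return every length-l string that occurs, with at most d mismatches,
    in each input sequence. Brute force: |alphabet|**l candidates."""
    found = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if all(
            any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))
            for s in sequences
        ):
            found.append(motif)
    return found


if __name__ == "__main__":
    seqs = ["AACGT", "TACGT", "CCGTT"]
    print(planted_motifs(seqs, l=3, d=0))
```

The exponential candidate space is exactly what specialised PMS algorithms aim to prune, so this version is only practical for very short motifs.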

  9. No tradeoff between versatility and robustness in gene circuit motifs

    Payne, Joshua L.

    2016-05-01

    Circuit motifs are small directed subgraphs that appear in real-world networks significantly more often than in randomized networks. In the Boolean model of gene circuits, most motifs are realized by multiple circuit genotypes. Each of a motif's constituent circuit genotypes may have one or more functions, which are embodied in the expression patterns the circuit forms in response to specific initial conditions. Recent enumeration of a space of nearly 17 million three-gene circuit genotypes revealed that all circuit motifs have more than one function, with the number of functions per motif ranging from 12 to nearly 30,000. This indicates that some motifs are more functionally versatile than others. However, the individual circuit genotypes that constitute each motif are less robust to mutation if they have many functions, hinting that functionally versatile motifs may be less robust to mutation than motifs with few functions. Here, I explore the relationship between versatility and robustness in circuit motifs, demonstrating that functionally versatile motifs are robust to mutation despite the inherent tradeoff between versatility and robustness at the level of an individual circuit genotype.

  10. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design.

    Zhang, Shaoqiang; Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245

  11. AISMOTIF-An Artificial Immune System for DNA Motif Discovery

    Seeja K R

    2011-03-01

    Discovery of transcription factor binding sites is a much explored and still exploring area of research in functional genomics. Many computational tools have been developed for finding motifs and each of them has their own advantages as well as disadvantages. Most of these algorithms need prior knowledge about the data to construct background models. However there is not a single technique that can be considered as best for finding regulatory motifs. This paper proposes an artificial immune system based algorithm for finding the transcription factor binding sites or motifs and two new weighted scores for motif evaluation. The algorithm is enumerative, but sufficient pruning of the pattern search space has been incorporated using immune system concepts. The performance of AISMOTIF has been evaluated by comparing it with eight state of art composite motif discovery algorithms and found that AISMOTIF predicts known motifs as well as new motifs from the benchmark dataset without any prior knowledge about the data.

  12. AISMOTIF-An Artificial Immune System for DNA Motif Discovery

    Seeja, K R

    2011-01-01

    Discovery of transcription factor binding sites is a much explored and still exploring area of research in functional genomics. Many computational tools have been developed for finding motifs and each of them has their own advantages as well as disadvantages. Most of these algorithms need prior knowledge about the data to construct background models. However there is not a single technique that can be considered as best for finding regulatory motifs. This paper proposes an artificial immune system based algorithm for finding the transcription factor binding sites or motifs and two new weighted scores for motif evaluation. The algorithm is enumerative, but sufficient pruning of the pattern search space has been incorporated using immune system concepts. The performance of AISMOTIF has been evaluated by comparing it with eight state of art composite motif discovery algorithms and found that AISMOTIF predicts known motifs as well as new motifs from the benchmark dataset without any prior knowledge about the data...

  13. Assessing the Exceptionality of Coloured Motifs in Networks

    Lacroix Vincent

    2009-01-01

    Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been made over the last thirty years to derive p-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results make it possible to derive a p-value for a coloured motif, without spending time on simulations.

  14. Acidic/IQ Motif Regulator of Calmodulin

    Putkey, John A.; Waxham, M. Neal; Gaertner, Tara R.; Brewer, Kari J.; Goldsmith, Michael; Kubota, Yoshihisa; Kleerekoper, Quinn K.

    2007-01-01

    The small IQ motif proteins PEP-19 (62 amino acids) and RC3 (78 amino acids) greatly accelerate the rates of Ca2+ binding to sites III and IV in the C-domain of calmodulin (CaM). We show here that PEP-19 decreases the degree of cooperativity of Ca2+ binding to sites III and IV, and we present a model showing that this could increase Ca2+ binding rate constants. Comparative sequence analysis showed that residues 28 to 58 from PEP-19 are conserved in other proteins. This region includes the IQ ...

  15. RMOD: a tool for regulatory motif detection in signaling network.

    Kim, Jinki; Yi, Gwan-Su

    2013-01-01

    Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to a large-scale signaling network, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using various sizes of signaling networks generated from the integration of various human signaling pathways and it showed that the speed and scalability of this algorithm outperform those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manipulate the whole process from building signaling networks and query regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and mechanism underlying their regulatory motif activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod. PMID:23874612

  16. A combinatorial optimization approach for diverse motif finding applications

    Singh Mona

    2006-08-01

    Background: Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely-studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied. Results: We introduce a versatile combinatorial optimization framework for motif finding that couples graph pruning techniques with a novel integer linear programming formulation. Our approach is flexible and robust enough to model several variants of the motif finding problem, including those incorporating substitution matrices and phylogenetic distances. Additionally, we give an approach for determining statistical significance of uncovered motifs. In testing on numerous DNA and protein datasets, we demonstrate that our approach typically identifies statistically significant motifs corresponding to either known motifs or other motifs of high conservation. Moreover, in most cases, our approach finds provably optimal solutions to the underlying optimization problem. Conclusion: Our results demonstrate that a combined graph theoretic and mathematical programming approach can be the basis for effective and powerful techniques for diverse motif finding applications.

  17. Cross-disciplinary detection and analysis of network motifs.

    Tran, Ngoc Tam L; DeLuccia, Luke; McDonald, Aidan F; Huang, Chun-Hsi

    2015-01-01

    The detection of network motifs has recently become an important part of network analysis across all disciplines. In this work, we detected and analyzed network motifs from undirected and directed networks of several different disciplines, including biological network, social network, ecological network, as well as other networks such as airlines, power grid, and co-purchase of political books networks. Our analysis revealed that undirected networks are similar at the basic three and four nodes, while the analysis of directed networks revealed the distinction between networks of different disciplines. The study showed that larger motifs contained the three-node motif as a subgraph. Topological analysis revealed that similar networks have similar small motifs, but as the motif size increases, differences arise. Pearson correlation coefficient showed strong positive relationship between some undirected networks but inverse relationship between some directed networks. The study suggests that the three-node motif is a building block of larger motifs. It also suggests that undirected networks share similar low-level structures. Moreover, similar networks share similar small motifs, but larger motifs define the unique structure of individuals. Pearson correlation coefficient suggests that protein structure networks, dolphin social network, and co-authorships in network science belong to a superfamily. In addition, yeast protein-protein interaction network, primary school contact network, Zachary's karate club network, and co-purchase of political books network can be classified into a superfamily. PMID:25983553
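
As a rough illustration of the smallest case discussed in this record, the sketch below counts the two connected three-node subgraphs of an undirected network: open triads (paths of length two) and triangles. It only counts occurrences. Deciding whether a subgraph is a motif would additionally require comparison with randomized networks, which is not shown. The function name and the edge-list input format are assumptions.

```python
from itertools import combinations


def three_node_motif_counts(edges):
    """Count open triads and triangles in an undirected simple graph
    given as an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    open_triads = 0
    triangle_corners = 0
    for center, neighbours in adj.items():
        # Each pair of neighbours forms a three-node subgraph around center.
        for a, b in combinations(neighbours, 2):
            if b in adj[a]:
                triangle_corners += 1  # seen once from each of the 3 corners
            else:
                open_triads += 1
    return {"open_triad": open_triads, "triangle": triangle_corners // 3}


if __name__ == "__main__":
    print(three_node_motif_counts([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

Storing neighbours in sets keeps the adjacency test constant-time and silently merges duplicate edges, so the input may list an edge in either direction.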

  18. RMOD: a tool for regulatory motif detection in signaling network.

    Jinki Kim

    Full Text Available Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to large-scale signaling networks, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using various sizes of signaling networks generated from the integration of various human signaling pathways, and the evaluation showed that its speed and scalability outperform those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manage the whole process, from building signaling networks and querying regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanisms underlying their regulatory activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.

  19. Protein functional-group 3D motif and its applications

    2000-01-01

    Representing and recognizing protein active sites sequence motif (1D motif) and structural motif (3D motif) is an important topic for predicting and designing protein function. Prevalent methods for extracting and searching 3D motif always consider residue as the minimal unit, which have limited sensitivity. Here we present a new spatial representation of protein active sites, called "functional-group 3D motif", based on the fact that the functional groups inside a residue contribute mostly to its function. Relevant algorithm and computer program are developed, which could be widely used in the function prediction and the study of structural-function relationship of proteins. As a test, we defined a functional-group 3D motif of the catalytic triad and oxyanion hole with the structure of porcine trypsin (PDB code: 1mct) as the template. With our motif-searching program, we successfully found similar sub-structures in trypsins, subtilisins and α/β hydrolases, which show distinct folds but share similar catalytic mechanism. Moreover, this motif can be used to elucidate the structural basis of other proteins with variant catalytic triads by comparing it to those proteins. Finally, we scanned this motif against a non-redundant protein structure database to find its matches, and the results demonstrated the potential application of functional group 3D motif in function prediction. Above all, compared with the other 3D-motif representations on residues, the functional group 3D motif achieves better representation of protein active region, which is more sensitive for protein function prediction.

  20. Network Motifs: Simple Building Blocks of Complex Networks

    Milo, R.; Shen-Orr, S.; Itzkovitz, S.; Kashtan, N.; Chklovskii, D.; Alon, U.

    2002-10-01

    Complex networks are studied across many fields of science. To uncover their structural design principles, we defined "network motifs," patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks.
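
    The comparison against randomized networks described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of the feed-forward loop as the pattern, the edge-swap randomization, and the names count_ffl, rewire and motif_z_score are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Count feed-forward loops (a->b, b->c, a->c) in a directed edge list.
    Assumes no self-loops and no duplicate edges."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    count = 0
    for a, outs in succ.items():
        for b in outs:
            for c in succ.get(b, ()):
                if c != a and c in outs:
                    count += 1
    return count

def rewire(edges, n_swaps, rng):
    """Randomize by swapping edge targets, which keeps every node's
    in-degree and out-degree unchanged (an assumed null model)."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, mean count in the null ensemble, z-score)."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(rewire(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd else float("nan")
    return observed, mean(null), z
```

    A pattern whose observed count lies far above the null mean (a large positive z-score) would be a motif candidate in the sense of the abstract; the number of swaps and the ensemble size here are arbitrary.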

  1. Meisetz and the birth of the KRAB motif.

    Birtle, Zoë; Ponting, Chris P

    2006-12-01

    The largest family of transcription factors in mammals is of Cys(2)His(2) zinc finger-proteins, each with an NH(2)-terminal KRAB motif. Extensive expansions of this family have occurred in separate mammalian lineages, with approximately 400 such genes known in the human genome. Despite their widespread occurrence, the evolutionary provenance of the KRAB motif is unclear since previously it has not been found outside of the tetrapod vertebrates. Here, we show that homologues of the histone methyltransferase Meisetz are present within the sea urchin (Strongylocentrotus purpuratus) genome. Sea urchin and mammalian Meisetz sequences each contain an N-terminal KRAB motif, which thereby establishes an early origin of the KRAB motif prior to the divergence of echinoderm and chordate lineages. Finally, we present evidence that KRAB motifs derive from a novel family of KRI (KRAB Interior) motifs that were present in the last common ancestor of animals, plants and fungi. PMID:17032681

  2. Discriminative Motif Finding for Predicting Protein Subcellular Localization

    Lin, Tien-ho; Murphy, Robert F.; Bar-Joseph, Ziv

    2011-01-01

    Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but...

  3. Detecting DNA regulatory motifs by incorporating positional trends in information content

    Kechris, Katherina J.; van Zwet, Erik; Bickel, Peter J.; Eisen, Michael B.

    2004-05-04

    On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.

  4. Sequence motif discovery with computational genome-wide analysis

    Akashi, Hirofumi; Aoki, Fumio; Toyota, Minoru; Maruyama, Reo; Sasaki, Yasushi; Mita, Hiroaki; Tokura, Hajime; Imai, Kohzoh; Tatsumi, Haruyuki

    2006-01-01

    As a result of the human genome project and advancements in DNA sequencing technology, we can utilize a huge amount of nucleotide sequence data and can search DNA sequence motifs in whole human genome. However, searching motifs with the naked eye is an enormous task and searching throughout the whole genome is absolutely impossible. Therefore, we have developed a computational genome-wide analyzing system for detecting DNA sequence motifs with biological significance. We used a multi-parallel...

  5. A Comparative Study of Bases for Motif Inference

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference is at the heart of several time-demanding computational tasks, such as in molecular biology, data mining and identification of structured motifs in sequences, and in data compression, to name a few. In this scenario, a motif is a pattern that appears repeated at least a certain number of times (the quorum), to be of interest. The pattern can be approximated in that some of its characters can be left unspecified (the don't cares). Motif inference is not ...

  6. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Mark D McDonnell

    Full Text Available Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.

  7. A structure filter for the Eukaryotic Linear Motif Resource

    Gemünd Christine

    2009-10-01

    Full Text Available Abstract Background Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if a motif exhibits attributes associated with functional instances such as occurrence in the correct taxonomic range, cellular compartment, conservation in homologues and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality. Results Current methods assessing motif accessibility do not consider much of the information available, either predicting accessibility from primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method considering accessibility and secondary structural context derived from experimentally solved protein structures to rectify this situation. Putatively functional motif occurrences are mapped onto a representative domain, given that a high quality reference SCOP domain structure is available for the protein itself or a close relative. Candidate motifs can then be scored for solvent-accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high confidence classifications and 2.5-fold enrichment for random motifs assigned to low confidence classifications

  8. The MHC motif viewer: a visualization tool for MHC binding motifs

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole;

    2010-01-01

    In vertebrates, the onset of cellular immune reactions is controlled by presentation of peptides in complex with major histocompatibility complex (MHC) molecules to T cell receptors. In humans, MHCs are called human leukocyte antigens (HLAs). Different MHC molecules present different subsets of...... peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms is...... binding motif for each MHC molecule is predicted using state-of-the-art, pan-specific peptide-MHC binding-prediction methods, and is visualized as a sequence logo, in a format that allows for a comprehensive interpretation of binding motif anchor positions and amino acid preferences....

  9. Aztec, Incan and Mayan Motifs...Lead to Distinctive Designs.

    Shields, Joanne

    2001-01-01

    Describes an art project for seventh-grade students in which they choose motifs based on Incan, Aztec, and Mayan Indian materials to incorporate into two-dimensional designs. Explains that the activity objective is to create a unified, balanced and pleasing composition using a minimum of three motifs. (CMK)

  10. Discovering large network motifs from a complex biological network

    Graph structures representing relationships between entries have been studied in statistical analysis, and the results of these studies have been applied to biological networks, whose nodes and edges represent proteins and the relationships between them, respectively. Most of the studies have focused on only graph structures such as scale-free properties and cliques, but the relationships between nodes are also important features since most of the proteins perform their functions by connecting to other proteins. In order to determine such relationships, the problem of network motif discovery has been addressed; network motifs are frequently appearing graph structures in a given graph. However, the methods for network motif discovery are highly restrictive for the application to biological network because they can only be used to find small network motifs or they do not consider noise and uncertainty in observations. In this study, we introduce a new index to measure network motifs called AR index and develop a novel algorithm called ARIANA for finding large motifs even when the network has noise. Experiments using a synthetic network verify that our method can find better network motifs than an existing algorithm. By applying ARIANA to a real complex biological network, we find network motifs associated with regulations of start time of cell functions and generation of cell energies and discover that the cell cycle proteins can be categorized into two different groups.

  11. The phenomenon of astral motifs on late mediaeval tombstones

    Mijatović, V.; Ninković, S.; Vemić, D.

    2003-10-01

    The authors study astral motifs present on some mediaeval tombstones found in present-day Serbia and Montenegro and in the neighbouring countries (especially in Bosnia and Herzegovina). The authors discern some important astral motifs, explain them and present a short review concerning their frequency.

  12. Probing structural changes of self assembled i-motif DNA

    Lee, Iljoon

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  13. Role of GxxxG Motifs in Transmembrane Domain Interactions.

    Teese, Mark G; Langosch, Dieter

    2015-08-25

    Transmembrane (TM) helices of integral membrane proteins can facilitate strong and specific noncovalent protein-protein interactions. Mutagenesis and structural analyses have revealed numerous examples in which the interaction between TM helices of single-pass membrane proteins is dependent on a GxxxG or (small)xxx(small) motif. It is therefore tempting to use the presence of these simple motifs as an indicator of TM helix interactions. In this Current Topic review, we point out that these motifs are quite common, with more than 50% of single-pass TM domains containing a (small)xxx(small) motif. However, the actual interaction strength of motif-containing helices depends strongly on sequence context and membrane properties. In addition, recent studies have revealed several GxxxG-containing TM domains that interact via alternative interfaces involving hydrophobic, polar, aromatic, or even ionizable residues that do not form recognizable motifs. In multipass membrane proteins, GxxxG motifs can be important for protein folding, and not just oligomerization. Our current knowledge thus suggests that the presence of a GxxxG motif alone is a weak predictor of protein dimerization in the membrane. PMID:26244771
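
    Scanning a sequence for these motifs is a simple pattern match, which is part of why their presence alone carries little predictive weight. The sketch below is illustrative only: the abstract does not define the "small" residue set, so treating it as Gly, Ala and Ser is an assumption, as are the name find_motifs and the example sequence.

```python
import re

# Assumption: "small" residues are Gly, Ala and Ser; the abstract does not
# specify the set, and other definitions are possible.
SMALL = "GAS"

# Lookahead patterns so that overlapping occurrences are all reported.
GXXXG = re.compile(r"(?=(G...G))")
SMALL_XXX_SMALL = re.compile(rf"(?=([{SMALL}]...[{SMALL}]))")

def find_motifs(seq, pattern):
    """Return (0-based start, matched 5-residue window) for every occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    tm = "AGLLSGAA"  # hypothetical sequence, not a real TM domain
    print(find_motifs(tm, GXXXG))
    print(find_motifs(tm, SMALL_XXX_SMALL))
```

    The broader (small)xxx(small) pattern matches every GxxxG window plus others, consistent with the review's point that such motifs are common.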

  14. MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs

    Tsunoda Tatsuhiko

    2007-03-01

    Full Text Available Abstract Background A combination of multiple types of transcription factors and cis-regulatory elements is often required for gene expression in eukaryotes, and the combinatorial regulation confers specific gene expression to tissues or environments. To reveal the combinatorial regulation, computational methods are developed that efficiently infer combinations of cis-regulatory motifs that are important for gene expression as measured by DNA microarrays. One promising type of computational method is to utilize regression analysis between expression levels and scores of motifs in input sequences. This type takes full advantage of information on expression levels because it does not require that the expression level of each gene be dichotomized according to whether or not it reaches a certain threshold level. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that practically handles combinations of more than two or three motifs. Results We here introduced MotifCombinator, an online tool with a user-friendly interface, to systematically search for combinations composed of any number of motifs based on regression methods. The tool utilizes well-known regression methods (the multivariate linear regression, the multivariate adaptive regression spline or MARS, and the multivariate logistic regression method) for this purpose, and uses the genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in this tool help users to intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require large computational time. This tool also provides preparatory steps needed for systematic combination search, i.e., selecting single motifs to constitute combinations and cutting out redundant similar motifs based on clustering analysis. Conclusion

  15. An algorithm for motif-based network design

    Mäki-Marttunen, Tuomo

    2016-01-01

    A determinant property of the structure of a biological network is the distribution of local connectivity patterns, i.e., network motifs. In this work, a method for creating directed, unweighted networks while promoting a certain combination of motifs is presented. This motif-based network algorithm starts with an empty graph and randomly connects the nodes by advancing or discouraging the formation of chosen motifs. The in- or out-degree distribution of the generated networks can be explicitly chosen. The algorithm is shown to perform well in producing networks with high occurrences of the targeted motifs, both ones consisting of 3 nodes as well as ones consisting of 4 nodes. Moreover, the algorithm can also be tuned to bring about global network characteristics found in many natural networks, such as small-worldness and modularity.

  16. Dynamic Motifs of Strategies in Prisoner's Dilemma Games

    Kim, Young Jin; Jeong, Seon-Young; Son, Seung-Woo

    2014-01-01

    We investigate the win-lose relations between strategies of iterated prisoner's dilemma games by using a directed network concept to display the replicator dynamics results. In the giant strongly-connected component of the win/lose network, we find win-lose circulations similar to rock-paper-scissors and analyze the fixed point and its stability. Applying the network motif concept, we introduce dynamic motifs, which describe the population dynamics relations among the three strategies. Through exact enumeration, we find 22 dynamic motifs and display their phase portraits. Visualization using directed networks and motif analysis is a useful method to make complex dynamic behavior simple in order to understand it more intuitively. Dynamic motifs can be building blocks for dynamic behavior among strategies when they are applied to other types of games.

  17. Automatic annotation of protein motif function with Gene Ontology terms

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.

  18. BlockLogo: visualization of peptide and sequence motif conservation.

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L; Zhang, Guang Lan; Brusic, Vladimir

    2013-12-31

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/. PMID:24001880
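
    One plausible reading of "block entropy" is the Shannon entropy of the joint residue combinations observed at the selected alignment positions. The Python sketch below follows that reading; it is an assumption about the quantity, not BlockLogo's documented formula, and the name block_entropy and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def block_entropy(alignment, positions):
    """Shannon entropy (in bits) of the blocks formed by reading the chosen
    columns (0-based) from every aligned sequence."""
    blocks = ["".join(seq[p] for p in positions) for seq in alignment]
    counts = Counter(blocks)
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in counts.values())

if __name__ == "__main__":
    msa = ["ACD", "ACD", "AGD", "TCD"]  # toy alignment, equal-length rows
    # Positions need not be adjacent, so discontinuous motifs work too.
    print(block_entropy(msa, [0, 1]))
    print(block_entropy(msa, [0, 2]))
```

    Because the positions are passed as an arbitrary list, the same function covers both continuous and discontinuous motifs.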

  19. MISAE: a new approach for regulatory motif extraction.

    Sun, Zhaohui; Yang, Jingyi; Deogun, Jitender S

    2004-01-01

    The recognition of regulatory motifs of co-regulated genes is essential for understanding the regulatory mechanisms. However, the automatic extraction of regulatory motifs from a given data set of the upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. This problem is further complicated by the corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines the mismatch-allowed probabilistic suffix tree that is a probabilistic model and local prediction for the extraction of regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets. It is able to extract the motif from a "corrupted" data set with less than one fourth of the sequences containing the real motif. PMID:16448011

  20. Computational analyses of synergism in small molecular network motifs.

    Yili Zhang

    2014-03-01

    Full Text Available Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, reoccurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs, and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically) to alter the responses of the motifs to stimuli. Synergism (or antagonism) was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism, and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.

  1. A speedup technique for (l, d)-motif finding algorithms

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
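
    For orientation, the exact (l, d)-motif (PMS) problem can be stated in a few lines of Python: report every string of length l that occurs in each input sequence with at most d mismatches. The sketch below is a naive exhaustive search, not the speedup technique of this paper; the names hamming, occurs_within and planted_motifs are illustrative.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Exact but exhaustive: tests all len(alphabet)**l candidate strings."""
    found = []
    for chars in product(alphabet, repeat=l):
        motif = "".join(chars)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found

if __name__ == "__main__":
    seqs = ["AACGT", "CGTTT", "TCGTA"]  # toy input
    print(planted_motifs(seqs, l=3, d=0))
```

    The candidate space grows as 4^l for DNA, which is the kind of exponential dependence on the parameters that the abstract mentions for exact algorithms.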

  2. Identification of protein superfamily from structure-based sequence motif

    2002-01-01

    Taking the evolutionarily distant protein tyrosine phosphatase (PTP) Ⅰ and Ⅱ superfamilies as an example, a structure-based sequence motif has been defined by structural comparison, structure-based sequence alignment and analysis of the substitution patterns of residues in the common conserved sequence regions. Phosphatases Ⅰ and Ⅱ can be correctly identified together in the SWISS-PROT and TrEMBL databases using this structure-based PTP sequence motif. The results show that the rates of correct identification are over 98%. This is the first time that PTP Ⅰ and Ⅱ have been identified together by this motif.

  3. Coherent feedforward transcriptional regulatory motifs enhance drug resistance

    Charlebois, Daniel A.; Balázsi, Gábor; Kærn, Mads

    2014-05-01

    Fluctuations in gene expression give identical cells access to a spectrum of phenotypes that can serve as a transient, nongenetic basis for natural selection by temporarily increasing drug resistance. In this study, we demonstrate using mathematical modeling and simulation that certain gene regulatory network motifs, specifically coherent feedforward loop motifs, can facilitate the development of nongenetic resistance by increasing cell-to-cell variability and the time scale at which beneficial phenotypic states can be maintained. Our results highlight how regulatory network motifs enabling transient, nongenetic inheritance play an important role in defining reproductive fitness in adverse environments and provide a selective advantage subject to evolutionary pressure.

  4. A Novel Alignment-Free Method for Comparing Transcription Factor Binding Site Motifs

    Minli Xu; Zhengchang Su

    2010-01-01

    BACKGROUND: Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFM) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or cluster motifs according to their binding preference. The majority of current methods for motif comparison involve a similarity metric for column-to-column comparison and a method to find the optimal position alignment between ...
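
    As a minimal sketch of the representation mentioned above, the following builds a position frequency matrix (here, per-position base counts) from aligned binding sites and compares two columns. The example sites and the Euclidean column distance are assumptions chosen for illustration; they are not the alignment-free method of the paper.

```python
import math

ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [[sum(1 for s in sites if s[i] == base) for base in ALPHABET]
            for i in range(length)]


def to_probabilities(column):
    """Normalise one column of counts so that it sums to 1."""
    total = sum(column)
    return [c / total for c in column]


def column_distance(col_a, col_b):
    """Euclidean distance between two normalised columns (illustrative metric)."""
    p, q = to_probabilities(col_a), to_probabilities(col_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Hypothetical aligned binding sites.
    pfm = position_frequency_matrix(["ACGT", "ACGA", "ACGT", "TCGT"])
    for column in pfm:
        print(dict(zip(ALPHABET, column)))
    print(column_distance(pfm[0], pfm[3]))
```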

  5. BayesMD: flexible biological modeling for motif discovery

    Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole

    2008-01-01

    We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained... sampling results are achieved using the advanced sampling method parallel tempering. In a post-analysis step candidate motifs with high marginal probability are found by searching among those motifs that contain sites that occur frequently. Thereby, maximum a posteriori inference for the motifs is avoided and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.

  6. Review article: The mountain motif in the plot of Matthew

    Gert J. Volschenk

    2010-02-01

    Full Text Available This article reviewed T.L. Donaldson’s book, Jesus on the mountain: A study in Matthean theology, published in 1985 by JSOT Press, Sheffield, and focused on the mountain motif in the structure and plot of the Gospel of Matthew, in addition to the work of Donaldson on the mountain motif as a literary motif and as theological symbol. The mountain is a primary theological setting for Jesus’ ministry and thus is an important setting, serving as one of the literary devices by which Matthew structured and progressed his narrative. The Zion theological and eschatological significance and Second Temple Judaism serve as the historical and theological background for the mountain motif. The last mountain setting (Mt 28:16–20) is the culmination of the three theological themes in the plot of Matthew, namely Christology, ecclesiology and salvation history.

  7. Automatic Network Fingerprinting through Single-Node Motifs

    Echtermeyer, Christoph; Rodrigues, Francisco A; Kaiser, Marcus; 10.1371/journal.pone.0015765

    2011-01-01

    Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs---a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated in different network-series. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes...

  8. BlockLogo: Visualization of peptide and sequence motif conservation

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian;

    2013-01-01

    , selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular...

  9. Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

    Lin, Tien-ho; Bar-Joseph, Ziv; Murphy, Robert F.

    2011-01-01

    Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to m...

  10. Mining Tertiary Structural Motifs for Assessment of Designability

    Zhang, Jian; Grigoryan, Gevorg

    2013-01-01

    The observation of a limited secondary-structural alphabet in native proteins, with significant sequence preferences, has profoundly influenced the fields of protein design and structure prediction (Simons et al., 1997; Verschueren et al., 2011). In the era of structural genomics, as the size of the structural dataset continues to grow rapidly, it is becoming possible to extend this analysis to tertiary structural motifs and their sequences. For a hypothetical tertiary motif, the rate of its ...

  11. Temporal Analysis of Motif Mixtures using Dirichlet Processes

    Emonet, Rémi; Varadarajan, J.; Odobez, Jean-Marc

    2014-01-01

    International audience In this paper, we present a new model for unsupervised discovery of recurrent temporal patterns (or motifs) in time series (or documents). The model is designed to handle the difficult case of multivariate time series obtained from a mixture of activities, that is, our observations are caused by the superposition of multiple phenomena occurring concurrently and with no synchronization. The model uses nonparametric Bayesian methods to describe both the motifs and thei...

  12. Robust and Adaptive MicroRNA-Mediated Incoherent Feedforward Motifs

    XU Feng-Dan; LIU Zeng-Rong; ZHANG Zhi-Yong; SHEN Jian-Wei

    2009-01-01

    We integrate transcriptional and post-transcriptional regulation into microRNA-mediated incoherent feedforward motifs and analyse their dynamical behaviour and functions. The analysis shows that the behaviour of the system is almost uninfluenced by variation of the input within certain ranges and by the introduction of delay and noise. The results indicate that microRNA-mediated incoherent feedforward motifs greatly enhance the robustness of gene regulation.

  13. Triplex-induced recombination and repair in the pyrimidine motif

    Kalish, Jennifer M.; Seidman, Michael M.; Weeks, Daniel L.; Glazer, Peter M.

    2005-01-01

    Triplex-forming oligonucleotides (TFOs) bind DNA in a sequence-specific manner at polypurine/polypyrimidine sites and mediate targeted genome modification. Triplexes are formed by either pyrimidine TFOs, which bind parallel to the purine strand of the duplex (pyrimidine, parallel motif), or purine TFOs, which bind in an anti-parallel orientation (purine, anti-parallel motif). Both purine and pyrimidine TFOs, when linked to psoralen, have been shown to direct psoralen adduct formation in cells...

  14. Robust and Adaptive MicroRNA-Mediated Incoherent Feedforward Motifs

    Xu, Feng-Dan; Liu, Zeng-Rong; Zhang, Zhi-Yong; Shen, Jian-Wei

    2009-02-01

    We integrate transcriptional and post-transcriptional regulation into microRNA-mediated incoherent feedforward motifs and analyse their dynamical behaviour and functions. The analysis shows that the behaviour of the system is almost uninfluenced by variation of the input within certain ranges and by the introduction of delay and noise. The results indicate that microRNA-mediated incoherent feedforward motifs greatly enhance the robustness of gene regulation.

  15. Cross-Disciplinary Detection and Analysis of Network Motifs

    Ngoc Tam L. Tran; Luke DeLuccia; McDonald, Aidan F; Chun-Hsi Huang

    2015-01-01

    The detection of network motifs has recently become an important part of network analysis across all disciplines. In this work, we detected and analyzed network motifs from undirected and directed networks of several different disciplines, including biological network, social network, ecological network, as well as other networks such as airlines, power grid, and co-purchase of political books networks. Our analysis revealed that undirected networks are similar at the basic three and four nod...

  16. Robust and Adaptive MicroRNA-Mediated Incoherent Feedforward Motifs

    We integrate transcriptional and post-transcriptional regulation into microRNA-mediated incoherent feedforward motifs and analyse their dynamical behaviour and functions. The analysis shows that the behaviour of the system is almost uninfluenced by variation of the input within certain ranges and by the introduction of delay and noise. The results indicate that microRNA-mediated incoherent feedforward motifs greatly enhance the robustness of gene regulation.

  17. Interpretation of Konya Seljuk Carpet Motifs in Exlibrises

    SÜRMELİ, Kader

    2013-01-01

    The technical characteristics of traditional Turkish carpets, with their strong motifs and knotting technique, have been the greatest support for their regular and continuing development, which allows them to survive today. The Gordes (Turkish) knotting technique was born from the need of a nomadic tribe to find a thicker and warmer floor material. Located in Konya, the center of the Anatolian Seljuks, Seljuk carpets with their rich color tones and motifs were the ones where this technique was applied ...

  18. Motif depletion in bacteriophages infecting hosts with CRISPR systems

    Kupczok, Anne; Bollback, Jonathan P

    2014-01-01

    Background CRISPR is a microbial immune system likely to be involved in host-parasite coevolution. It functions using target sequences encoded by the bacterial genome, which interfere with invading nucleic acids using a homology-dependent system. The system also requires protospacer associated motifs (PAMs), short motifs close to the target sequence that are required for interference in CRISPR types I and II. Here, we investigate whether PAMs are depleted in phage genomes due to selection pre...

  19. Assessing the effects of symmetry on motif discovery and modeling.

    Lala M Motlhabi

    Full Text Available BACKGROUND: Identifying the DNA binding sites for transcription factors is a key task in modeling the gene regulatory network of a cell. Predicting DNA binding sites computationally suffers from high false positives and false negatives due to various contributing factors, including the inaccurate models for transcription factor specificity. One source of inaccuracy in the specificity models is the assumption of asymmetry for symmetric models. METHODOLOGY/PRINCIPAL FINDINGS: Using simulation studies, so that the correct binding site model is known and various parameters of the process can be systematically controlled, we test different motif finding algorithms on both symmetric and asymmetric binding site data. We show that if the true binding site is asymmetric the results are unambiguous and the asymmetric model is clearly superior to the symmetric model. But if the true binding specificity is symmetric commonly used methods can infer, incorrectly, that the motif is asymmetric. The resulting inaccurate motifs lead to lower sensitivity and specificity than would the correct, symmetric models. We also show how the correct model can be obtained by the use of appropriate measures of statistical significance. CONCLUSIONS/SIGNIFICANCE: This study demonstrates that the most commonly used motif-finding approaches usually model symmetric motifs incorrectly, which leads to higher than necessary false prediction errors. It also demonstrates how alternative motif-finding methods can correct the problem, providing more accurate motif models and reducing the errors. Furthermore, it provides criteria for determining whether a symmetric or asymmetric model is the most appropriate for any experimental dataset.

  20. In vivo analysis of Caenorhabditis elegans noncoding RNA promoter motifs

    Zheng Haixia

    2008-08-01

    Full Text Available Abstract. Background: Noncoding RNAs (ncRNAs) play important roles in a variety of cellular processes. Characterizing the transcriptional activity of ncRNA promoters is therefore a critical step toward understanding the complex cellular roles of ncRNAs. Results: Here we present an in vivo transcriptional analysis of three C. elegans ncRNA upstream motifs (UM1-3). Transcriptional activity of all three motifs has been demonstrated, and mutational analysis revealed differential contributions of different parts of each motif. We showed that upstream motif 1 (UM1) can drive the expression of green fluorescent protein (GFP), and utilized this for detailed analysis of temporal and spatial expression patterns of 5 SL2 RNAs. Upstream motifs 2 and 3 do not drive GFP expression, and termination at consecutive T runs suggests transcription by RNA polymerase III. The UM2 sequence resembles the tRNA promoter, and is actually embedded within its own short-lived, primary transcript. This is a structure which is also found at a few plant and yeast loci, and may indicate an evolutionarily very old dicistronic transcription pattern in which a tRNA serves as a promoter for an adjacent snoRNA. Conclusion: The study has demonstrated that the three upstream motifs UM1-3 have promoter activity. The UM1 sequence can drive expression of GFP, which allows for the use of UM1::GFP fusion constructs to study temporal-spatial expression patterns of UM1 ncRNA loci. The UM1 loci appear to act in concert with other upstream sequences, whereas the transcriptional activities of the UM2 and UM3 are confined to the motifs themselves.

  1. Mining tertiary structural motifs for assessment of designability.

    Zhang, Jian; Grigoryan, Gevorg

    2013-01-01

    The observation of a limited secondary-structural alphabet in native proteins, with significant sequence preferences, has profoundly influenced the fields of protein design and structure prediction (Simons, Kooperberg, Huang, & Baker, 1997; Verschueren et al., 2011). In the era of structural genomics, as the size of the structural dataset continues to grow rapidly, it is becoming possible to extend this analysis to tertiary structural motifs and their sequences. For a hypothetical tertiary motif, the rate of its utilization in natural proteins may be used to assess its designability-the ease with which the motif can be realized with natural amino acids. This requires a structural similarity search methodology, which rather than looking for global topological agreement (more appropriate for categorization of full proteins or domains), identifies detailed geometric matches. In this chapter, we introduce such a method, called MaDCaT, and demonstrate its use by assessing the designability landscapes of two tertiary structural motifs. We also show that such analysis can establish structure/sequence links by providing the sequence constraints necessary to encode designable motifs. As logical extension of their secondary-structure counterparts, tertiary structural preferences will likely prove extremely useful in de novo protein design and structure prediction. PMID:23422424

  2. D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-01-01

    Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts the different types of weight matrix based o...

  3. Robustness to noise in synchronization of network motifs: Experimental results

    Buscarino, Arturo; Fortuna, Luigi; Frasca, Mattia; Iachello, Marco; Pham, Viet-Thanh

    2012-12-01

    In this work, we experimentally investigate the robustness to noise of synchronization in all four-node network motifs. The experimental setup consists of four Chua's circuits diffusively coupled in order to implement the six different undirected network motifs that can be obtained with four nodes. In this experimental setup, synchronization in the presence of noise injected in one of the network nodes is investigated and network motifs are compared in terms of the synchronization error obtained. The analysis has then been extended to some selected case studies of networks with five and six nodes. Numerical simulations have also been performed, and results in agreement with the experiments have been obtained. A correlation between node degree and robustness to noise has also been found in these networks.
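
    The count of six undirected four-node motifs quoted above can be checked by brute force. The sketch below assumes that a motif means a connected simple graph on four nodes, counted up to relabelling of the nodes; all function names are illustrative.

```python
from itertools import combinations, permutations

NODES = range(4)
ALL_EDGES = list(combinations(NODES, 2))  # the 6 possible undirected edges


def is_connected(edges):
    """Depth-first search from node 0 over the given edge set."""
    adj = {v: set() for v in NODES}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(NODES)


def canonical_form(edges):
    """Smallest relabelled edge list over all 24 node permutations."""
    return min(
        tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        for perm in permutations(NODES)
    )


def four_node_motifs():
    """Set of canonical forms of all connected graphs on four nodes."""
    classes = set()
    for k in range(len(ALL_EDGES) + 1):
        for edges in combinations(ALL_EDGES, k):
            if is_connected(edges):
                classes.add(canonical_form(edges))
    return classes


if __name__ == "__main__":
    for motif in sorted(four_node_motifs(), key=len):
        print(len(motif), "edges:", motif)
```

    The six classes are the path, the star, the cycle, the triangle with a pendant node, the cycle with one chord, and the complete graph.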

  4. Motifs in Triadic Random Graphs based on Steiner Triple Systems

    Winkler, Marco

    2013-01-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade the overabundance of certain sub-network patterns, so-called motifs, has attracted considerable attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define novel models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obst...

  5. How pathogens use linear motifs to perturb host cell networks

    Via, Allegra

    2015-01-01

    Molecular mimicry is one of the powerful stratagems that pathogens employ to colonise their hosts and take advantage of host cell functions to guarantee their replication and dissemination. In particular, several viruses have evolved the ability to interact with host cell components through protein short linear motifs (SLiMs) that mimic host SLiMs, thus facilitating their internalisation and the manipulation of a wide range of cellular networks. Here we present convincing evidence from the literature that motif mimicry also represents an effective, widespread hijacking strategy in prokaryotic and eukaryotic parasites. Further insights into host motif mimicry would be of great help in the elucidation of the molecular mechanisms behind host cell invasion and the development of anti-infective therapeutic strategies.

  6. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ⁷⁰ subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.

  7. Specific RNA self-assembly with minimal paranemic motifs

    Afonin, Kirill A.; Cieply, Dennis J.; Leontis, Neocles B.

    2016-01-01

    The paranemic crossover (PX) is a motif for assembling two nucleic acid molecules using Watson-Crick (WC) basepairing without unfolding pre-formed secondary structure in the individual molecules. Once formed, the paranemic assembly motif comprises adjacent parallel double helices that cross over at every possible point over the length of the motif. The interaction is reversible as it does not require denaturation of basepairs internal to each interacting molecular unit. Paranemic assembly has been demonstrated for DNA but not for RNA, and only for motifs with four or more cross-over points and lengths of five or more helical half-turns. Here we report the design of RNA molecules that paranemically assemble with the minimum number of two cross-overs spanning the major groove to form paranemic motifs with a length of three half-turns (3HT). Dissociation constants (Kds) were measured for series of molecules in which the number of basepairs between the cross-over points was varied from five to eight basepairs. The paranemic 3HT complex with six basepairs (3HT_6M) was found to be the most stable with Kd = 1×10⁻⁸ M. The half-time for kinetic exchange of the 3HT_6M complex was determined to be ~100 minutes, from which we calculated association and dissociation rate constants ka = 5.11×10³ M⁻¹sec⁻¹ and kd = 5.11×10⁻⁵ sec⁻¹. RNA paranemic assembly of 3HT and 5HT complexes is blocked by single-base substitutions that disrupt individual inter-molecular Watson-Crick basepairs and is restored by compensatory substitutions that restore those basepairs. The 3HT motif appears suitable for specific, programmable, and reversible tecto-RNA self-assembly for constructing artificial RNA molecular machines. PMID:18072767
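
    The reported constants can be cross-checked with the standard relation for a simple bimolecular equilibrium, Kd = kd / ka. The short sketch below assumes that relation applies here and reads the reported values as ka = 5.11×10³ M⁻¹ s⁻¹ and kd = 5.11×10⁻⁵ s⁻¹; the function name is illustrative.

```python
import math


def dissociation_constant(k_d, k_a):
    """Equilibrium dissociation constant (M) from k_d (s^-1) and k_a (M^-1 s^-1)."""
    return k_d / k_a


if __name__ == "__main__":
    k_a = 5.11e3   # association rate constant, M^-1 s^-1 (as read from the abstract)
    k_d = 5.11e-5  # dissociation rate constant, s^-1 (as read from the abstract)
    K_d = dissociation_constant(k_d, k_a)
    print(f"K_d = {K_d:.2e} M")
    # Compare with the measured value quoted for the 3HT_6M complex.
    print(math.isclose(K_d, 1e-8, rel_tol=1e-6))
```

    Under that reading, the ratio of the two rate constants reproduces the measured Kd of 1×10⁻⁸ M.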

  8. Some results on more flexible versions of Graph Motif

    Rizzi, Romeo

    2012-01-01

    The problems studied in this paper originate from Graph Motif, a problem introduced in 2006 in the context of biological networks. Informally speaking, it consists in deciding if a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Due to the high rate of noise in the biological data, more flexible definitions of the problem have been outlined. We present in this paper two inapproximability results for two different optimization variants of Graph Motif. We also study another definition of the problem, when the connectivity constraint is replaced by modularity. While the problem stays NP-complete, it allows algorithms in FPT for biologically relevant parameterizations.
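
    The decision problem described informally above can be made concrete with a brute-force sketch: given a vertex-coloured graph and a multiset of colours, look for a connected set of vertices whose colours match the multiset exactly. The graph encoding and function names are assumptions for illustration; the search is exponential and is not one of the algorithms of the paper.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """True if the subgraph induced by a non-empty vertex set is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def graph_motif(adj, colour, motif):
    """Return a connected vertex set whose colours equal the multiset motif, or None."""
    target = Counter(motif)
    k = sum(target.values())
    if k == 0:
        return set()
    for subset in combinations(sorted(colour), k):
        if Counter(colour[v] for v in subset) == target and is_connected(subset, adj):
            return set(subset)
    return None


if __name__ == "__main__":
    # Hypothetical path graph 1-2-3-4 with colours r, g, b, r.
    adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    colour = {1: "r", 2: "g", 3: "b", 4: "r"}
    print(graph_motif(adj, colour, ["r", "b"]))
    print(graph_motif(adj, colour, ["r", "r"]))
```

    In the toy graph the motif {r, b} is found on the adjacent vertices 3 and 4, while {r, r} has no occurrence because the two red vertices are not connected without extra vertices.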

  9. Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters

    Farré Domènec

    2007-12-01

    Full Text Available Abstract. Background: The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes. Results: We observe that promoters driving housekeeping gene expression are enriched in particular motifs with strong positional bias, such as YY1, which are of little relevance in promoters driving tissue-specific expression. We also identify a large number of motifs that show positional bias in genes expressed in a highly tissue-specific manner. They include well-known tissue-specific motifs, such as HNF1 and HNF4 motifs in liver, kidney and small intestine, or RFX motifs in testis, as well as many potentially novel regulatory motifs. Based on this analysis, we provide predictions for 559 tissue-specific motifs in mouse gene promoters. Conclusion: The study shows that motif positional bias is an important feature of mammalian proximal promoters and that it affects both general and tissue-specific motifs. Motif positional constraints define very distinct promoter architectures depending on breadth of expression and type of tissue.

  10. How curved membranes recruit amphipathic helices and protein anchoring motifs

    Hatzakis, Nikos; Bhatia, Vikram Kjøller; Larsen, Jannik;

    2009-01-01

    : membrane-anchored proteins. The fact that unrelated structural motifs such as alpha-helices and alkyl chains sense MC led us to propose that MC sensing is a generic property of curved membranes rather than a property of the anchoring molecules. We therefore anticipate that MC will promote the redistribution of proteins that are anchored in membranes through other types of hydrophobic moieties.

  11. Predicting conserved protein motifs with Sub-HMMs

    Girke Thomas

    2010-04-01

    Full Text Available Abstract. Background: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty to pinpoint their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand binding signatures of receptor proteins. Results: To identify these conserved motifs efficiently, we propose a method for extracting the most information-rich regions in protein families from their profile HMMs. The method was used here to predict a comprehensive set of sub-HMMs from the Pfam domain database. Cross-validations with the PROSITE and CSA databases confirmed the efficiency of the method in predicting most of the known functionally relevant motifs and residues. At the same time, 46,768 novel conserved regions could be predicted. The data set also allowed us to link at least 461 Pfam domains of known and unknown function by their common sub-HMMs. Finally, the sub-HMM method showed very promising results as an alternative search method for identifying proteins that share only short sequence similarities. Conclusions: Sub-HMMs extend the application spectrum of profile HMMs to motif discovery. Their most interesting utility is the identification of the functionally relevant residues in proteins of known and unknown function. Additionally, sub-HMMs can be used for highly localized sequence similarity searches that focus on shorter conserved features rather than entire domains or global similarities. The motif data generated by this study is a valuable knowledge resource for characterizing protein functions in the future.

  12. Motivated Proteins: A web application for studying small three-dimensional protein motifs

    Milner-White E James

    2009-02-01

    Full Text Available Abstract. Background: Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description: The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion: Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema.

  13. Composite motifs integrating multiple protein structures increase sensitivity for function prediction.

    Chen, Brian Y; Bryant, Drew H; Cruess, Amanda E; Bylund, Joseph H; Fofanov, Viacheslav Y; Kristensen, David M; Kimmel, Marek; Lichtarge, Olivier; Kavraki, Lydia E

    2007-01-01

    The study of disease often hinges on the biological function of proteins, but determining protein function is a difficult experimental process. To minimize duplicated effort, algorithms for function prediction seek characteristics indicative of possible protein function. One approach is to identify substructural matches of geometric and chemical similarity between motifs representing known active sites and target protein structures with unknown function. In earlier work, statistically significant matches of certain effective motifs have identified functionally related active sites. Effective motifs must be carefully designed to maintain similarity to functionally related sites (sensitivity) and avoid incidental similarities to functionally unrelated protein geometry (specificity). Existing motif design techniques use the geometry of a single protein structure. Poor selection of this structure can limit motif effectiveness if the selected functional site lacks similarity to functionally related sites. To address this problem, this paper presents composite motifs, which combine structures of functionally related active sites to potentially increase sensitivity. Our experimentation compares the effectiveness of composite motifs with simple motifs designed from single protein structures. On six distinct families of functionally related proteins, leave-one-out testing showed that composite motifs had sensitivity comparable to the most sensitive of all simple motifs and specificity comparable to the average simple motif. On our data set, we observed that composite motifs simultaneously capture variations in active site conformation, diminish the problem of selecting motif structures, and enable the fusion of protein structures from diverse data sources. PMID:17951837

  14. Functional characterization of transcription factor motifs using cross-species comparison across large evolutionary distances

    Jaebum Kim; Ryan Cunningham; Brian James; Stefan Wyder; Gibson, Joshua D.; Oliver Niehuis; Zdobnov, Evgeny M.; Hugh M Robertson; Robinson, Gene E.; Werren, John H; Saurabh Sinha

    2010-01-01

    We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif-function associations. This framework is app...

  15. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Froula, Jeffrey L.; M. Pilar Francino

    2008-01-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the sigma(70) subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory s...

  16. SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents

    Zhang, Shaoqiang; Zhou, Xiguo; Du, Chuanbin; Su, Zhengchang

    2013-01-01

    Background: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences identified by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form such as a position-specific scoring matrix (PSSM) or a position frequency matrix. Very frequently, we need to query a motif in a database of motifs by seeking its similar motifs, merge similar ...

  17. RNAMotifScanX: a graph alignment approach for RNA structural motif identification

    Zhong, Cuncong; Zhang, Shaojie

    2015-01-01

    RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexit...

  18. Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    Casey, Fergal

    2011-08-22

    Background: Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results: We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion: We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
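The four triplet classes named in this abstract (NNN, NNP, NPP, PPP) can be illustrated with a short counting routine. This is only a minimal sketch, assuming each interaction has already been reduced to a sign label `'N'` or `'P'`; the function name, the input format, and the toy gene names are hypothetical and not taken from the paper.

```python
from itertools import combinations

def classify_triplets(edges):
    """Count closed triplets by sign class.

    edges: dict mapping frozenset({a, b}) -> 'N' (negative) or 'P' (positive).
    This input format is an assumption made for illustration.
    """
    nodes = sorted({n for pair in edges for n in pair})
    counts = {"NNN": 0, "NNP": 0, "NPP": 0, "PPP": 0}
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        # Only triplets in which all three pairs have a measured interaction count.
        if all(p in edges for p in pairs):
            label = "".join(sorted(edges[p] for p in pairs))
            counts[label] += 1
    return counts

# Hypothetical toy network of four genes.
toy = {
    frozenset(("A", "B")): "N",
    frozenset(("A", "C")): "N",
    frozenset(("B", "C")): "P",
    frozenset(("B", "D")): "P",
    frozenset(("C", "D")): "P",
}
print(classify_triplets(toy))
```

Sorting the three labels makes the class independent of node order, so N-N-P, N-P-N and P-N-N all count as NNP.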

  19. WildSpan: mining structured motifs from protein sequences

    Chen Chien-Yu

    2011-03-01

    Background: Automatic extraction of motifs from biological sequences is an important research problem in molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost. Results: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan.
The protein-based mining mode

  20. A new motif for inhibitors of geranylgeranyl diphosphate synthase.

    Foust, Benjamin J; Allen, Cheryl; Holstein, Sarah A; Wiemer, David F

    2016-08-15

    The enzyme geranylgeranyl diphosphate synthase (GGDPS) is believed to receive the substrate farnesyl diphosphate through one lipophilic channel and release the product geranylgeranyl diphosphate through another. Bisphosphonates with two isoprenoid chains positioned on the α-carbon have proven to be effective inhibitors of this enzyme. Now a new motif has been prepared with one isoprenoid chain on the α-carbon, a second included as a phosphonate ester, and the potential for a third at the α-carbon. The pivaloyloxymethyl prodrugs of several compounds based on this motif have been prepared and the resulting compounds have been tested for their ability to disrupt protein geranylgeranylation and induce cytotoxicity in myeloma cells. The initial biological studies reveal activity consistent with GGDPS inhibition, and demonstrate a structure-function relationship which is dependent on the nature of the alkyl group at the α-carbon. PMID:27338660

  1. Discovering sequence motifs in quantitative and qualitative peptide data

    Andreatta, Massimo

    analyze and interpret such data. The first paper in this thesis presents a new, publicly available method based on artificial neural networks that allows custom analysis of quantitative peptide data. The online NNAlign web-server provides a simple yet powerful tool for the discovery of sequence motifs in...... thousands of interactions in a single experiment, with virtually unlimited choice of potential targets and variants of these targets. However, the amount and complexity of data produced by high-throughput techniques poses serious challenges to researchers of limited bioinformatics expertise who need to...... this thesis deals with the presence of multiple motifs, due to the experimental setup or the actual poly-specificity of the receptor, in peptide data. A new algorithm, based on Gibbs sampling, identifies multiple specificities by performing two tasks simultaneously: alignment and clustering of peptide...

  2. Nephila clavipes Flagelliform Silk-like GGX Motifs Contribute to Extensibility and Spacer Motifs Contribute to Strength in Synthetic Spider Silk Fibers

    Adrianos, Sherry L.; Teulé, Florence; Hinman, Michael B.; Jones, Justin A.; Weber, Warner S.; Yarger, Jeffery L.; Lewis, Randolph V.

    2013-01-01

    Flagelliform spider silk is the most extensible silk fiber produced by orb weaver spiders, though not as strong as the dragline silk of the spider. The motifs found in the core of the Nephila clavipes flagelliform Flag protein are: GGX, spacer, and GPGGX. Flag does not contain the polyalanine motif known to provide the strength of dragline silk. To investigate the source of flagelliform fiber strength, four recombinant proteins were produced containing variations of the three core motifs of t...

  3. Defense-Inducing Volatiles: In Search of the Active Motif

    Heil, Martin; Lion, Ulrich; Boland, Wilhelm

    2008-01-01

    Herbivore-induced volatile organic compounds (VOCs) are widely appreciated as an indirect defense mechanism since carnivorous arthropods use VOCs as cues for host localization and then attack herbivores. Another function of VOCs is plant–plant signaling. That VOCs elicit defensive responses in neighboring plants has been reported from various species, and different compounds have been found to be active. In order to search for a structural motif that characterizes active VOCs, we used lima be...

  4. Graph animals, subgraph sampling, and motif search in large networks

    Baskerville, Kim; Grassberger, Peter; Paczuski, Maya

    2007-09-01

    We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for “graph animals,” i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores > 10) or antimotifs (Z scores < -10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.
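The Z-score thresholds quoted in this abstract can be made concrete with a small calculation. The following is a minimal sketch, assuming the null-model subgraph counts have already been obtained (for example from randomized networks with the same degree sequence); the function names and the use of the sample standard deviation are illustrative assumptions, not the authors' procedure.

```python
from statistics import mean, stdev

def motif_z_score(observed, null_counts):
    """Z score of an observed subgraph count against null-model counts."""
    mu = mean(null_counts)
    sigma = stdev(null_counts)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("null counts have zero variance; Z score is undefined")
    return (observed - mu) / sigma

def label_subgraph(z, threshold=10.0):
    """Apply the cutoffs quoted in the abstract: Z > 10 motif, Z < -10 antimotif."""
    if z > threshold:
        return "motif"
    if z < -threshold:
        return "antimotif"
    return "neither"

# Hypothetical counts: 40 occurrences observed, 10 to 14 in randomized networks.
z = motif_z_score(40, [10, 12, 14])
print(z, label_subgraph(z))
```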

  5. Exon silencing by UAGG motifs in response to neuronal excitation.

    Ping An

    2007-02-01

    Alternative pre-mRNA splicing plays fundamental roles in neurons by generating functional diversity in proteins associated with the communication and connectivity of the synapse. The CI cassette of the NMDA R1 receptor is one of a variety of exons that show an increase in exon skipping in response to cell excitation, but the molecular nature of this splicing responsiveness is not yet understood. Here we investigate the molecular basis for the induced changes in splicing of the CI cassette exon in primary rat cortical cultures in response to KCl-induced depolarization using an expression assay with a tight neuron-specific readout. In this system, exon silencing in response to neuronal excitation was mediated by multiple UAGG-type silencing motifs, and transfer of the motifs to a constitutive exon conferred a similar responsiveness by gain of function. Biochemical analysis of protein binding to UAGG motifs in extracts prepared from treated and mock-treated cortical cultures showed an increase in nuclear hnRNP A1-RNA binding activity in parallel with excitation. Evidence for the role of the NMDA receptor and calcium signaling in the induced splicing response was shown by the use of specific antagonists, as well as cell-permeable inhibitors of signaling pathways. Finally, a wider role for exon-skipping responsiveness is shown to involve additional exons with UAGG-related silencing motifs, and transcripts involved in synaptic functions. These results suggest that, at the post-transcriptional level, excitable exons such as the CI cassette may be involved in strategies by which neurons mount adaptive responses to hyperstimulation.

  6. Nature-inspired design of motif-specific antibody scaffolds

    Koerber, James T.; Thomsen, Nathan D.; Hannigan, Brett T.; DeGrado, William F.; Wells, James A.

    2013-01-01

    Aberrant changes in post-translational modifications (PTMs) such as phosphorylation underlie a majority of human diseases. However, detection and quantification of PTMs for diagnostic or biomarker applications often requires monoclonal PTM-specific antibodies, which are challenging to generate using traditional antibody-generation platforms. Here we outline a general strategy for producing synthetic PTM-specific antibodies by engineering a motif-specific ‘hot spot’ into an antibody scaffold. ...

  7. The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) family

    Kelwick, Richard; Desanlis, Ines; Wheeler, Grant N.; Edwards, Dylan R

    2015-01-01

    The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) enzymes are secreted, multi-domain matrix-associated zinc metalloendopeptidases that have diverse roles in tissue morphogenesis and patho-physiological remodeling, in inflammation and in vascular biology. The human family includes 19 members that can be sub-grouped on the basis of their known substrates, namely the aggrecanases or proteoglycanases (ADAMTS1, 4, 5, 8, 9, 15 and 20), the procollagen N-propeptidases (ADAM...

  8. Tricksters Trot to America: Areal Distribution of Folklore Motifs

    Yuri Berezkin

    2010-01-01

    The folklore Trickster is usually considered a universally known combination of features intrinsic to human nature. However, there are strong anomalies in the areal distribution of such a figure. Sub-Saharan Africa, North America (except for the Arctic), Northeast Asia and South American Chaco not only are the preferred zones of tricksters’ activity but also share some peculiar trickster motifs unknown in most of the other regions. The range of animals which play the role of tricksters is als...

  9. Tools and resources for identifying protein families, domains and motifs

    Mulder, Nicola J.; Apweiler, Rolf

    2001-01-01

    With the large influx of raw sequence data from genome sequencing projects, there is a need for reliable automatic methods for protein sequence analysis and classification. The most useful tools use various methods for identifying motifs or domains found in previously characterized protein families. This article reviews the tools and resources available on the web for identifying signatures within proteins and discusses how they may be used in the analysis of new or unknown protein sequences.

  10. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    Seitzer Phillip

    2012-11-01

    Background: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif

  11. QuateXelero: an accelerated exact network motif detection algorithm.

    Khakabimamaghani, Sahand; Sharafuddin, Iman; Dichter, Norbert; Koch, Ina; Masoudi-Nejad, Ali

    2013-01-01

    Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks' structure and function. However, this task is very computationally demanding, because it is closely tied to the graph isomorphism problem, which is in NP (and not yet known to be in P or to be NP-complete). Accordingly, this work aims to reduce the number of calls to the NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure at its heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks confirm the overall superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fast in constructing the central data structure of the algorithm from scratch based on the input network. PMID:23874498
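For orientation, the ESU enumeration step that the abstract names as the basis of QuateXelero can be sketched as follows. This is a plain, unoptimized rendering of ESU-style enumeration of connected k-node subgraphs in an undirected graph, assuming comparable node labels; it is not QuateXelero itself, it contains no quaternary tree, and it performs no isomorphism classification. The adjacency-dict input format is an assumption.

```python
def esu(adj, k):
    """Enumerate each connected k-node subgraph exactly once (ESU scheme).

    adj: dict mapping each node to the set of its neighbours (undirected,
    no self-loops). Node labels must be mutually comparable.
    """
    results = []

    def extend(sub, ext, v):
        if len(sub) == k:
            results.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Nodes already in the subgraph or adjacent to it are not "exclusive".
            closed = set(sub)
            for s in sub:
                closed |= adj[s]
            # Add only exclusive neighbours of w with a label larger than the root v.
            new_ext = ext | {u for u in adj[w] if u > v and u not in closed}
            extend(sub | {w}, new_ext, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return results

# Hypothetical example: a path 1-2-3-4 has two connected 3-node subgraphs.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(sorted(s) for s in esu(path, 3)))
```

A full motif detector would additionally group the enumerated subgraphs into isomorphism classes, which is the step the abstract says QuateXelero tries to make cheaper.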

  12. Event Networks and the Identification of Crime Pattern Motifs.

    Davies, Toby; Marchione, Elio

    2015-01-01

    In this paper we demonstrate the use of network analysis to characterise patterns of clustering in spatio-temporal events. Such clustering is of both theoretical and practical importance in the study of crime, and forms the basis for a number of preventative strategies. However, existing analytical methods show only that clustering is present in data, while offering little insight into the nature of the patterns present. Here, we show how the classification of pairs of events as close in space and time can be used to define a network, thereby generalising previous approaches. The application of graph-theoretic techniques to these networks can then offer significantly deeper insight into the structure of the data than previously possible. In particular, we focus on the identification of network motifs, which have clear interpretation in terms of spatio-temporal behaviour. Statistical analysis is complicated by the nature of the underlying data, and we provide a method by which appropriate randomised graphs can be generated. Two datasets are used as case studies: maritime piracy at the global scale, and residential burglary in an urban area. In both cases, the same significant 3-vertex motif is found; this result suggests that incidents tend to occur not just in pairs, but in fact in larger groups within a restricted spatio-temporal domain. In the 4-vertex case, different motifs are found to be significant in each case, suggesting that this technique is capable of discriminating between clustering patterns at a finer granularity than previously possible. PMID:26605544
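The construction described in this abstract, linking pairs of events that are close in both space and time and then looking for small motifs, can be sketched as below. This is a minimal illustration under stated assumptions: events are `(x, y, t)` tuples, the thresholds `max_dist` and `max_dt` are hypothetical parameters, and the only motif counted is the undirected triangle. The paper's own motif definitions and randomization method are not reproduced here.

```python
from itertools import combinations
from math import hypot

def build_event_network(events, max_dist, max_dt):
    """Link two events when they are within max_dist in space and max_dt in time.

    events: list of (x, y, t) tuples. Returns a set of index pairs (i, j), i < j.
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        close_in_space = hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        close_in_time = abs(a[2] - b[2]) <= max_dt
        if close_in_space and close_in_time:
            edges.add((i, j))
    return edges

def count_triangles(n, edges):
    """Count 3-vertex subgraphs in which all three pairs are linked."""
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    )

# Hypothetical events: three clustered incidents and one distant incident.
events = [(0, 0, 0), (1, 0, 1), (0, 1, 2), (10, 10, 1)]
edges = build_event_network(events, max_dist=2, max_dt=3)
print(sorted(edges), count_triangles(len(events), edges))
```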

  13. GxxxG motifs hold the TIM23 complex together.

    Demishtein-Zohary, Keren; Marom, Milit; Neupert, Walter; Mokranjac, Dejana; Azem, Abdussalam

    2015-06-01

    Approximately 99% of the mitochondrial proteome is nucleus-encoded, synthesized in the cytosol, and subsequently imported into and sorted to the correct compartment in the organelle. The translocase of the inner mitochondrial membrane 23 (TIM23) complex is the major protein translocase of the inner membrane, and is responsible for translocation of proteins across the inner membrane and their insertion into the inner membrane. Tim23 is the central component of the complex that forms the import channel. A high-resolution structure of the import channel is still missing, and structural elements important for its function are unknown. In the present study, we analyzed the importance of the highly abundant GxxxG motifs in the transmembrane segments of Tim23 for the structural integrity of the TIM23 complex. Of 10 glycines present in the GxxxG motifs in the first, second and third transmembrane segments of Tim23, mutations of three of them in transmembrane segments 1 and 2 resulted in a lethal phenotype, and mutations of three others in a temperature-sensitive phenotype. The remaining four caused no obvious growth phenotype. Importantly, none of the mutations impaired the import and membrane integration of Tim23 precursor into mitochondria. However, the severity of growth impairment correlated with the destabilization of the TIM23 complex. We conclude that the GxxxG motifs found in the first and second transmembrane segments of Tim23 are necessary for the structural integrity of the TIM23 complex. PMID:25765297
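A GxxxG motif is a glycine followed, four positions later, by another glycine. As a rough illustration of how such motifs could be located in a sequence, the sketch below uses a lookahead regular expression so that overlapping motifs (which share a glycine) are all reported. The example sequence is made up and is not the Tim23 sequence.

```python
import re

def find_gxxxg(sequence):
    """Return 0-based start positions of every GxxxG motif, including overlaps."""
    # The lookahead matches without consuming characters, so a glycine that
    # ends one motif can also start the next.
    return [m.start() for m in re.finditer(r"(?=G...G)", sequence)]

# Hypothetical sequence with two overlapping GxxxG motifs.
print(find_gxxxg("AGAAAGAAAGA"))
```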

  14. Interlinking motifs and entropy landscapes of statistically interacting particles

    P. Lu

    2012-03-01

    The s=1/2 Ising chain with uniform nearest-neighbor and next-nearest-neighbor coupling is used to construct a system of floating particles characterized by motifs of up to six consecutive local spins. The spin couplings cause the assembly of particles which, in turn, remain free of interaction energies even at high density. All microstates are configurations of particles from one of three different sets, excited from pseudo-vacua associated with ground states of periodicities one, two, and four. The motifs of particles and elements of pseudo-vacuum interlink in two shared site variables. The statistical interaction between particles is encoded in a generalized Pauli principle, describing how the placement of one particle modifies the options for placing further particles. In the statistical mechanical analysis arbitrary energies can be assigned to all particle species. The entropy is a function of the particle populations. The statistical interaction specifications are transparently built into that expression. The energies and structures of the particles alone govern the ordering at low temperature. Under special circumstances the particles can be replaced by more fundamental particles with shorter motifs that interlink in only one shared site variable. Structures emerge from interactions on two levels: particles with shapes from coupled spins and long-range ordering tendencies from statistically interacting particles with shapes.

  15. Event Networks and the Identification of Crime Pattern Motifs.

    Toby Davies

    In this paper we demonstrate the use of network analysis to characterise patterns of clustering in spatio-temporal events. Such clustering is of both theoretical and practical importance in the study of crime, and forms the basis for a number of preventative strategies. However, existing analytical methods show only that clustering is present in data, while offering little insight into the nature of the patterns present. Here, we show how the classification of pairs of events as close in space and time can be used to define a network, thereby generalising previous approaches. The application of graph-theoretic techniques to these networks can then offer significantly deeper insight into the structure of the data than previously possible. In particular, we focus on the identification of network motifs, which have clear interpretation in terms of spatio-temporal behaviour. Statistical analysis is complicated by the nature of the underlying data, and we provide a method by which appropriate randomised graphs can be generated. Two datasets are used as case studies: maritime piracy at the global scale, and residential burglary in an urban area. In both cases, the same significant 3-vertex motif is found; this result suggests that incidents tend to occur not just in pairs, but in fact in larger groups within a restricted spatio-temporal domain. In the 4-vertex case, different motifs are found to be significant in each case, suggesting that this technique is capable of discriminating between clustering patterns at a finer granularity than previously possible.

  16. Insertion of tetracysteine motifs into dopamine transporter extracellular domains.

    Deanna M Navaroli

    The neuronal dopamine transporter (DAT) is a major determinant of extracellular dopamine (DA) levels and is the primary target for a variety of addictive and therapeutic psychoactive drugs. DAT is acutely regulated by protein kinase C (PKC) activation and amphetamine exposure, both of which modulate DAT surface expression by endocytic trafficking. In order to use live imaging approaches to study DAT endocytosis, methods are needed to exclusively label the DAT surface pool. The use of membrane impermeant, sulfonated biarsenic dyes holds potential as one such approach, and requires introduction of an extracellular tetracysteine motif (tetraCys; CCPGCC) to facilitate dye binding. In the current study, we took advantage of intrinsic proline-glycine (Pro-Gly) dipeptides encoded in predicted DAT extracellular domains to introduce tetraCys motifs into DAT extracellular loops 2, 3, and 4. [3H]DA uptake studies, surface biotinylation and fluorescence microscopy in PC12 cells indicate that tetraCys insertion into the DAT second extracellular loop results in a functional transporter that maintains PKC-mediated downregulation. Introduction of tetraCys into extracellular loops 3 and 4 yielded DATs with severely compromised function that failed to mature and traffic to the cell surface. This is the first demonstration of successful introduction of a tetracysteine motif into a DAT extracellular domain, and may hold promise for use of biarsenic dyes in live DAT imaging studies.

  17. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.; Friis, Carsten; Brunak, Søren; Mann, M.; Blom, Nikolaj

    Advances in mass spectrometry-based proteomics have yielded a substantial mapping of the tyrosine phosphoproteome and thus provided an important step toward a systematic analysis of intracellular signaling networks in higher eukaryotes. In this study we decomposed an uncharacterized proteomics data...... set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr...... and validated as a binding motif for the SH2 domain-containing inositol phosphatase SHIP2. Our decomposition of the in vivo Tyr(P) proteome furthermore suggests that two-thirds of the Tyr(P) sites mediate interaction, whereas the remaining third govern processes such as enzyme activation and nucleic...

  18. A Novel Alignment-Free Method for Comparing Transcription Factor Binding Site Motifs

    Xu, Minli; Su, Zhengchang

    2010-01-01

    Background Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFM) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or cluster motifs according to their binding preference. The majority of current methods for motif comparison involve a similarity metric for column-to-column comparison and a method to find the optimal position alignment between the two compared motifs. In some applications, alignment-free methods might be preferred; however, few such methods with high accuracy have been described. Methodology/Principal Findings Here we describe a novel alignment-free method for quantifying the similarity of motifs using their PFMs by converting PFMs into k-mer vectors. The motifs could then be compared by measuring the similarity among their corresponding k-mer vectors. Conclusions/Significance We demonstrate that our method in general achieves similar performance or outperforms the existing methods for clustering motifs according to their binding preference and identifying similar motifs of transcription factors of the same family. PMID:20098703
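The idea of comparing motifs through k-mer vectors rather than through an explicit column alignment can be illustrated with a short sketch. The conversion below is only one plausible reading, stated as an assumption: each k-mer is scored by its probability under every window of k consecutive PFM columns, averaged over windows, and two motifs are then compared by cosine similarity. The function names, the column format, and the choice of cosine similarity are illustrative and are not claimed to be the authors' exact method.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"

def pfm_to_kmer_vector(pfm, k):
    """Convert a PFM into a k-mer probability vector.

    pfm: list of columns, each a dict mapping 'A', 'C', 'G', 'T' to a probability.
    """
    n_windows = len(pfm) - k + 1
    if n_windows < 1:
        raise ValueError("motif is shorter than k")
    vector = {}
    for kmer in product(BASES, repeat=k):
        total = 0.0
        for start in range(n_windows):
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= pfm[start + offset][base]
            total += p
        vector["".join(kmer)] = total / n_windows
    return vector

def cosine_similarity(u, v):
    """Cosine similarity of two k-mer vectors that share the same keys."""
    dot = sum(u[key] * v[key] for key in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def consensus_pfm(consensus):
    """Hypothetical helper: a PFM in which every column is a single fixed base."""
    return [{b: 1.0 if b == c else 0.0 for b in BASES} for c in consensus]

# Two made-up motifs that are shifted copies of each other.
a = pfm_to_kmer_vector(consensus_pfm("ACG"), k=2)
b = pfm_to_kmer_vector(consensus_pfm("CGT"), k=2)
print(cosine_similarity(a, a), cosine_similarity(a, b))
```

Because the vectors pool k-mers from all window positions, shifted versions of a motif still share k-mers and score as partly similar without any alignment step.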

  19. A novel alignment-free method for comparing transcription factor binding site motifs.

    Minli Xu

    BACKGROUND: Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFM) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or cluster motifs according to their binding preference. The majority of current methods for motif comparison involve a similarity metric for column-to-column comparison and a method to find the optimal position alignment between the two compared motifs. In some applications, alignment-free methods might be preferred; however, few such methods with high accuracy have been described. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe a novel alignment-free method for quantifying the similarity of motifs using their PFMs by converting PFMs into k-mer vectors. The motifs could then be compared by measuring the similarity among their corresponding k-mer vectors. CONCLUSIONS/SIGNIFICANCE: We demonstrate that our method in general achieves similar performance or outperforms the existing methods for clustering motifs according to their binding preference and identifying similar motifs of transcription factors of the same family.

  20. Leucine-based receptor sorting motifs are dependent on the spacing relative to the plasma membrane

    Geisler, C; Dietrich, J; Nielsen, B L;

    1998-01-01

    Many integral membrane proteins contain leucine-based motifs within their cytoplasmic domains that mediate internalization and intracellular sorting. Two types of leucine-based motifs have been identified. One type is dependent on phosphorylation, whereas the other type, which includes an acidic amino acid, is constitutively active. In this study, we have investigated how the spacing relative to the plasma membrane affects the function of both types of leucine-based motifs. For phosphorylation-dependent leucine-based motifs, a minimal spacing of 7 residues between the plasma membrane and the phospho-acceptor was required for phosphorylation and thereby activation of the motifs. For constitutively active leucine-based motifs, a minimal spacing of 6 residues between the plasma membrane and the acidic residue was required for optimal activity of the motifs. In addition, we found that the acidic...

  1. Identification of imine reductase-specific sequence motifs.

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR] and the active site motif Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes. Proteins 2016; 84:600-610. © 2016 Wiley Periodicals, Inc. PMID:26857686
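
The two IRED motifs above can be turned into regular expressions for scanning sequences. The sketch below assumes a PROSITE-like reading of the notation, which the abstract does not spell out: `x` is any residue, `xN` is N arbitrary residues, `[...]` lists allowed residues, `{K}` means any residue except K, and the spaces are only typographical. The converter function and the synthetic test sequence are hypothetical.

```python
import re

def motif_to_regex(motif):
    """Translate the motif notation into a Python regular expression.

    Assumed reading: x = any residue, xN = N residues, [..] = allowed set,
    {..} = excluded set. Spaces in the printed motif are ignored.
    """
    motif = motif.replace(" ", "")
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":                      # allowed residue set, copied as is
            j = motif.index("]", i)
            parts.append(motif[i:j + 1])
            i = j + 1
        elif ch == "{":                    # excluded residue set
            j = motif.index("}", i)
            parts.append("[^" + motif[i + 1:j] + "]")
            i = j + 1
        elif ch == "x":                    # wildcard with optional repeat count
            j = i + 1
            while j < len(motif) and motif[j].isdigit():
                j += 1
            count = motif[i + 1:j]
            parts.append("." if not count else ".{" + count + "}")
            i = j
        else:                              # literal residue
            parts.append(ch)
            i += 1
    return "".join(parts)

cofactor_regex = motif_to_regex("GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR]")
active_site_regex = motif_to_regex("Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]")

# Synthetic sequence built to satisfy the cofactor motif (not a real IRED).
synthetic = "MM" + "GLG" + "A" + "MG" + "AAAAA" + "A" + "AAAA" + "G" + "AAAA" + "V" + "WNR" + "T" + "AA" + "K" + "MM"
match = re.search(cofactor_regex, synthetic)
print(cofactor_regex)
print(active_site_regex)
print(match.start(), match.end())
```

A pattern match of this kind only flags candidate positions; whether a hit is a functional cofactor-binding or active-site region still depends on the structural context discussed in the abstract.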

  2. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  3. Do short, frequent DNA sequence motifs mould the epigenome?

    Quante, Timo; Bird, Adrian

    2016-04-01

    'Epigenome' refers to the panoply of chemical modifications borne by DNA and its associated proteins that locally affect genome function. Epigenomic patterns are thought to be determined by external constraints resulting from development, disease and the environment, but DNA sequence is also a potential influence. We propose that domains of relatively uniform DNA base composition may modulate the epigenome through cell type-specific proteins that recognize short, frequent sequence motifs. Differential recruitment of epigenomic modifiers may adjust gene expression in multigene blocks as an alternative to tuning the activity of each gene separately, thus simplifying gene expression programming. PMID:26837845

  4. Tricksters Trot to America: Areal Distribution of Folklore Motifs

    Yuri Berezkin

    2010-12-01

    The folklore Trickster is usually considered a universally known combination of features intrinsic to human nature. However, there are strong anomalies in the areal distribution of such a figure. Sub-Saharan Africa, North America (except for the Arctic), Northeast Asia and South American Chaco not only are the preferred zones of tricksters’ activity but also share some peculiar trickster motifs unknown in most of the other regions. The range of animals which play the role of tricksters is also restricted and not always easily explained. E.g. the Hare and Spider, known in both Africa and North America, are neither “mediators” between life and death (suggested by C. Lévi-Strauss for Coyote) nor “really tricky” (“materialistic” hypothesis of M. Harris). The set of trickster motifs and the zoo- or anthropomorphic impersonations of the Trickster are independent variables. The same episodes are easily linked to different tricksters while every trickster usually attracts episodes characteristic of a particular region. Though the original emergence of Trickster as a mental construct can indeed be rooted in human psychology (and where else?), the distribution of tricksters in folklore is discretionary and depends on many uncertain, i.e. chance, factors. The wide spread or lack of tricksters in certain cultural areas hardly reflects any fundamental differences in the psychology of inhabitants of these regions. The study of trickster motifs, just as of any other folklore motifs, helps us reconstruct possible historic links between populations. The African – North American links remain enigmatic (independent emergence is possible but slight historical links cannot be completely excluded) but the parallels between (Western and Northeast) Siberian – North American tricksters are almost certainly due to former cultural ties across Northern Asia. Another interesting case is the proliferation of tricksters with different zoomorphic and other identities

  5. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  6. Motif Discovery in Tissue-Specific Regulatory Sequences Using Directed Information

    James Douglas Engel

    2007-12-01

    Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Coupled with the complexity underlying tissue-specific gene expression, there are several motifs that are putatively responsible for expression in a certain cell type. This has important implications in understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. Firstly, we propose the use of directed information for such classification constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif to be a discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable the large-scale investigation for the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

  7. Motif Discovery in Tissue-Specific Regulatory Sequences Using Directed Information

    States David

    2007-01-01

    Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Coupled with the complexity underlying tissue-specific gene expression, there are several motifs that are putatively responsible for expression in a certain cell type. This has important implications in understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. Firstly, we propose the use of directed information for such classification constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif to be a discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable the large-scale investigation for the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

  8. Automated protein motif generation in the structure-based protein function prediction tool ProMOL.

    Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron; Madha, Shariq; Mills, Jeffrey L; Craig, Paul A; Bernstein, Herbert J

    2015-12-01

    ProMOL, a plugin for the PyMOL molecular graphics system, is a structure-based protein function prediction tool. ProMOL includes a set of routines for building motif templates that are used for screening query structures for enzyme active sites. Previously, each motif template was generated manually and required supervision in the optimization of parameters for sensitivity and selectivity. We developed an algorithm and workflow for the automation of motif building and testing routines in ProMOL. The algorithm uses a set of empirically derived parameters for optimization and requires little user intervention. The automated motif generation algorithm was first tested in a performance comparison with a set of manually generated motifs based on identical active sites from the same 112 PDB entries. The two sets of motifs were equally effective in identifying alignments with homologs and in rejecting alignments with unrelated structures. A second set of 296 active site motifs were generated automatically, based on Catalytic Site Atlas entries with literature citations, as an expansion of the library of existing manually generated motif templates. The new motif templates exhibited comparable performance to the existing ones in terms of hit rates against native structures, homologs with the same EC and Pfam designations, and randomly selected unrelated structures with a different EC designation at the first EC digit, as well as in terms of RMSD values obtained from local structural alignments of motifs and query structures. This research is supported by NIH grant GM078077. PMID:26573864

  9. Transduction motif analysis of gastric cancer based on a human signaling network

    Liu, G.; Li, D.Z.; Jiang, C.S.; Wang, W. [Department of Gastroenterology, Fuzhou General Hospital of Nanjing Command, Fuzhou (China)]

    2014-04-04

    To investigate signal regulation models of gastric cancer, databases and literature were used to construct the signaling network in humans. Topological characteristics of the network were analyzed by CytoScape. After marking gastric cancer-related genes extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used for the mining of gastric cancer-related motifs in a network with three vertices. The significant motif difference method was adopted to identify significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly different motifs, including gene ontology, function annotation of genes, and model classification. A human signaling network was constructed, with 1643 nodes and 5089 regulating interactions. The network was configured to have the characteristics of other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of cell death, amino acid phosphorylation of proteins, and intracellular signaling cascades. The top five significantly different motifs were mainly cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed by inhibiting signal transductions upstream and downstream of the selected motifs.
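
The three-vertex patterns named in the abstract (cascades and feedback loops) can be illustrated on a small directed graph. The sketch below is a brute-force enumeration for intuition only: it is not FANMOD's algorithm, it ignores interaction signs (so it cannot tell positive from negative feedback), and it does not compute the SMD score, whose formula the abstract does not give. The edge list is a hypothetical toy network.

```python
from itertools import permutations

def count_three_node_patterns(edges):
    """Count three basic directed three-node patterns by brute force.

    cascade:      a -> b -> c, no edge between a and c
    feed_forward: a -> b -> c plus a -> c
    cycle:        a -> b -> c -> a (an unsigned feedback loop)
    Triples containing any reciprocal edge are skipped for simplicity.
    """
    edge_set = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    counts = {"cascade": 0, "feed_forward": 0, "cycle": 0}
    for a, b, c in permutations(nodes, 3):
        forward = (a, b) in edge_set and (b, c) in edge_set
        backward = (b, a) in edge_set or (c, b) in edge_set
        if not forward or backward:
            continue
        ac = (a, c) in edge_set
        ca = (c, a) in edge_set
        if not ac and not ca:
            counts["cascade"] += 1
        elif ac and not ca:
            counts["feed_forward"] += 1
        elif ca and not ac:
            counts["cycle"] += 1
    # Each cycle is found once per rotation of (a, b, c).
    counts["cycle"] //= 3
    return counts

# Hypothetical toy signaling network with one instance of each pattern.
toy_edges = [
    ("A", "B"), ("B", "C"),                 # cascade
    ("D", "E"), ("E", "F"), ("D", "F"),     # feed-forward loop
    ("G", "H"), ("H", "I"), ("I", "G"),     # cycle
]
pattern_counts = count_three_node_patterns(toy_edges)
print(pattern_counts)
```

Brute force over all ordered triples is cubic in the number of nodes, which is fine for a toy graph but is exactly why dedicated tools are used on networks with thousands of nodes.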

  11. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.; Friis, Carsten; Brunak, Søren; Mann, M.; Blom, Nikolaj

    Advances in mass spectrometry-based proteomics have yielded a substantial mapping of the tyrosine phosphoproteome and thus provided an important step toward a systematic analysis of intracellular signaling networks in higher eukaryotes. In this study we decomposed an uncharacterized proteomics data...... and validated as a binding motif for the SH2 domain-containing inositol phosphatase SHIP2. Our decomposition of the in vivo Tyr(P) proteome furthermore suggests that two-thirds of the Tyr(P) sites mediate interaction, whereas the remaining third govern processes such as enzyme activation and nucleic...

  12. The Origin of Motif Families in Food Webs

    Klaise, Janis

    2016-01-01

    Food webs have been found to exhibit remarkable motif profiles, patterns in the relative prevalences of all possible three-species sub-graphs, and this has been related to ecosystem properties such as stability and robustness. Analysing 46 food webs of various kinds, we find that most food webs fall into one of two distinct motif families. The separation between the families is well predicted by a global measure of hierarchical order in directed networks - trophic coherence. We find that trophic coherence is also a good predictor for the extent of omnivory, defined as the tendency of species to feed on multiple trophic levels. We compare our results to a network assembly model that admits tunable trophic coherence via a single free parameter. The model is able to generate food webs in either of the two families by varying this parameter, and correctly classifies almost all the food webs in our database. This establishes a link between global order and local preying patterns in food webs.
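
The abstract does not define trophic coherence. The sketch below uses the definition that is standard in the trophic coherence literature, stated here as an assumption relative to this record: a basal species has trophic level 1, a consumer has level 1 plus the mean level of its prey, and the incoherence parameter q is the standard deviation of the level differences across all feeding links (whose mean is always 1). The function names and toy webs are hypothetical, and the simple relaxation loop assumes a web without feeding cycles.

```python
from math import sqrt

def trophic_levels(prey_of, sweeps=100):
    """Trophic level of each species; prey_of maps a species to its prey list.

    Basal species (no prey) stay at level 1. Consumers are relaxed towards
    1 + mean prey level; for an acyclic web this settles after a few sweeps.
    """
    levels = {species: 1.0 for species in prey_of}
    for _ in range(sweeps):
        for species, prey in prey_of.items():
            if prey:
                levels[species] = 1.0 + sum(levels[p] for p in prey) / len(prey)
    return levels

def incoherence(prey_of):
    """Incoherence q = sqrt(<x^2> - 1), with x the level difference per link."""
    levels = trophic_levels(prey_of)
    distances = [levels[pred] - levels[p]
                 for pred, prey in prey_of.items() for p in prey]
    mean_square = sum(d * d for d in distances) / len(distances)
    return sqrt(max(mean_square - 1.0, 0.0))

# Hypothetical toy webs: a perfectly layered chain and a web with an omnivore.
chain = {"plant": [], "herbivore": ["plant"], "carnivore": ["herbivore"]}
omnivore_web = {"plant": [], "herbivore": ["plant"],
                "carnivore": ["herbivore", "plant"]}

q_chain = incoherence(chain)
q_omnivore = incoherence(omnivore_web)
print(q_chain, q_omnivore)
```

The chain gives q = 0 (maximally coherent), while letting the carnivore also feed on the plant raises q, which matches the abstract's point that coherence and omnivory are closely linked.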

  13. Proline Rich Motifs as Drug Targets in Immune Mediated Disorders

    Mythily Srinivasan

    2012-01-01

    The current version of the human immunome network consists of nearly 1400 interactions involving approximately 600 proteins. Intermolecular interactions mediated by proline-rich motifs (PRMs) are observed in many facets of the immune response. The proline-rich regions are known to preferentially adopt a polyproline type II helical conformation, an extended structure that facilitates transient intermolecular interactions such as signal transduction, antigen recognition, cell-cell communication and cytoskeletal organization. The propensity of both the side chain and the backbone carbonyls of the polyproline type II helix to participate in the interface interaction makes it an excellent recognition motif. An advantage of such distinct chemical features is that the interactions can be discriminatory even in the absence of high affinities. Indeed, the immune response is mediated by well-orchestrated low-affinity short-duration intermolecular interactions. The proline-rich regions are predominantly localized in the solvent-exposed regions such as the loops, intrinsically disordered regions, or between domains that constitute the intermolecular interface. Peptide mimics of the PRM have been suggested as potential antagonists of intermolecular interactions. In this paper, we discuss novel PRM-mediated interactions in the human immunome that potentially serve as attractive targets for immunomodulation and drug development for inflammatory and autoimmune pathologies.

  14. Graph animals, subgraph sampling and motif search in large networks

    Baskerville, Kim; Paczuski, Maya

    2007-01-01

    We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for `graph animals', i.e. connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of super-exponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the TAP high throughput method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z-scores >10) or anti-motifs (Z-scores <-10) when the null model is the...
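
The Z-score thresholds quoted above can be made concrete with a short sketch. It assumes the usual definition, Z = (observed count - mean null count) / standard deviation of null counts, with the null counts coming from an ensemble of randomized networks; the abstract is truncated before it names the null model, so the null counts below are hypothetical numbers, and the function names are illustrative.

```python
from statistics import mean, pstdev

def z_score(observed, null_counts):
    """Z-score of an observed subgraph count against null-model counts."""
    spread = pstdev(null_counts)
    if spread == 0:
        raise ValueError("null counts have zero variance")
    return (observed - mean(null_counts)) / spread

def classify(observed, null_counts, threshold=10.0):
    """Label a subgraph using the |Z| > 10 cut-offs quoted in the abstract."""
    z = z_score(observed, null_counts)
    if z > threshold:
        return "motif"
    if z < -threshold:
        return "anti-motif"
    return "neither"

# Hypothetical counts of one subgraph in five randomized networks.
null_counts = [10, 12, 8, 11, 9]
print(z_score(50, null_counts), classify(50, null_counts))
```

The population standard deviation is used here for simplicity; with a small null ensemble the sample standard deviation would give somewhat smaller Z-scores.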

  15. Prevalent RNA recognition motif duplication in the human genome.

    Tsai, Yihsuan S; Gomez, Shawn M; Wang, Zefeng

    2014-05-01

    The sequence-specific recognition of RNA by proteins is mediated through various RNA binding domains, with the RNA recognition motif (RRM) being the most frequent and present in >50% of RNA-binding proteins (RBPs). Many RBPs contain multiple RRMs, and it is unclear how each RRM contributes to the binding specificity of the entire protein. We found that RRMs within the same RBP (i.e., sibling RRMs) tend to have significantly higher similarity than expected by chance. Sibling RRM pairs from RBPs shared by multiple species tend to have lower similarity than those found only in a single species, suggesting that multiple RRMs within the same protein might arise from domain duplication followed by divergence through random mutations. This finding is exemplified by a recent RRM domain duplication in DAZ proteins and an ancient duplication in PABP proteins. Additionally, we found that different similarities between sibling RRMs are associated with distinct functions of an RBP and that the RBPs tend to contain repetitive sequences with low complexity. Taken together, this study suggests that the number of RBPs with multiple RRMs has expanded in mammals and that the multiple sibling RRMs may recognize similar target motifs in a cooperative manner. PMID:24667216

  16. A motif for reversible nitric oxide interactions in metalloenzymes.

    Zhang, Shiyu; Melzer, Marie M; Sen, S Nermin; Çelebi-Ölçüm, Nihan; Warren, Timothy H

    2016-07-01

    Nitric oxide (NO) participates in numerous biological processes, such as signalling in the respiratory system and vasodilation in the cardiovascular system. Many metal-mediated processes involve direct reaction of NO to form a metal-nitrosyl (M-NO), as occurs at the Fe(2+) centres of soluble guanylate cyclase or cytochrome c oxidase. However, some copper electron-transfer proteins that bear a type 1 Cu site (His2Cu-Cys) reversibly bind NO by an unknown motif. Here, we use model complexes of type 1 Cu sites based on tris(pyrazolyl)borate copper thiolates [Cu(II)]-SR to unravel the factors involved in NO reactivity. Addition of NO provides the fully characterized S-nitrosothiol adduct [Cu(I)](κ(1)-N(O)SR), which reversibly loses NO on purging with an inert gas. Computational analysis outlines a low-barrier pathway for the capture and release of NO. These findings suggest a new motif for reversible binding of NO at bioinorganic metal centres that can interconvert NO and RSNO molecular signals at copper sites. PMID:27325092

  17. Over-represented localized sequence motifs in ribosomal protein gene promoters of basal metazoans.

    Perina, Drago; Korolija, Marina; Roller, Maša; Harcet, Matija; Jeličić, Branka; Mikoč, Andreja; Cetković, Helena

    2011-07-01

    Equimolecular presence of ribosomal proteins (RPs) in the cell is needed for ribosome assembly and is achieved by synchronized expression of ribosomal protein genes (RPGs) with promoters of similar strengths. Over-represented motifs of RPG promoter regions are identified as targets for specific transcription factors. Unlike RPs, those motifs are not conserved between mammals, Drosophila, and yeast. We analyzed RPG proximal promoter regions of three basal metazoans with sequenced genomes (sponge, cnidarian, and placozoan) and found common features, such as 5'-terminal oligopyrimidine tracts and TATA-boxes. Furthermore, we identified over-represented motifs, some of which displayed the highest similarity to motifs abundant in human RPG promoters and not present in Drosophila or yeast. Our results indicate that human over-represented motifs, as well as corresponding domains of transcription factors, were established very early in metazoan evolution. The fast-evolving nature of the RPG regulatory network leads to formation of other, lineage-specific, over-represented motifs. PMID:21457775

  18. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2009-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets o...

  19. Belief-propagation algorithm and the Ising model on networks with arbitrary distributions of motifs

    Yoon, S; Goltsev, A. V.; Dorogovtsev, S. N.; Mendes, J. F. F.

    2011-01-01

    We generalize the belief-propagation algorithm to sparse random networks with arbitrary distributions of motifs (triangles, loops, etc.). Each vertex in these networks belongs to a given set of motifs (generalization of the configuration model). These networks can be treated as sparse uncorrelated hypergraphs in which hyperedges represent motifs. Here a hypergraph is a generalization of a graph, where a hyperedge can connect any number of vertices. These uncorrelated hypergraphs are tree-like...

  20. Vampirism today : the change of the vampire motif from the gothic novel to today's fantasy literature

    2009-01-01

    This thesis examines the change of the vampire motif throughout time. How have vampires and their clichés changed and why? Starting with a brief examination of the 'classical' literary vampire, I mainly focus on contemporary fantasy literature by discussing recent works of vampire fiction. The adaptation of the vampire motif in role-playing games will be discussed, as well as the effects the vampire film had on the motif.

  1. SIOMICS: a novel approach for systematic identification of motifs in ChIP-seq data

    Ding, Jun; Hu, Haiyan; Li, Xiaoman

    2013-01-01

    The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation. The chromatin immunoprecipitation (ChIP), followed by massive parallel sequencing (ChIP-seq) experiments, provides an unprecedented opportunity to discover binding motifs. Computational methods have been developed to identify motifs from ChIP-seq data, while at the same time encountering several problems. For example, existing methods are often not scalable to the large num...

  2. Phosphopeptide interactions with BRCA1 BRCT domains: More than just a motif

    Wu, Qian; Jubb, Harry; Blundell, Tom L.

    2015-01-01

    BRCA1 BRCT domains function as phosphoprotein-binding modules for recognition of the phosphorylated protein-sequence motif pSXXF. While the motif interaction interface provides strong anchor points for binding, protein regions outside the motif have recently been found to be important for binding affinity. In this review, we compare the available structural data for BRCA1 BRCT domains in complex with phosphopeptides in order to gain a more complete understanding of the interaction betw...
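
As a small illustration of the pSXXF pattern, the sketch below lists every S-x-x-F stretch in a protein sequence. Sequence alone cannot show whether the serine is actually phosphorylated, so the hits are only candidate sites, and, as the abstract stresses, residues outside the motif also affect binding. The function name and the example sequence are hypothetical.

```python
import re

def candidate_sxxf_sites(sequence):
    """Return (1-based serine position, matched tetrapeptide) for each S-x-x-F.

    A lookahead is used so that overlapping candidates are all reported.
    """
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(r"(?=(S..F))", sequence)]

# Hypothetical peptide sequence, not taken from any BRCA1 partner.
sites = candidate_sxxf_sites("MKSPTFAASSQFE")
print(sites)
```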

  3. An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance

    Vinga Susana; Casimiro Ana C; Freitas Ana T; Oliveira Arlindo L

    2008-01-01

    Background: Motif finding algorithms have developed in their ability to use computationally efficient methods to detect patterns in biological sequences. However, the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but als...

  4. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Martin Juliette

    2011-06-01

    Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  5. Combinatorial analysis for sequence and spatial motif discovery in short sequence fragments

    Jackups, Ronald; Liang, Jie

    2010-01-01

    Motifs are over-represented sequence or spatial patterns appearing in proteins. They often play important roles in maintaining protein stability and in facilitating protein function. When motifs are located in short sequence fragments, as in transmembrane domains that are only 6–20 residues in length, and when there is only very limited data, it is difficult to identify motifs. In this study, we introduce combinatorial models based on permutation for assessing statistically significant sequen...

  6. Combinatorial analysis for sequence and spatial motif discovery in short sequence fragments

    Jackups, Ronald; Liang, Jie

    2006-01-01

    Motifs are over-represented sequence or spatial patterns appearing in proteins. They often play important roles in maintaining protein stability and in facilitating protein function. When motifs are located in short sequence fragments, as in transmembrane domains that are only 6–20 residues in length, and when there is only very limited data, it is difficult to identify motifs. In this study, we introduce combinatorial models based on permutation for assessing statistically significant sequen...

  7. Transition from winnerless competition to synchronization in time-delayed neuronal motifs

    Zhang, X.; Li, P. J.; Wu, F. P.; Wu, W. J.; Jiang, M.; Chen, L.; Qi, G. X.; Huang, H. B.

    2012-03-01

    The dynamics of brain functional motifs are studied. It is shown that different rhythms can occur in the motifs when time delay is taken into account. These rhythms include synchronization, winnerless competition (WLC) and "two plus one" (TPO). The main discovery is that the transition from WLC to synchronization can be induced simply by time delay. It is also concluded that some medium time delay is needed to achieve WLC in the realistic case. The motifs composed of heterogeneous neurons are also considered.

  8. A structural-alphabet-based strategy for finding structural motifs across protein families.

    Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

    2010-08-01

    Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a 'corner' architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present 'only' in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797

  9. μXRF analysis of decoration motifs on Majolica pottery

    μXRF analysis of decoration motifs on Majolica pottery, in fragments corresponding to several Majolica types, was carried out using a spectrometer comprising a low-power Mo X-ray tube and an elliptic-shape concentration lens with a 60 μm spot. Both surface scanning and spot measurements were carried out, allowing the qualitative identification of the inorganic pigments used for the surface painting decoration and the quantitative analysis of the main glaze composition. The absence of an interference signal arising from excitation of the underlying paste when analysing thin-lead glazing was evaluated, which made it possible to ensure the suitability of the analytical procedures. A distinction was found between different types of majolica by the composition of the lead tin glaze enamel and by the presence of other elements in the blue, black and orange decoration.

  10. RNA Sociology: Group Behavioral Motifs of RNA Consortia

    Guenther Witzany

    2014-11-01

    Full Text Available RNA sociology investigates the behavioral motifs of RNA consortia from the social science perspective. Besides the self-folding of RNAs into single stem loop structures, group building of such stem loops results in a variety of essential agents that are highly active in regulatory processes in cellular and non-cellular life. RNA stem loop self-folding and group building do not depend solely on sequence syntax; more important are their contextual (functional) needs. Also, evolutionary processes seem to occur through RNA stem loop consortia that may act as a complement. This means the whole entity functions only if all participating parts are coordinated, although the complementary building parts originally evolved for different functions. If complementary groups, such as rRNAs and tRNAs, are placed together in selective pressure contexts, new evolutionary features may emerge. Evolution initiated by competent agents in natural genome editing clearly contrasts with statistical error replication narratives.

  11. Study on online community user motif using web usage mining

    Alphy, Meera; Sharma, Ajay

    2016-04-01

    Web usage mining is an application of data mining used to extract useful information from the online community. On Thursday, 6 August 2015, the World Wide Web contained at least 4.73 billion pages according to the Indexed Web and at least 228.52 million pages according to the Dutch Indexed Web. It is difficult to get the needed data from these billions of web pages, which is why web usage mining is important. Personalizing the search engine helps the web user to identify the most used data in an easy way. It reduces time consumption and enables automatic site search and automatic restoration of useful sites. This study reviews the techniques, from the oldest to the latest, used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motifs helps in the improvement of business, e-commerce, personalisation and websites.

  12. Viroid Intercellular Trafficking: RNA Motifs, Cellular Factors and Broad Impacts

    Ryuta Takeda

    2009-09-01

    Full Text Available Viroids are noncoding RNAs that infect plants. In order to establish systemic infection, these RNAs must traffic from an initially infected host cell into neighboring cells and ultimately throughout a whole plant. Recent studies have identified structural motifs in a viroid that are required for trafficking, enabling further studies on the mechanisms of their function. Some cellular proteins interact with viroids in vivo and may play a role in viroid trafficking, which can now be directly tested by using a virus-induced gene silencing system that functions efficiently in plant species from which these factors were identified. This review discusses these recent advances, unanswered questions and the use of viroid infection as a highly productive model to elucidate mechanisms of RNA trafficking that are of broad biological significance.

  13. Sequential dynamics in the motif of excitatory coupled elements

    Korotkov, Alexander G.; Kazakov, Alexey O.; Osipov, Grigory V.

    2015-11-01

    In this article a new model of motif (small ensemble) of neuron-like elements is proposed. It is built with the use of the generalized Lotka-Volterra model with excitatory couplings. The main motivation for this work comes from the problems of neuroscience where excitatory couplings are proved to be the predominant type of interaction between neurons of the brain. In this paper it is shown that there are two modes depending on the type of coupling between the elements: the mode with a stable heteroclinic cycle and the mode with a stable limit cycle. Our second goal is to examine the chaotic dynamics of the generalized three-dimensional Lotka-Volterra model.

  14. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  15. Dynamic consequences of mutating the typical HPGG motif of apocytochrome b5 revealed by computer simulation

    Ying Wu Lin; Tian Lei Ying; Li Fu Liao

    2009-01-01

    Apocytochrome b5 with a typical heme-binding motif of HPGG, and its variants with mutated motifs, GPGG, GPGH, HVGG, and HPGP, have been subjected to molecular dynamics simulation. Comparison of the dynamic consequences has revealed the crucial role of HPGG in assembling the heme group of cytochrome b5 and in modulating protein structure, property and function.

  16. MOMFER: A Search Engine of Thompson's Motif-Index of Folk Literature

    Karsdorp, F.; Meulen, M. van der; Meder, Th.; Bosch, A.P.J. van den

    2015-01-01

    More than fifty years after the first edition of Thompson's seminal Motif-Index of Folk Literature, we present an online search engine tailored to fully disclose the index digitally. This search engine, called MOMFER, greatly enhances the searchability of the Motif-Index and provides exciting new way

  17. Construction of a Three-Dimensional Motif Dictionary for Protein Structural Data Mining

    Hiroaki, Kato; Tadokoro, Tetsuo; Miyata, Hiroyuki; Chikamatsu, Shin-Ichi; Takahashi, Yoshimasa; Abe, Hidetsugu

    With the rapidly increasing number of proteins whose three-dimensional (3D) structures are known, the protein structure database is one of the key elements in many attempts being made to derive knowledge of the structure-function relationships of proteins. In this work, the authors have developed a software tool to assist in constructing a 3D protein motif dictionary that is closely related to the PROSITE sequence motif database. In PROSITE, a structural feature called a motif is described by a sequence pattern of amino acid residues, using the regular expression defined in the database. The present system allows us to automatically find the related sites for all the 3D protein structures taken from a protein structure database such as the Protein Data Bank (PDB), and to make a dictionary of the 3D motifs related to the PROSITE sequence motif patterns. A computational trial was carried out for a subset of the PDB's structure data file. The structural feature analysis performed with the tool showed that many different 3D motif patterns can share a particular PROSITE sequence pattern. For this reason, the authors also tried to classify the 3D motif patterns into several groups on the basis of a distance similarity matrix, and to determine a representative pattern for each group in preparing the dictionary. The usefulness of the additional approach for preparing the 3D motif dictionary is also discussed with an illustrative example.
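
As a rough illustration of how a PROSITE-style sequence pattern can be turned into an ordinary regular expression before scanning sequences, here is a minimal Python sketch. The function name `prosite_to_regex` is hypothetical and is not the authors' tool. It handles only the core pattern elements (`x`, `[...]`, `{...}`, repeat counts, and `<` / `>` anchors on the first or last element), and the pattern in the usage note is an illustrative string written in that notation.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended with a period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # '<' anchors to the N-terminus
        anchor_end = element.endswith(">")      # '>' anchors to the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            token = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            token = core                         # single residue or [..] set
        if repeat:
            token += "{" + repeat + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")` gives a regex that can be passed to `re.finditer` to list candidate sites in a sequence; mapping those sites onto 3D coordinates is a separate step that this sketch does not attempt.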

  18. Stabilization of i-motif structures by 2′-β-fluorination of DNA

    Assi, Hala Abou; Harkness, Robert W.; Martin-Pintado, Nerea; Wilds, Christopher J.; Campos-Olivas, Ramón; Mittermaier, Anthony K.; González, Carlos; Damha, Masad J.

    2016-01-01

    i-Motifs are four-stranded DNA structures consisting of two parallel DNA duplexes held together by hemi-protonated and intercalated cytosine base pairs (C:CH+). They have attracted considerable research interest for their potential role in gene regulation and their use as pH responsive switches and building blocks in macromolecular assemblies. At neutral and basic pH values, the cytosine bases deprotonate and the structure unfolds into single strands. To avoid this limitation and expand the range of environmental conditions supporting i-motif folding, we replaced the sugar in DNA by 2-deoxy-2-fluoroarabinose. We demonstrate that such a modification significantly stabilizes i-motif formation over a wide pH range, including pH 7. Nuclear magnetic resonance experiments reveal that 2-deoxy-2-fluoroarabinose adopts a C2′-endo conformation, instead of the C3′-endo conformation usually found in unmodified i-motifs. Nevertheless, this substitution does not alter the overall i-motif structure. This conformational change, together with the changes in charge distribution in the sugar caused by the electronegative fluorine atoms, leads to a number of favorable sequential and inter-strand electrostatic interactions. The availability of folded i-motifs at neutral pH will aid investigations into the biological function of i-motifs in vitro, and will expand i-motif applications in nanotechnology. PMID:27166371

  19. Genome adaptations of a tripartite motif protein for retroviral defense in cattle and sheep

    Tripartite motif (TRIM) genes encode proteins composed of RING, B-box, and coiled coil motif domains. Primate TRIM5α has been shown to be a primary determinant of retroviral host cell range restriction in primates. TRIM5 restriction was originally thought to be a primate-specific defense mechanism...

  20. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Down Thomas A

    2010-09-01

    Full Text Available Abstract Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or that their binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  1. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Fauteux François

    2009-10-01

    Full Text Available Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses), using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.), respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  2. RSAT::Plants: Motif Discovery Within Clusters of Upstream Sequences in Plant Genomes.

    Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Rioualen, Claire; Cantalapiedra, Carlos P; van Helden, Jacques

    2016-01-01

    The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu ) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains whole-sequenced genomes from species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors. PMID:27557774

  3. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation

    Bucher, P. [Swiss Institute for Experimental Cancer Research, Lausanne (Switzerland)]; Bairoch, A. [Centre Medical Universitaire, Geneva (Switzerland)]

    1994-12-31

    A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax, and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

  4. MOTIFSIM: A web tool for detecting similarity in multiple DNA motif datasets.

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2015-07-01

    Currently, there are a number of motif detection tools available that possess unique functionality. These tools often report different motifs, and therefore use of multiple tools is generally advised since common motifs reported by multiple tools are more likely to be biologically significant. However, results produced by these different tools need to be compared and existing similarity detection tools only allow comparison between two data sets. Here, we describe a motif similarity detection tool (MOTIFSIM) possessing a web-based, user-friendly interface that is capable of detecting similarity from multiple DNA motif data sets concurrently. Results can either be viewed online or downloaded. Users may also download and run MOTIFSIM as a command-line tool in stand-alone mode. The web tool, along with its command-line version, user manuals, and source codes, are freely available at http://biogrid-head.engr.uconn.edu/motifsim/. PMID:26156781

  5. Bioinformatics Study of Cancer-Related Mutations within p53 Phosphorylation Site Motifs

    Xiaona Ji

    2014-07-01

    Full Text Available p53 protein has about thirty phosphorylation sites located at the N- and C-termini and in the core domain. The phosphorylation sites are relatively less mutated than other residues in p53. To understand why and how p53 phosphorylation sites are rarely mutated in human cancer, using bioinformatics approaches, we examined the phosphorylation site and its nearby flanking residues, focusing on the consensus phosphorylation motif pattern, amino-acid correlations within the phosphorylation motifs, the propensity of structural disorder of the phosphorylation motifs, and cancer mutations observed within the phosphorylation motifs. Many p53 phosphorylation sites are targets for several kinases. The phosphorylation sites match 17 consensus sequence motifs out of the 29 classified. In addition to proline, which is common in kinase specificity-determining sites, we found a high propensity of acidic residues to be adjacent to phosphorylation sites. Analysis of human cancer mutations in the phosphorylation motifs revealed that motifs with adjacent acidic residues generally have fewer mutations, in contrast to phosphorylation sites near proline residues. p53 phosphorylation motifs are mostly disordered. However, human cancer mutations within phosphorylation motifs tend to decrease the disorder propensity. Our results suggest that the combination of the acidic residues Asp and Glu with phosphorylation sites provides charge redundancy which may safeguard against loss-of-function mutations, and that the natively disordered nature of p53 phosphorylation motifs may help reduce mutational damage. Our results further suggest that engineering acidic amino acids adjacent to potential phosphorylation sites could be a p53 gene therapy strategy.

  6. Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining.

    Huan, Jun; Bandyopadhyay, Deepak; Prins, Jan; Snoeyink, Jack; Tropsha, Alexander; Wang, Wei

    2006-01-01

    Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the C(alpha) atoms of the two residues and (2) a boolean indicating whether the two residues are in physical/chemical contact. Using this representation, a structure motif corresponds to a labeled clique that occurs frequently among the graphs representing the protein structures. The pairwise distance constraints on each edge in a clique serve to limit the variation in geometry among different occurrences of a structure motif. We present an efficient constrained subgraph mining algorithm to discover structure motifs in this setting. Compared with contact graph representations, the number of spurious structure motifs is greatly reduced. Using this algorithm, structure motifs were located for several SCOP families including the Eukaryotic Serine Proteases, Nuclear Binding Domains, Papain-like Cysteine Proteases, and FAD/NAD-linked Reductases. For each family, we typically obtain a handful of motifs within seconds of processing time. The occurrences of these motifs throughout the PDB were strongly associated with the original SCOP family, as measured using a hyper-geometric distribution. The motifs were found to cover functionally important sites like the catalytic triad for Serine Proteases and co-factor binding sites for Nuclear Binding Domains. The fact that many motifs are highly family-specific can be used to classify new proteins or to provide functional annotation in Structural Genomics Projects. PMID:17369641
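
The labeled graph described in this abstract can be sketched in a few lines of Python. This is only an illustration of the data structure, not the authors' mining algorithm. The function name `build_residue_graph`, the `max_edge` limit on which pairs get an edge, and the use of a plain distance cutoff (`contact_cutoff`) as the contact test are all assumptions made for the example; the abstract does not say how contact is decided.

```python
from itertools import combinations
from math import dist


def build_residue_graph(residues, max_edge=12.0, contact_cutoff=6.5):
    """Build a labeled residue graph from C-alpha coordinates.

    residues: list of (amino_acid_label, (x, y, z)) tuples, one per residue.
    Returns (vertices, edges), where edges maps (i, j) with i < j to
    (rounded C-alpha distance, in_contact flag).
    Both thresholds are illustrative assumptions, in angstroms.
    """
    vertices = [label for label, _ in residues]
    edges = {}
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        d = dist(a, b)  # Euclidean distance between the two C-alpha atoms
        if d <= max_edge:
            # Edge label: (distance, boolean contact indicator)
            edges[(i, j)] = (round(d, 2), d <= contact_cutoff)
    return vertices, edges
```

A structure motif in the sense of the abstract would then be a clique of such labeled vertices and edges that recurs across many of these graphs; finding those cliques is the job of the subgraph mining step, which is not shown here.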

  7. Sulfur-induced structural motifs on copper and gold surfaces

    Walen, Holly

    The interaction of sulfur with copper and gold surfaces plays a fundamental role in important phenomena that include coarsening of surface nanostructures, and self-assembly of alkanethiols. Here, we identify and analyze unique sulfur-induced structural motifs observed on the low-index surfaces of these two metals. We seek out these structures in an effort to better understand the fundamental interactions between these metals and sulfur that lend to the stability and favorability of metal-sulfur complexes vs. chemisorbed atomic sulfur. We choose very specific conditions: very low temperature (5 K), and very low sulfur coverage (≤ 0.1 monolayer). In this region of temperature-coverage space, which has not been examined previously for these adsorbate-metal systems, the effects of individual interactions between metals and sulfur are most apparent and can be assessed extensively with the aid of theory and modeling. Furthermore, at this temperature diffusion is minimal and relatively mobile species can be isolated, and at low coverage the structures observed are not consumed by an extended reconstruction. The primary experimental technique is scanning tunneling microscopy (STM). The experimental observations presented here, made under identical conditions, together with extensive DFT analyses, allow comparisons and insights into factors that favor the existence of metal-sulfur complexes, vs. chemisorbed atomic sulfur, on metal terraces. We believe these data will be instrumental in better understanding the complex phenomena occurring between the surfaces of coinage metals and sulfur.

  8. Motif mediated protein-protein interactions as drug targets.

    Corbi-Verge, Carles; Kim, Philip M

    2016-01-01

    Protein-protein interactions (PPI) are involved in virtually every cellular process and thus represent an attractive target for therapeutic interventions. A significant number of protein interactions are formed between globular domains and short linear peptide motifs (DMI). Targeting these DMIs has proven challenging and classical approaches to inhibiting such interactions with small molecules have had limited success. However, recent new approaches have led to the discovery of potent inhibitors, some of which, such as Obatoclax, ABT-199, AEG-40826 and SAH-p53-8, are likely to become approved drugs. These novel inhibitors belong to a wide range of different molecule classes, ranging from small molecules to peptidomimetics and biologicals. This article reviews the main reasons for limited success in targeting PPIs, discusses how successful approaches overcome these obstacles to discover promising inhibitors for human protein double minute 2 (HDM2), B-cell lymphoma 2 (Bcl-2), X-linked inhibitor of apoptosis protein (XIAP), and provides a summary of the promising approaches currently in development that indicate the future potential of PPI inhibitors in drug discovery. PMID:26936767

  9. The bridge: suggestions about the meaning of a pictorial motif

    Omar Calabrese

    2011-12-01

    Full Text Available Developing research begun at the Warburg Institute in 1983, this paper reflects on the construction of meaning in a work of art, through the analysis of the bridge’s function in painting. It tries to reply to some objections the author received there from Gombrich, about the chance of finding a stable content in the configuration of the bridge. Hence, the study reconsiders the concept of ‘motif’ applied to this structure. In a semiotic perspective a motif is partially independent with regard to a single textual organization, because it has a mobile and migrant feature. However, it is also partially flexible as it depends upon the same organization. The inquiry shows that the bridge’s internal structure corresponds to the category of a ‘junction’, with two opposite items, ‘conjunction’ and ‘disjunction’. The development of this theoretical object can be carried out also by figures that are not ‘bridges’, in the natural sense of the word. Furthermore, its meaning does not depend upon the number of examples we can find but only upon their relevance for constructing a ‘grammar of cases’. Unlike the traditional iconographical approach, and also unlike Panofskian iconology, the analysis moves not only towards the simple or complex content of a figure but also towards its description.

  10. Metagenome fragment classification based on multiple motif-occurrence profiles

    Naoki Matsushita

    2014-09-01

    Full Text Available A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets.
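
To make the multiple-profile idea in this abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the "DNA sequence motif" is a fixed-length k-mer, that profiles use add-one smoothing, and that a read is assigned to the genome whose best-scoring subsequence profile gives the highest log-likelihood; the names `kmer_counts`, `log_score`, `classify`, and the parameters `k` and `chunk` are hypothetical.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def log_score(read, counts, k):
    """Naive Bayes log-likelihood of a read under one k-mer profile (add-one smoothing)."""
    total = sum(counts.values()) + 4 ** k  # 4**k possible DNA k-mers
    return sum(
        log((counts[read[i:i + k]] + 1) / total)
        for i in range(len(read) - k + 1)
    )


def classify(read, genomes, k=3, chunk=1000):
    """Assign a read to the genome whose best chunk profile scores highest.

    genomes: dict mapping genome name -> sequence string.
    Each genome is cut into chunks of similar length, so that large and small
    genomes are compared through profiles built from similar amounts of sequence.
    """
    best_name, best_score = None, float("-inf")
    for name, seq in genomes.items():
        for start in range(0, len(seq), chunk):
            piece = seq[start:start + chunk]
            if len(piece) < k:
                continue  # too short to hold a single k-mer
            score = log_score(read, kmer_counts(piece, k), k)
            if score > best_score:
                best_name, best_score = name, score
    return best_name
```

In practice the chunk profiles would be computed once and cached instead of being rebuilt for every read; the sketch recomputes them only to stay short.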

  11. Network motifs that stabilize the hybrid epithelial/mesenchymal phenotype

    Jolly, Mohit Kumar; Jia, Dongya; Tripathi, Satyendra; Hanash, Samir; Mani, Sendurai; Ben-Jacob, Eshel; Levine, Herbert

    Epithelial to Mesenchymal Transition (EMT) and its reverse, MET, are hallmarks of cancer metastasis. While transitioning between E and M phenotypes, cells can also attain a hybrid epithelial/mesenchymal (E/M) phenotype that enables collective cell migration as a cluster of Circulating Tumor Cells (CTCs). These clusters can form 50 times more tumors than individually migrating CTCs, underlining their importance in metastasis. However, this hybrid E/M phenotype has been hypothesized to be only a transient one that is attained en route to EMT. Here, via mathematical modeling, we identify certain 'phenotypic stability factors' that couple with the core three-way decision-making circuit (miR-200/ZEB) and can maintain or stabilize the hybrid E/M phenotype. Further, we show experimentally that this phenotype can be maintained stably at a single-cell level, and that knockdown of these factors impairs collective cell migration. We also show that these factors enable the association of hybrid E/M with high stemness or tumor-initiating potential. Finally, based on these factors, we deduce specific network motifs that can maintain the E/M phenotype. Our framework can be used to elucidate the effect of other players in regulating cellular plasticity during metastasis. This work was supported by NSF PHY-1427654 (Center for Theoretical Biological Physics) and the CPRIT Scholar in Cancer Research of the State of Texas at Rice University.

  12. Tyrosine motifs are required for prestin basolateral membrane targeting

    Yifan Zhang

    2015-01-01

    Prestin is targeted to the lateral wall of outer hair cells (OHCs), where its electromotility is critical for cochlear amplification. Using MDCK cells as a model system for polarized epithelial sorting, we demonstrate that prestin uses tyrosine residues, in a YXXΦ motif, to target the basolateral surface. Both Y520 and Y667 are important for basolateral targeting of prestin. Mutation of these residues to glutamine or alanine resulted in retention within the Golgi and delayed egress from the Golgi in Y667Q. Basolateral targeting is restored upon mutation to phenylalanine, suggesting the importance of a phenol ring in the tyrosine side chain. We also demonstrate that prestin targeting to the basolateral surface is dependent on AP1B (μ1B), and that prestin uses transferrin-containing early endosomes in its passage from the Golgi to the basolateral plasma membrane. The presence of AP1B (μ1B) in OHCs, and parallels between prestin targeting to the basolateral surface of OHCs and polarized epithelial cells, suggest that outer hair cells resemble polarized epithelia rather than neurons in this important phenotypic measure.
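
    A YXXΦ motif can be located in a protein sequence with a short regular expression. The sketch below is illustrative only: it assumes Φ stands for one of L, I, M, F or V (a common reading of "bulky hydrophobic residue"), and the example sequence in the test is a made-up peptide, not prestin.

```python
import re

# Assumption: Φ is read as L, I, M, F or V. X is any residue.
# The lookahead lets overlapping candidate sites be reported as well.
YXXPHI = re.compile(r"(?=(Y..[LIMFV]))")


def find_yxxphi(protein_seq):
    """Return (1-based tyrosine position, matched tetrapeptide) for each site."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in YXXPHI.finditer(seq)]
```

    Positions are reported 1-based so that they line up with residue labels of the kind used in the abstract (for example Y520), provided the input is the full-length sequence.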

  13. FR3D: finding local and composite recurrent structural motifs in RNA 3D structures.

    Sarver, Michael; Zirbel, Craig L; Stombaugh, Jesse; Mokdad, Ali; Leontis, Neocles B

    2008-01-01

    New methods are described for finding recurrent three-dimensional (3D) motifs in RNA atomic-resolution structures. Recurrent RNA 3D motifs are sets of RNA nucleotides with similar spatial arrangements. They can be local or composite. Local motifs comprise nucleotides that occur in the same hairpin or internal loop. Composite motifs comprise nucleotides belonging to three or more different RNA strand segments or molecules. We use a base-centered approach to construct efficient, yet exhaustive search procedures using geometric, symbolic, or mixed representations of RNA structure that we implement in a suite of MATLAB programs, "Find RNA 3D" (FR3D). The first modules of FR3D preprocess structure files to classify base-pair and -stacking interactions. Each base is represented geometrically by the position of its glycosidic nitrogen in 3D space and by the rotation matrix that describes its orientation with respect to a common frame. Base-pairing and base-stacking interactions are calculated from the base geometries and are represented symbolically according to the Leontis/Westhof basepairing classification, extended to include base-stacking. These data are stored and used to organize motif searches. For geometric searches, the user supplies the 3D structure of a query motif which FR3D uses to find and score geometrically similar candidate motifs, without regard to the sequential position of their nucleotides in the RNA chain or the identity of their bases. To score and rank candidate motifs, FR3D calculates a geometric discrepancy by rigidly rotating candidates to align optimally with the query motif and then comparing the relative orientations of the corresponding bases in the query and candidate motifs. Given the growing size of the RNA structure database, it is impossible to explicitly compute the discrepancy for all conceivable candidate motifs, even for motifs with fewer than ten nucleotides. The screening algorithm that we describe finds all candidate motifs whose

  14. Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

    Jin, R; McCallen, S; Almaas, E

    2007-05-28

    Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user-defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market and world trade to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.

  15. Recurrent motifs as resonant attractor states in the narrative field: a testable model of archetype.

    Goodwyn, Erik

    2013-06-01

    At the most basic level, archetypes represented Jung's attempt to explain the phenomenon of recurrent myths and folktale motifs (Jung 1956, 1959, para. 99). But the archetype remains controversial as an explanation of recurrent motifs, as the existence of recurrent motifs does not prove that archetypes exist. Thus, the challenge for contemporary archetype theory is not merely to demonstrate that recurrent motifs exist, since that is not disputed, but to demonstrate that archetypes exist and cause recurrent motifs. The present paper proposes a new model which is unlike others in that it postulates how the archetype creates resonant motifs. This model necessarily clarifies and adapts some of Jung's seminal ideas on archetype in order to provide a working framework grounded in contemporary practice and methodologies. For the first time, a model of archetype is proposed that can be validated on empirical, rather than theoretical grounds. This is achieved by linking the archetype to the hard data of recurrent motifs rather than academic trends in other fields. PMID:23750942

  16. MODA: an efficient algorithm for network motif discovery in biological networks.

    Omidi, Saeed; Schreiber, Falk; Masoudi-Nejad, Ali

    2009-10-01

    In recent years, interest has been growing in the study of complex networks. Since Erdős and Rényi (1960) proposed their random graph model about 50 years ago, many researchers have investigated and shaped this field. Many indicators have been proposed to assess the global features of networks. Recently, an active research area has developed in studying local features, named motifs, as the building blocks of networks. Unfortunately, network motif discovery is a computationally hard problem, and finding rather large motifs (larger than 8 nodes) by means of current algorithms is impractical as it demands too much computational effort. In this paper, we present a new algorithm (MODA) that incorporates techniques such as a pattern growth approach for extracting larger motifs efficiently. We have tested our algorithm and found it able to identify larger motifs with more than 8 nodes more efficiently than most of the current state-of-the-art motif discovery algorithms. While most of the algorithms rely on induced subgraphs as motifs of the networks, MODA is able to extract both induced and non-induced subgraphs simultaneously. The MODA source code is freely available at: http://LBB.ut.ac.ir/Download/LBBsoft/MODA/ PMID:20154426
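
    The difference between induced and non-induced subgraph occurrences can be shown on a tiny example. The sketch below is a brute-force count of one three-node pattern, the feed-forward loop (a->b, b->c, a->c); it is not MODA's pattern growth algorithm, and the function name `count_ffl` is hypothetical.

```python
from itertools import permutations


def count_ffl(edges, induced=False):
    """Count feed-forward loops (a->b, b->c, a->c) in a directed graph.

    With induced=True, an occurrence is kept only if the three nodes carry
    no edges other than the three pattern edges.
    """
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    count = 0
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            if induced and {(b, a), (c, b), (c, a)} & edge_set:
                continue  # an extra edge among a, b, c: not an induced match
            count += 1
    return count
```

    Enumerating all ordered node triples costs O(n^3) and grows much faster for larger patterns, which is why practical tools need more selective search strategies for motifs beyond a few nodes.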

  17. A Further Study on Mining DNA Motifs Using Fuzzy Self-Organizing Maps.

    Tapan, Sarwar; Wang, Dianhui

    2016-01-01

    Self-organizing map (SOM)-based motif mining, despite being a promising approach for problem solving, mostly fails to offer a consistent interpretation of clusters with respect to the mixed composition of signal and noise in the nodes. The main reason behind this shortcoming is that the similarity metrics used in data assignment, which are specially designed with the biological interpretation for this domain, are not meant to consider the inevitable noise mixture in the clusters. This limits the explicability of the majority of clusters that are supposedly noise dominated, degrading the overall system clarity in motif discovery. This paper aims to improve the explicability of the learning process by introducing a composite similarity function (CSF) that is specially designed for the k-mer-to-cluster similarity measure with respect to the degree of motif properties and embedded noise in the cluster. The motif finding algorithm proposed in this paper is built on our previous work, robust elicitation algorithms for discovering (READ) deoxyribonucleic acid motifs [1], and is termed READ using CSFs (READ(csf)). It performs slightly better than READ and shows some remarkable improvements over the SOM-based SOMBRERO and SOMEA tools in terms of F-measure on the testing data sets. A real data set containing multiple motifs is used to explore the potential of READ(csf) for more challenging biological data mining tasks. Visual comparisons with the verified logos extracted from the JASPAR database demonstrate that our algorithm is promising for discovering multiple motifs simultaneously. PMID:26068877

  18. Comparative Analysis of Regulatory Motif Discovery Tools for Transcription Factor Binding Sites

    2007-01-01

    In the post-genomic era, identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential to elucidate transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and correlated databases have been applied to solving this problem. However, these existing methods, based on different computational algorithms, show diverse motif prediction efficiency in non-coding DNA sequences. Therefore, understanding the similarities and differences of computational algorithms and enriching the motif discovery literature are important for users to choose the most appropriate one among the tools available online. Moreover, there is still a lack of credible criteria for assessing motif discovery tools and of guidance to help researchers choose the best tool for their own projects. Thus, integration of the related resources might be a good approach to improve the accuracy of the application. Recent studies integrate regulatory motif discovery tools with experimental methods to offer a complementary approach for researchers, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.

  19. Dispom: a discriminative de-novo motif discovery tool based on the jstacs library.

    Grau, Jan; Keilwagen, Jens; Gohr, André; Paponov, Ivan A; Posch, Stefan; Seifert, Michael; Strickert, Marc; Grosse, Ivo

    2013-02-01

    DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use. PMID:23427988

  20. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (125MQELSNILNL134) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and only the putative NES motif contained (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD only containing either one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. Furthermore, CRM1-specific nuclear export inhibitor LMB blocked the hNANOG potent NES-mediated export, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, a NES motif is present in the hNANOG HD and may be functionally involved in CRM1-mediated nuclear export pathway.

  1. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    Aruga Jun

    2010-02-01

    Background: The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprising two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results: Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion: These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions.

  2. Miz-1 activates gene expression via a novel consensus DNA binding motif.

    Bonnie L Barrilleaux

    The transcription factor Miz-1 can either activate or repress gene expression in concert with binding partners including the Myc oncoprotein. The genomic binding of Miz-1 includes both core promoters and more distal sites, but the preferred DNA binding motif of Miz-1 has been unclear. We used a high-throughput in vitro technique, Bind-n-Seq, to identify two Miz-1 consensus DNA binding motif sequences, ATCGGTAATC and ATCGAT (Mizm1 and Mizm2), bound by full-length Miz-1 and its zinc finger domain, respectively. We validated these sequences directly as high affinity Miz-1 binding motifs. Competition assays using mutant probes indicated that the binding affinity of Miz-1 for Mizm1 and Mizm2 is highly sequence-specific. Miz-1 strongly activates gene expression through the motifs in a Myc-independent manner. MEME-ChIP analysis of Miz-1 ChIP-seq data in two different cell types reveals a long motif with a central core sequence highly similar to the Mizm1 motif identified by Bind-n-Seq, validating the in vivo relevance of the findings. Miz-1 ChIP-seq peaks containing the long motif are predominantly located outside of proximal promoter regions, in contrast to peaks without the motif, which are highly concentrated within 1.5 kb of the nearest transcription start site. Overall, our results indicate that Miz-1 may be directed in vivo to the novel motif sequences we have identified, where it can recruit its specific binding partners to control gene expression and ultimately regulate cell fate.
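
    Exact occurrences of the two consensus sequences quoted above can be located on both strands of a DNA sequence with a few lines of code. This is only a sketch of exact string matching; it is not the Bind-n-Seq or MEME-ChIP analysis used in the study, and the helper names `revcomp` and `scan` are hypothetical.

```python
# Consensus sequences as given in the abstract.
MOTIFS = {"Mizm1": "ATCGGTAATC", "Mizm2": "ATCGAT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq, motifs=MOTIFS):
    """Return (motif name, strand, 0-based start) for every exact match."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            if strand == "-" and pattern == motif:
                continue  # palindromic motif: do not report the same site twice
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])
```

    ATCGAT is its own reverse complement, so Mizm2 sites are reported once on the "+" strand only. Real binding-site searches usually tolerate mismatches, for example through a position weight matrix, which this exact-match sketch does not attempt.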

  3. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    van Beest, M; Dooijes, D; van De Wetering, M; Kjaerulff, S; Bonvin, A; Nielsen, O; Clevers, H; Nielsen, Olaf

    2000-01-01

    promoter elements controlled by the yeast genes ste11 and Rox1 has indicated strict conservation of a larger DNA motif. By site selection, we identify a highly specific 12-base pair motif for Ste11, AGAACAAAGAAA. Similarly, we show that Tcf1, MatMc, and Sox4 bind unique, highly specific DNA motifs of 12...

  4. Capping motifs stabilize the leucine-rich repeat protein PP32 and rigidify adjacent repeats

    Dao, Thuy P; Majumdar, Ananya; Barrick, Doug

    2014-01-01

    Capping motifs are found to flank most β-strand-containing repeat proteins. To better understand the roles of these capping motifs in organizing structure and stability, we carried out folding and solution NMR studies on the leucine-rich repeat (LRR) domain of PP32, which is composed of five tandem LRR, capped by α-helical and β-hairpin motifs on the N- and C-termini. We were able to purify PP32 constructs lacking either cap and containing destabilizing substitutions. Removing the C-cap resul...

  5. Identifying Function, Agent, and Setting Motifs in Some Early Spanish "libros de caballerías"

    NEUMAYER, KRISTIN

    2012-01-01

    The essay presents the methodology of a doctoral thesis (2008, University of Wisconsin-Madison) which classifies plot motifs in some sixteenth-century Castilian books of chivalry. Therein, two critical approaches to the texts are noted: motif studies, which analyze narrative components, and structural studies, which examine whole plotlines. Based on V. Propp’s Morphology of the Folktale, the motif is defined as a unit of plot structure. Propp’s thirty-one functions and seven tale-roles are th...

  6. ELM 2016--data update and new functionality of the eukaryotic linear motif resource.

    Dinkel, Holger; Van Roey, Kim; Michael, Sushama; Kumar, Manjeet; Uyar, Bora; Altenberg, Brigitte; Milchevskaya, Vladislava; Schneider, Melanie; Kühn, Helen; Behrendt, Annika; Dahl, Sophie Luise; Damerell, Victoria; Diebel, Sandra; Kalman, Sara; Klein, Steffen; Knudsen, Arne C; Mäder, Christina; Merrill, Sabina; Staudt, Angelina; Thiel, Vera; Welti, Lukas; Davey, Norman E; Diella, Francesca; Gibson, Toby J

    2016-01-01

    The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org) is a manually curated database of short linear motifs (SLiMs). In this update, we present the latest additions to this resource, along with more improvements to the web interface. ELM 2016 contains more than 240 different motif classes with over 2700 experimentally validated instances, manually curated from more than 2400 scientific publications. In addition, more data have been made available as individually searchable pages and are downloadable in various formats. PMID:26615199

  7. FTZ-Factor1 and Fushi tarazu interact via conserved nuclear receptor and coactivator motifs

    Schwartz, Carol J.E.; Sampson, Heidi M.; Hlousek, Daniela; Percival-Smith, Anthony; Copeland, John W.R.; Simmonds, Andrew J.; Krause, Henry M.

    2001-01-01

    To activate transcription, most nuclear receptor proteins require coactivators that bind to their ligand-binding domains (LBDs). The Drosophila FTZ-Factor1 (FTZ-F1) protein is a conserved member of the nuclear receptor superfamily, but was previously thought to lack an AF2 motif, a motif that is required for ligand and coactivator binding. Here we show that FTZ-F1 does have an AF2 motif and that it is required to bind a coactivator, the homeodomain-containing protein Fushi tarazu (FTZ). We al...

  8. Discovering structural motifs using a structural alphabet: Application to magnesium-binding sites

    Lim Carmay

    2007-03-01

    Background: For many metalloproteins, sequence motifs characteristic of metal-binding sites have not been found or are so short that they would not be expected to be metal-specific. Striking examples of such metalloproteins are those containing Mg2+, one of the most versatile metal cofactors in cellular biochemistry. Even when Mg2+-proteins share insufficient sequence homology to identify Mg2+-specific sequence motifs, they may still share similarity in the Mg2+-binding site structure. However, no structural motifs characteristic of Mg2+-binding sites have been reported. Thus, our aims are (i) to develop a general method for discovering structural patterns/motifs characteristic of ligand-binding sites, given the 3D protein structures, and (ii) to apply it to Mg2+-proteins sharing 2+-structural motifs are identified as recurring structural patterns. Results: The structural alphabet-based motif discovery method has revealed the structural preference of Mg2+-binding sites for certain local/secondary structures: compared to all residues in the Mg2+-proteins, both first and second-shell Mg2+-ligands prefer loops to helices. Even when the Mg2+-proteins share no significant sequence homology, some of them share a similar Mg2+-binding site structure: 4 Mg2+-structural motifs, comprising 21% of the binding sites, were found. In particular, one of the Mg2+-structural motifs found maps to a specific functional group, namely, hydrolases. Furthermore, 2 of the motifs were not found in non-metalloproteins or in Ca2+-binding proteins. The structural motifs discovered thus capture some essential biochemical and/or evolutionary properties, and hence may be useful for discovering proteins where Mg2+ plays an important biological role. Conclusion: The structural motif discovery method presented herein is general and can be applied to any set of proteins with known 3D structures. This new method is timely considering the increasing number of structures for

  9. Stochastic Resonance in Neuronal Network Motifs with Ornstein-Uhlenbeck Colored Noise

    Xuyang Lou

    2014-01-01

    We consider here the effect of the Ornstein-Uhlenbeck colored noise on the stochastic resonance of the feed-forward-loop (FFL) network motif. The FFL motif is modeled through the FitzHugh-Nagumo neuron model as well as the chemical coupling. Our results show that the noise intensity and the correlation time of the noise process serve as the control parameters, which have great impacts on the stochastic dynamics of the FFL motif. We find that, with a proper choice of noise intensities and the correlation time of the noise process, the signal-to-noise ratio (SNR) can display more than one peak.
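
    The two control parameters named above, noise intensity D and correlation time tau, enter through the Ornstein-Uhlenbeck process. The sketch below drives a single FitzHugh-Nagumo neuron, not the three-neuron FFL with chemical coupling studied in the paper. The noise scaling (stationary variance D/tau), the Euler-Maruyama integration, and every parameter value are assumptions chosen for illustration.

```python
import math
import random


def ou_noise(n_steps, dt, tau, D, seed=0):
    """Euler-Maruyama samples of d(eta) = -(eta/tau) dt + (sqrt(2 D)/tau) dW.

    Under this (assumed) scaling the stationary variance is D / tau.
    """
    rng = random.Random(seed)
    eta, samples = 0.0, []
    for _ in range(n_steps):
        eta += -eta / tau * dt + math.sqrt(2.0 * D * dt) / tau * rng.gauss(0.0, 1.0)
        samples.append(eta)
    return samples


def fhn_with_noise(noise, dt, a=0.7, b=0.8, eps=0.08, amp=0.1, omega=0.3):
    """Integrate one FitzHugh-Nagumo neuron driven by a weak sine plus noise.

    Model: dv/dt = v - v^3/3 - w + amp*sin(omega*t) + eta(t)
           dw/dt = eps * (v + a - b*w)
    All default parameter values are illustrative assumptions.
    """
    v, w, trace = -1.2, -0.6, []
    for n, eta in enumerate(noise):
        signal = amp * math.sin(omega * n * dt)
        dv = v - v ** 3 / 3.0 - w + signal + eta
        dw = eps * (v + a - b * w)
        v, w = v + dv * dt, w + dw * dt
        trace.append(v)
    return trace
```

    A stochastic resonance study would repeat such a run over a grid of D and tau values and estimate the SNR of the membrane variable at the driving frequency for each pair; that analysis step is not shown here.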

  10. Negative in vitro selection identifies the rRNA recognition motif for ErmE methyltransferase

    Nielsen, A K; Douthwaite, S; Vester, B

    1999-01-01

    the adjacent single-stranded region around A2058. An RNA transcript of 72 nt that displays this motif functions as an efficient substrate for the ErmE methyltransferase. Pools of degenerate RNAs were formed by doping 34-nt positions that extend over and beyond the putative Erm recognition motif within...... contained substitutions at single sites, and these are confined to 12 nucleotide positions. These nucleotides, corresponding to A2051-A2060, C2611, and A2614 in 23S rRNA, presumably comprise the RNA recognition motif for ErmE methyltransferase. The structure formed by these nucleotides is highly conserved...

  11. Elongated polyproline motifs facilitate enamel evolution through matrix subunit compaction.

    Tianquan Jin

    2009-12-01

    Vertebrate body designs rely on hydroxyapatite as the principal mineral component of relatively light-weight, articulated endoskeletons and sophisticated tooth-bearing jaws, facilitating rapid movement and efficient predation. Biological mineralization and skeletal growth are frequently accomplished through proteins containing polyproline repeat elements. Through their well-defined yet mobile and flexible structure, polyproline-rich proteins control mineral shape and contribute many other biological functions including Alzheimer's amyloid aggregation and prolamine plant storage. In the present study we have hypothesized that polyproline repeat proteins exert their control over biological events such as mineral growth, plaque aggregation, or viscous adhesion by altering the length of their central repeat domain, resulting in dramatic changes in supramolecular assembly dimensions. In order to test our hypothesis, we have used the vertebrate mineralization protein amelogenin as an exemplar and determined the biological effect of the four-fold increased polyproline tandem repeat length in the amphibian/mammalian transition. To study the effect of polyproline repeat length on matrix assembly, protein structure, and apatite crystal growth, we have measured supramolecular assembly dimensions in various vertebrates using atomic force microscopy, tested the effect of protein assemblies on crystal growth by electron microscopy, generated a transgenic mouse model to examine the effect of an abbreviated polyproline sequence on crystal growth, and determined the structure of polyproline repeat elements using 3D NMR. Our study shows that an increase in PXX/PXQ tandem repeat motif length results in (i) a compaction of protein matrix subunit dimensions, (ii) reduced conformational variability, (iii) an increase in polyproline II helices, and (iv) promotion of apatite crystal length. Together, these findings establish a direct relationship between polyproline tandem

  12. The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) family.

    Kelwick, Richard; Desanlis, Ines; Wheeler, Grant N; Edwards, Dylan R

    2015-01-01

    The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) enzymes are secreted, multi-domain matrix-associated zinc metalloendopeptidases that have diverse roles in tissue morphogenesis and patho-physiological remodeling, in inflammation and in vascular biology. The human family includes 19 members that can be sub-grouped on the basis of their known substrates, namely the aggrecanases or proteoglycanases (ADAMTS1, 4, 5, 8, 9, 15 and 20), the procollagen N-propeptidases (ADAMTS2, 3 and 14), the cartilage oligomeric matrix protein-cleaving enzymes (ADAMTS7 and 12), the von-Willebrand Factor proteinase (ADAMTS13) and a group of orphan enzymes (ADAMTS6, 10, 16, 17, 18 and 19). Control of the structure and function of the extracellular matrix (ECM) is a central theme of the biology of the ADAMTS, as exemplified by the actions of the procollagen-N-propeptidases in collagen fibril assembly and of the aggrecanases in the cleavage or modification of ECM proteoglycans. Defects in certain family members give rise to inherited genetic disorders, while the aberrant expression or function of others is associated with arthritis, cancer and cardiovascular disease. In particular, ADAMTS4 and 5 have emerged as therapeutic targets in arthritis. Multiple ADAMTSs from different sub-groupings exert either positive or negative effects on tumorigenesis and metastasis, with both metalloproteinase-dependent and -independent actions known to occur. The basic ADAMTS structure comprises a metalloproteinase catalytic domain and a carboxy-terminal ancillary domain, the latter determining substrate specificity and the localization of the protease and its interaction partners; ancillary domains probably also have independent biological functions. Focusing primarily on the aggrecanases and proteoglycanases, this review provides a perspective on the evolution of the ADAMTS family, their links with developmental and disease mechanisms, and key questions for the future. 
    PMID:26025392

  13. Identification of putative regulatory motifs in the upstream regions of co-expressed functional groups of genes in Plasmodium falciparum

    Joshi NV

    2009-01-01

    Background: Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results: The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs – the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs – were identified, and zero to all four of these occur in the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion: The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.

  14. Rice bZIP protein, REB, interacts with GCN4 motif in promoter of Waxy gene

    Cheng, Shijun (程世军); Wang, Zongyang (王宗阳); Hong, Mengmin (洪孟民)

    2002-01-01

    A bifactorial endosperm box (EB), which contains an endosperm motif (EM) and a GCN4 motif, was found in the rice Wx promoter. The EB is found in the 5′ upstream region of many seed storage protein genes and accounts for the endosperm-exclusive expression of these genes among various cereals. Many reports have demonstrated that bZIP transcription activators isolated from wheat, barley, maize and other cereals regulate gene expression by binding to the GCN4 motif. In this research, we showed that the GCN4 sequence could be recognized by nuclear proteins extracted from immature rice seeds. Furthermore, a rice bZIP protein, REB, was isolated by PCR, and an REB fusion protein was expressed in E. coli. Gel shift analysis showed that REB could recognize and bind to the GCN4 motif in the Wx gene, in addition to binding to the target sequence in the promoter of α-globulin.

  15. Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes

    Caselle, M; Provero, P

    2002-01-01

    Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: if this level is significantly higher or lower than the average taken over the whole genome, then the overrepresented motif shared by the genes in the set is likely to play a role in their regulation. The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment, in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulat...
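
    The grouping-and-comparison idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that simple presence of a k-mer stands in for "overrepresented", that a z-score against the genome-wide mean stands in for the significance test, and that the function name, parameters and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def motif_expression_scores(upstream, expression, k=3, min_genes=2):
    """Group genes by shared k-mer and z-score each group's mean expression.

    Assumption: a gene joins a group if the k-mer occurs at least once in its
    upstream sequence (a simplification of "overrepresented").
    """
    genome_mean = mean(expression.values())
    genome_sd = pstdev(expression.values())
    if genome_sd == 0:
        return {}  # no variation in expression, nothing to score

    groups = defaultdict(set)
    for gene, seq in upstream.items():
        for i in range(len(seq) - k + 1):
            groups[seq[i:i + k]].add(gene)

    scores = {}
    for kmer, genes in groups.items():
        if len(genes) < min_genes:
            continue
        group_mean = mean(expression[g] for g in genes)
        # Standard error of a mean of len(genes) draws from the genome-wide distribution.
        scores[kmer] = (group_mean - genome_mean) / (genome_sd / sqrt(len(genes)))
    return scores


# Hypothetical toy data: upstream sequences and expression levels.
upstream = {"g1": "ACGT", "g2": "ACGA", "g3": "TTTT", "g4": "TTTA"}
expression = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0}
scores = motif_expression_scores(upstream, expression)
```

    In this toy input, the set sharing "ACG" scores above the genome-wide mean and the set sharing "TTT" scores below it; a real analysis would also need a proper overrepresentation test and multiple-testing correction.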

  16. TrieAMD: a scalable and efficient apriori motif discovery approach.

    Al-Turaiki, Isra; Badr, Ghada; Mathkour, Hassan

    2015-01-01

    Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and longest-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for the discovery of frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications so that we can adapt the classic Apriori to our problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets for the average sensitivity measure, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to the other tools. The results also confirm both linear time and linear space scalability of the algorithm. PMID:26529905

  17. An intracellular motif of GLUT4 regulates fusion of GLUT4-containing vesicles

    Welsh Gavin I

    2008-05-01

    Background: Insulin stimulates glucose uptake by adipocytes through increasing translocation of the glucose transporter GLUT4 from an intracellular compartment to the plasma membrane. Fusion of GLUT4-containing vesicles at the cell surface is thought to involve phospholipase D activity, generating the signalling lipid phosphatidic acid, although the mechanism of action is not yet clear. Results: Here we report the identification of a putative phosphatidic acid-binding motif in a GLUT4 intracellular loop. Mutation of this motif causes a decrease in the insulin-induced exposure of GLUT4 at the cell surface of 3T3-L1 adipocytes via an effect on vesicle fusion. Conclusion: The potential phosphatidic acid-binding motif identified in this study is unique to GLUT4 among the sugar transporters; therefore, this motif may provide a unique mechanism for regulating insulin-induced translocation by phospholipase D signalling.

  18. The Phe-Phe Motif for Peptide Self-Assembly in Nanomedicine

    Marchesan, Silvia; Vargiu, Attilio V.; Styan, Katie E.

    2015-01-01

    Since its discovery, the Phe-Phe motif has gained in popularity as a minimalist building block to drive the self-assembly of short peptides and their analogues into nanostructures and hydrogels. Molecules based on the Phe-Phe motif have found a range of applications in nanomedicine, from drug delivery and biomaterials to new therapeutic paradigms. Here we discuss the various production methods for this class of compounds, and the characterization, nanomorphologies, and application of their se...

  19. Decorative motifs in the interior of the town house of the 19th century in Macedonia

    Namicev, Petar; Namiceva, Ekaterina

    2015-01-01

    An integral part of the decoration of the house in Macedonia in the 19th century is the application of certain stylized motifs in shaping the interior. The specific material (wood, plaster) yields a certain typology of decorative elements, with partial or full use of wood or plaster in their representation in the interior. By style, the decorative motifs include geometric, vegetal and zoomorphic decoration. Vegetal and geometric decoration representing ...

  20. The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

    Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

    2016-01-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls. PMID:27489856
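
    As an illustration of the consensus reported in this abstract, the sketch below checks that TGTTC-N4-GAACA is a 14 bp reverse-complement palindrome and scans a sequence for exact matches. It is only a sketch: real LexA sites are degenerate and are usually scored with a position weight matrix, and the promoter fragment used here is hypothetical.

```python
import re

# Exact-match pattern for the consensus TGTTC-N4-GAACA (N = any base).
# The lookahead lets overlapping sites be reported as well.
LEXA_SITE = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq):
    """Return (start, site) pairs for exact consensus matches in seq."""
    return [(m.start(), m.group(1)) for m in LEXA_SITE.finditer(seq.upper())]


# Hypothetical promoter fragment containing one consensus site.
promoter = "aaTGTTCAAAAGAACAtt"
sites = find_sites(promoter)
half_sites_are_palindromic = reverse_complement("TGTTC") == "GAACA"
```

    The two half-sites are reverse complements of each other, which is what makes the motif palindromic around the four-base spacer.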

  1. Defining and searching for structural motifs using DeepView/Swiss-PdbViewer

    Johansson Maria U; Zoete Vincent; Michielin Olivier; Guex Nicolas

    2012-01-01

    Background: Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to the availability of numerous comprehensive and easy to use software packages and web-based services. Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due to a lack of easily accessible and easy to use software. Results: In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structura...

  2. A novel zinc-binding motif found in two ubiquitous deaminase families.

    Reizer, J.; Buskirk, S.; Bairoch, A.; Reizer, A.; Saier, M. H.

    1994-01-01

    Two families of deaminases, one specific for cytidine, the other for deoxycytidylate, are shown to possess a novel zinc-binding motif, here designated ZBS. We have (1) identified the protein members of these 2 families, (2) carried out sequence analyses that allow specification of this zinc-binding motif, and (3) determined signature sequences that will allow identification of additional members of these families as their sequences become available. PMID:8061614

  3. Effect of the DEF motif on phosphorylation of peptide substrates by ERK

    Fernandes, Neychelle; Allbritton, Nancy L.

    2009-01-01

    MAP kinase ERK maintains specificity by binding to docking sites such as the DEF domain or D domain. It was previously shown that appending peptides derived from D domains to a substrate peptide increased apparent efficiency of peptide phosphorylation while preserving its apparent specificity for ERK. Here we determine the effect of the DEF motif on efficiency and specificity of peptide phosphorylation by ERK. The DEF motif modulated the apparent affinity of the peptide for ERK while the subs...

  4. A Nucleotide Binding Motif in Hepatitis C Virus (HCV) NS4B Mediates HCV RNA Replication

    Einav, Shirit; Elazar, Menashe; Danieli, Tsafi; Glenn, Jeffrey S.

    2004-01-01

    Hepatitis C virus (HCV) is a major cause of viral hepatitis. There is no effective therapy for most patients. We have identified a nucleotide binding motif (NBM) in one of the virus's nonstructural proteins, NS4B. This structural motif binds and hydrolyzes GTP and is conserved across HCV isolates. Genetically disrupting the NBM impairs GTP binding and hydrolysis and dramatically inhibits HCV RNA replication. These results have exciting implications for the HCV life cycle and novel antiviral s...

  5. An artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases.

    Bailey, T L; Baker, M E; Elkan, C P

    1997-05-01

    MEME (Multiple Expectation-maximization for Motif Elicitation) is a unique new software tool that uses artificial intelligence techniques to discover motifs shared by a set of protein sequences in a fully automated manner. This paper is the first detailed study of the use of MEME to analyse a large, biologically relevant set of sequences, and to evaluate the sensitivity and accuracy of MEME in identifying structurally important motifs. For this purpose, we chose the short-chain alcohol dehydrogenase superfamily because it is large and phylogenetically diverse, providing a test of how well MEME can work on sequences with low amino acid similarity. Moreover, this dataset contains enzymes of biological importance, and because several enzymes have known X-ray crystallographic structures, we can test the usefulness of MEME for structural analysis. The first six motifs from MEME map onto structurally important alpha-helices and beta-strands on Streptomyces hydrogenans 20beta-hydroxysteroid dehydrogenase. We also describe MAST (Motif Alignment Search Tool), which conveniently uses output from MEME for searching databases such as SWISS-PROT and Genpept. MAST provides statistical measures that permit a rigorous evaluation of the significance of database searches with individual motifs or groups of motifs. A database search of Genpept90 by MAST with the log-odds matrix of the first six motifs obtained from MEME yields a bimodal output, demonstrating the selectivity of MAST. We show for the first time, using primary sequence analysis, that bacterial sugar epimerases are homologs of short-chain dehydrogenases. MEME and MAST will be increasingly useful as genome sequencing provides large datasets of phylogenetically divergent sequences of biomedical interest. PMID:9366496
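
    The abstract mentions searching with a log-odds matrix. The following generic sketch shows what such a matrix is and how a sequence can be scanned with it; it does not reproduce MEME or MAST. The aligned sites, the uniform background, the pseudocount and the function names are all assumptions made for illustration.

```python
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_matrix(sites, alphabet, background, pseudocount=1.0):
    """Build one log2(observed / background) column per aligned position."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {a: pseudocount for a in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({a: log2((counts[a] / total) / background[a]) for a in alphabet})
    return matrix


def best_match(matrix, seq):
    """Return (score, start) of the highest-scoring window in seq.

    Assumes seq is at least as long as the matrix is wide.
    """
    width = len(matrix)
    scored = [
        (sum(col[ch] for col, ch in zip(matrix, seq[i:i + width])), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)


# Hypothetical aligned motif occurrences and a uniform background.
background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
matrix = log_odds_matrix(["GAG", "GAG", "GAC"], AMINO_ACIDS, background)
score, start = best_match(matrix, "MKTGAGLL")
```

    A positive score means the window looks more like the motif than like background; tools such as MAST additionally convert scores into statistical significance, which this sketch omits.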

  6. REVIEW THE JEWELERY OF TURKMEN WOMEN BY EMPHASIZING ON THE CONCEPTS OF MOTIFS IN CULTURE

    Noruzi, Hossein; Kermani, Iman Zakariai

    2015-01-01

    One way to understand the culture of a people is to recognize the concepts used in their arts, and especially in their artifacts. These concepts are represented in the appearance and motifs of these relics. The Turkmen are tribes who have used several designs in decorating their arts and crafts. Turkmen women's jewelry is considered a remarkable art of these tribes; it includes not only special visual properties but also drawings with broad concepts. The aim of this study is to recognize motifs and the used concepts ...

  7. Extraction of Protein Sequence Motif Information using PSO K-Means

    Gowri, R.; Rathipriya, R.

    2015-01-01

    The main objective of the paper is to find motif information. The functionalities of proteins are ideally found from their motif information, which is extracted in the literature using various techniques such as clustering with k-means, hybrid k-means, self-organising maps, etc. In this work, protein sequence information is extracted using an optimised k-means algorithm. Particle swarm optimisation is one of the frequently used optimisation methods. In the current work the PSO k...

  8. A Nucleotide Binding Motif in Hepatitis C Virus (HCV) NS4B Mediates HCV RNA Replication

    Einav, Shirit; Elazar, Menashe; Danieli, Tsafi; Glenn, Jeffrey S.

    2004-01-01

    Hepatitis C virus (HCV) is a major cause of viral hepatitis. There is no effective therapy for most patients. We have identified a nucleotide binding motif (NBM) in one of the virus's nonstructural proteins, NS4B. This structural motif binds and hydrolyzes GTP and is conserved across HCV isolates. Genetically disrupting the NBM impairs GTP binding and hydrolysis and dramatically inhibits HCV RNA replication. These results have exciting implications for the HCV life cycle and novel antiviral strategies. PMID:15452248

  9. Waddling Random Walk: Fast and Accurate Sampling of Motif Statistics in Large Graphs

    Han, Guyue; Sethu, Harish

    2016-01-01

    The relative frequency of small subgraphs within a large graph, such as one representing an online social network, is of high interest to sociologists, computer scientists and marketeers alike. However, the computation of these network motif statistics via naive enumeration is infeasible because of either its prohibitive computational cost or access restrictions on the full graph data. Methods to estimate the motif statistics based on random walks by sampling only a small fraction of the subgraphs ...

  10. Discriminative motif discovery via simulated evolution and random under-sampling.

    Tao Song

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
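
    Random under-sampling, as named in this abstract, can be sketched in a few lines. This is a generic illustration rather than the article's procedure: the function name, the fixed seed and the toy sequences are assumptions.

```python
import random


def undersample(positives, negatives, seed=0):
    """Randomly shrink the larger set so both sets have the same size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Hypothetical imbalanced training sets (2 positives, 6 negatives).
positives = ["MKKLL", "MRRLA"]
negatives = ["AAAAA", "CCCCC", "DDDDD", "EEEEE", "FFFFF", "GGGGG"]
balanced_pos, balanced_neg = undersample(positives, negatives)
```

    Under-sampling discards data from the majority set, so in practice it is often repeated with different seeds and the resulting models are compared or combined.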

  11. Functional characterization of transcription factor motifs using cross-species comparison across large evolutionary distances.

    Kim, Jaebum; Cunningham, Ryan; James, Brian; Wyder, Stefan; Gibson, Joshua D; Niehuis, Oliver; Zdobnov, Evgeny M; Robertson, Hugh M; Robinson, Gene E; Werren, John H; Sinha, Saurabh

    2010-01-01

    We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif-function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations. PMID:20126523

  12. Functional characterization of transcription factor motifs using cross-species comparison across large evolutionary distances.

    Jaebum Kim

    2010-01-01

    We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif-function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.

  13. Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure

    Ashoor, Haitham

    2011-06-01

    This thesis presents a computational methodology for ab-initio identification of transcription factor binding sites (TFBSs) based on ChIP-seq data. The method consists of three main steps, namely ChIP-seq data processing, motif discovery and model selection. A novel method for ranking the models of motifs identified in this process is proposed. This method combines multiple factors in order to rank the provided candidate motifs. It combines the model coverage of the ChIP-seq fragments that contain motifs from which that model is built, the suitable background data made up of shuffled ChIP-seq fragments, and the p-value that resulted from evaluating the model on actual and background data. Two ChIP-seq datasets retrieved from the ENCODE project are used to evaluate and demonstrate the ability of the method to predict correct TFBSs with high precision. The first dataset relates to neuron-restrictive silencer factor, NRSF, while the second one corresponds to growth-associated binding protein, GABP. The pipeline system shows high-precision prediction for both datasets, as in both cases the top-ranked motif closely resembles the known motifs for the respective transcription factors.

  14. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing.

    Pantazes, Robert J; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N; Murray, Joseph A; Daugherty, Patrick S

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
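
    The consensus motif quoted in this abstract, QPEQPF[PS]E, uses bracket notation in which [PS] means "P or S" at that position. A minimal sketch of how such a pattern could be counted in two peptide sets follows; it is not the IMUNE algorithm, and the peptide lists and function name are hypothetical.

```python
import re

# The bracketed position is already valid regular-expression syntax.
CONSENSUS = re.compile(r"QPEQPF[PS]E")


def hit_fraction(peptides):
    """Fraction of peptides that contain the consensus at least once."""
    if not peptides:
        return 0.0
    return sum(1 for p in peptides if CONSENSUS.search(p)) / len(peptides)


# Hypothetical peptide reads from a disease set and a control set.
disease_peptides = ["AAQPEQPFPEGG", "QPEQPFSEKL", "MKTAYIAK", "GGQPEQPFPE"]
control_peptides = ["MKTAYIAK", "QPEQPFAEKL", "LLLLGGGG", "AAAAQPEQ"]
disease_fraction = hit_fraction(disease_peptides)
control_fraction = hit_fraction(control_peptides)
```

    Comparing the two fractions gives a crude picture of enrichment; a statistical test on the underlying counts would be needed before calling a pattern disease-specific.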

  15. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573

  16. SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions

    Davey, Norman E.; Cowan, Joanne L.; Shields, Denis C.; Gibson, Toby J.; Coldwell, Mark J.; Edwards, Richard J.

    2012-01-01

    Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E. PMID:22977176

  17. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Pooya Zandevakili

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  18. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    Alam, Tanvir

    2014-05-29

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  19. APOCALYPTIC MOTIFS IN THE CYCLE OF STORIES BY M.A. BULGAKOV «NOTES OF A YOUNG DOCTOR»

    Evgeniy Igorevich Erokhov

    2015-10-01

    This article presents the first motif analysis of M.A. Bulgakov's cycle of stories «Notes of a Young Doctor» from the point of view of its apocalyptic problematics. To identify apocalyptic motifs, the method of motif analysis developed by B.M. Gasparov was used, which also helps to demonstrate the interpenetration of motifs in the cycle of stories. The result of the research is the identification of apocalyptic motifs that are manifested in the experiences of the main character and in the events taking place around him, seen through the prism of the physician's perception of the world. The identified motifs show that the stories in the cycle are united not only thematically and through the image of the main character, but also through motifs that interpenetrate the stories of the cycle. The cycle contains the following apocalyptic motifs: diseases, darkness (as part of the landscape), resurrection from the dead, and the beast. They all belong to the biblical type, which is distinguished on the basis of the associative bond of these motifs with the biblical texts.

  20. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination

    Wiese, Claudia; Hinz, John M.; Tebbs, Robert S.; Nham, Peter B.; Urbin, Salustra S.; Collins, David W.; Thompson, Larry H.; Schild, David

    2006-04-21

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C, and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks. Ectopic expression of wild type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  1. The Geometry of Plasticity-Induced Sensitization in Isoinhibitory Rate Motifs.

    Kumar, Gautam; Ching, ShiNung

    2016-09-01

    A well-known phenomenon in sensory perception is desensitization, wherein behavioral responses to persistent stimuli become attenuated over time. In this letter, our focus is on studying mechanisms through which desensitization may be mediated at the network level and, specifically, how sensitivity changes arise as a function of long-term plasticity. Our principal object of study is a generic isoinhibitory motif: a small excitatory-inhibitory network with recurrent inhibition. Such a motif is of interest due to its overrepresentation in laminar sensory network architectures. Here, we introduce a sensitivity analysis derived from control theory in which we characterize the fixed-energy reachable set of the motif. This set describes the regions of the phase-space that are more easily (in terms of stimulus energy) accessed, thus providing a holistic assessment of sensitivity. We specifically focus on how the geometry of this set changes due to repetitive application of a persistent stimulus. We find that for certain motif dynamics, this geometry contracts along the stimulus orientation while expanding in orthogonal directions. In other words, the motif not only desensitizes to the persistent input, but heightens its responsiveness (sensitizes) to those that are orthogonal. We develop a perturbation analysis that links this sensitization to both plasticity-induced changes in synaptic weights and the intrinsic dynamics of the network, highlighting that the effect is not purely due to weight-dependent disinhibition. Instead, this effect depends on the relative neuronal time constants and the consequent stimulus-induced drift that arises in the motif phase-space. For tightly distributed (but random) parameter ranges, sensitization is quite generic and manifests in larger recurrent E-I networks within which the motif is embedded. PMID:27391684

  2. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Kistler Corby

    2010-03-01

    Background: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agriculture losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results: Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion: This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  3. Lipid motif of a bacterial antigen mediates immune responses via TLR2 signaling.

    Amit A Lugade

    The cross-talk between the innate and the adaptive immune system is facilitated by the initial interaction of antigen with dendritic cells (DCs). As DCs express a large array of TLRs, evidence has accumulated that engagement of these molecules contributes to the activation of adaptive immunity. We have evaluated the immunostimulatory role of the highly-conserved outer membrane lipoprotein P6 from non-typeable Haemophilus influenzae (NTHI) to determine whether the presence of the lipid motif plays a critical role in its immunogenicity. We undertook a systematic analysis of the role that the lipid motif plays in the activation of DCs and the subsequent stimulation of antigen-specific T and B cells. To facilitate our studies, recombinant P6 protein that lacked the lipid motif was generated. Mice immunized with non-lipidated rP6 were unable to elicit high titers of anti-P6 Ig. Expression of the lipid motif on P6 was also required for proliferation and cytokine secretion by antigen-specific T cells. Upregulation of T cell costimulatory molecules was abrogated in DCs exposed to non-lipidated rP6 and in TLR2(-/-) DCs exposed to native P6, thereby resulting in diminished adaptive immune responses. Absence of either the lipid motif on the antigen or TLR2 expression resulted in diminished cytokine production from stimulated DCs. Collectively, our data suggest that the lipid motif of the lipoprotein antigen is essential for triggering TLR2 signaling and effective stimulation of APCs. Our studies establish the pivotal role of a bacterial lipid motif in activating both innate and adaptive immune responses to an otherwise poorly immunogenic protein antigen.

  4. An Analysis of Multi-type Relational Interactions in FMA Using Graph Motifs with Disjointness Constraints

    Zhang, Guo Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff A.; Mejino, Jose; Sahoo, Satya S.

    2012-11-24

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions. MOCH represents patterns of multitype interaction as small labeled sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology (OWL, RDF and SPARQL) and Virtuoso, we performed exhaustive analyses of three 2-node motifs, resulting in 638 matching FMA configurations; twelve 3-node motifs, resulting in 202,960 configurations. Using the Principal Ideal Explorer (PIE) methodology as an extension of MOCH, we were able to identify 755 root nodes with 4,100 respective descendants with opposing antonyms in their class names for arbitrary-length motifs. With possible disjointness implied by antonyms, we performed manual inspection of a subset of the resulting FMA fragments and tracked down a source of abnormal inferred conclusions (captured by the motifs), coming from a gender-neutral class being modeled as a part of gender-specific class, such as “Urinary system” is a part of “Female human body.” Our results demonstrate that MOCH and PIE provide a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation.
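
    The idea of expressing a motif as a graph pattern and collecting every matching fragment can be sketched without an RDF store. The following is a minimal illustration only, not the MOCH implementation: the triples, the `part_of` relation name and the `match_motif` helper are hypothetical, and a real audit would run SPARQL queries against the FMA graph as the abstract describes.

```python
def match_motif(triples, patterns):
    """Yield variable bindings under which every pattern is a triple in the graph.

    triples  -- iterable of (subject, relation, object) tuples
    patterns -- list of (subject_var, relation, object_var) tuples; variables
                are plain strings such as "?x" (a naming convention assumed here)
    """
    triples = list(triples)

    def extend(binding, remaining):
        if not remaining:
            yield dict(binding)
            return
        (s_var, rel, o_var), rest = remaining[0], remaining[1:]
        for s, p, o in triples:
            if p != rel:
                continue
            # A pattern such as (?x, r, ?x) can only match a self-loop.
            if s_var == o_var and s != o:
                continue
            # Respect bindings made by earlier patterns.
            if binding.get(s_var, s) != s or binding.get(o_var, o) != o:
                continue
            yield from extend({**binding, s_var: s, o_var: o}, rest)

    yield from extend({}, list(patterns))


# Toy graph (hypothetical data, loosely echoing the example in the abstract).
toy_graph = [
    ("Urinary system", "part_of", "Female human body"),
    ("Urinary system", "part_of", "Male human body"),
    ("Kidney", "part_of", "Urinary system"),
]

# A 3-node motif: one class that is part of two other classes.
motif = [("?x", "part_of", "?y"), ("?x", "part_of", "?z")]

# Keep each unordered {y, z} pair once and drop the trivial y == z matches.
hits = [b for b in match_motif(toy_graph, motif) if b["?y"] < b["?z"]]
print(hits)
```

    Each binding returned is a candidate fragment for manual review; the final filter plays the role that a FILTER clause would play in a SPARQL query.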

  5. V1R promoters are well conserved and exhibit common putative regulatory motifs

    Lane Robert P

    2007-07-01

    Background: The mouse vomeronasal organ (VNO) processes chemosensory information, including pheromone signals that influence reproductive behaviors. The sensory neurons of the VNO express two types of chemosensory receptors, V1R and V2R. There are ~165 V1R genes in the mouse genome that have been classified into ~12 divergent subfamilies. Each sensory neuron of the apical compartment of the VNO transcribes only one of the repertoire of V1R genes. A model for mutually exclusive V1R transcription in these cells has been proposed in which each V1R gene might compete stochastically for a single transcriptional complex. This model predicts that the large repertoire of divergent V1R genes in the mouse genome contains common regulatory elements. In this study, we have characterized V1R promoter regions by comparative genomics and by mapping transcription start sites. Results: We find that transcription is initiated from ~1 kb promoter regions that are well conserved within V1R subfamilies. While cross-subfamily homology is not evident by traditional methods, we developed a heuristic motif-searching tool, LogoAlign, and applied this tool to identify motifs shared within the promoters of all V1R genes. Our motif-searching tool exhibits rapid convergence to a relatively small number of non-redundant solutions (97% convergence). We also find that the best motifs contain significantly more information than those identified in controls, and that these motifs are more likely to be found in the immediate vicinity of transcription start sites than elsewhere in gene blocks. The best motifs occur near transcription start sites of ~90% of all V1R genes and across all of the divergent subfamilies. Therefore, these motifs are candidate binding sites for transcription factors involved in V1R co-regulation. Conclusion: Our analyses show that V1R subfamilies have broad and well conserved promoter regions from which transcription is initiated. Results from a new

  6. FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web.

    Shapiro, Jessica; Brutlag, Douglas

    2004-07-01

    The FoldMiner web server (http://foldminer.stanford.edu/) provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets. PMID:15215444

  7. DXD Motif-Dependent and -Independent Effects of the Chlamydia trachomatis Cytotoxin CT166

    Miriam Bothe

    2015-02-01

    The Gram-negative, intracellular bacterium Chlamydia trachomatis causes acute and chronic urogenital tract infection, potentially leading to infertility and ectopic pregnancy. The only partially characterized cytotoxin CT166 of serovar D exhibits a DXD motif, which is important for the enzymatic activity of many bacterial and mammalian type A glycosyltransferases, leading to the hypothesis that CT166 possesses glycosyltransferase activity. CT166-expressing HeLa cells exhibit actin reorganization, including cell rounding, which has been attributed to the inhibition of the Rho-GTPases Rac/Cdc42. Exploiting the glycosylation-sensitive Ras(27H5) antibody, we here show that CT166 induces an epitope change in Ras, resulting in inhibited ERK and PI3K signaling and delayed cell cycle progression. Consistent with the hypothesis that these effects strictly depend on the DXD motif, CT166 with the mutated DXD motif causes neither Ras-ERK inhibition nor delayed cell cycle progression. In contrast, CT166 with the mutated DXD motif is still capable of inhibiting cell migration, suggesting that CT166 with the mutated DXD motif cannot be regarded as inactive in any case. Taken together, CT166 affects various fundamental cellular processes, strongly suggesting its importance for the intracellular survival of chlamydia.

  8. Importance of NPA motifs in the expression and function of water channel aquaporin-1

    JIANG Yong; MA TongHui

    2007-01-01

    The asparagine-proline-alanine sequences (NPA motifs) are highly conserved in the aquaporin water channel family. Crystallographic studies of AQP1 structure demonstrated that the two NPA motifs are in the narrow central constriction of the channel, serving to bind water molecules for selective and efficient water passage. To investigate the importance of the two NPA motifs in the structure, function and biogenesis of aquaporin water channels, we generated AQP1 mutations with NPA1 deletion, NPA2 deletion and NPA1,2 double deletion. The coding sequences of the three mutated cDNAs were subcloned into the mammalian expression vector pcDNA3.1 to form expression plasmids. We established stably transfected CHO cell lines expressing these AQP1 mutants. Immunofluorescence indicated that all three mutated AQP1 proteins are expressed normally on the plasma membrane of stably transfected CHO cells, suggesting that deletion of NPA motifs does not influence the expression and intracellular processing of AQP1. Functional analysis demonstrated that NPA1 or NPA2 deletion reduced AQP1 water permeability by 49.6% and 46.7%, respectively, while NPA1,2 double deletion had little effect on AQP1 water permeability. These results provide evidence that NPA motifs are important for water permeation but not essential for the expression, intracellular processing and the basic structure of the AQP1 water channel.

  9. Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements

    Fernandez-Chamorro, Javier; Lozano, Gloria; Garcia-Martin, Juan Antonio; Ramajo, Jorge; Dotu, Ivan; Clote, Peter; Martinez-Salas, Encarnacion

    2016-04-01

    The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data support the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the more stable the hairpin, the stronger the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.

  10. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    Eisha, Hassan Abou

    2015-05-12

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches yield good discriminant results, identifying dominant features of regulatory mechanisms nevertheless remains a challenge. In this work, we look at decision rules that may help in identifying such features. Findings: We present a simple decision rule for classification of candidate poly(A) tail signal motifs in human genomic sequence, obtained by evaluating features during the construction of gradient boosted trees. We found that values of a single feature based on the frequency of adenine in the genomic sequence surrounding the candidate signal and the number of consecutive adenine molecules in a well-defined region immediately following the motif display good discriminative potential in classification of poly(A) tail motifs for samples covered by the rule. Conclusions: The resulting simple rule can be used as an efficient filter in the construction of more complex poly(A) tail motif classification algorithms.
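
    The two quantities named in this abstract, adenine frequency around a candidate signal and the length of the adenine run just after it, are easy to compute. The sketch below is an assumption-laden illustration, not the published rule: the abstract gives neither the formula nor any thresholds, so the window sizes, the cut-offs and the way the two quantities are combined (a simple conjunction) are all hypothetical.

```python
def adenine_fraction(seq: str) -> float:
    """Fraction of bases in seq that are adenine (0.0 for an empty string)."""
    seq = seq.upper()
    return seq.count("A") / len(seq) if seq else 0.0


def longest_a_run(seq: str) -> int:
    """Length of the longest run of consecutive adenines in seq."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base == "A" else 0
        best = max(best, run)
    return best


def passes_simple_rule(seq, motif_start, motif_len=6, flank=20, downstream=10,
                       min_fraction=0.35, min_run=3):
    """Hypothetical filter for a candidate poly(A) signal at motif_start.

    All default values are placeholders chosen for illustration only.
    """
    motif_end = motif_start + motif_len
    # Sequence surrounding (and including) the candidate signal.
    window = seq[max(0, motif_start - flank):motif_end + flank]
    # Region immediately following the motif.
    tail = seq[motif_end:motif_end + downstream]
    return (adenine_fraction(window) >= min_fraction
            and longest_a_run(tail) >= min_run)


# Two toy sequences carrying the canonical AATAAA hexamer at position 4.
a_rich = "GCGC" + "AATAAA" + "AAAAGCGCGC"
a_poor = "GCGC" + "AATAAA" + "GCGCGCGCGC"
print(passes_simple_rule(a_rich, 4), passes_simple_rule(a_poor, 4))
```

    A filter of this shape is cheap enough to run before a heavier classifier, which is the use the abstract suggests for its rule.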