WorldWideScience

Sample records for functional sequence motifs

  1. Detecting correlations among functional-sequence motifs

    Science.gov (United States)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  2. Structural conservation of a short, functional, peptide-sequence motif

    OpenAIRE

    Fox-Erlich, Susan; Schiller, Martin R; Gryk, Michael R.

    2009-01-01

    Full-length eukaryotic proteins generally consist of several autonomously folding and functioning domains. Many of these domains are known to function by binding and/or modifying other partner proteins based on the recognition of a short, linear amino acid sequence contained within the target protein. This article reviews the many bioinformatic tools and resources which discover, define and catalogue the various, known protein domains as well as assist users by identifying domain signatures withi...

  3. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions.

    Science.gov (United States)

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyases sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyases families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers.

  4. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of
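
    As an illustration of what it means for a pattern to select false positives or miss true positives, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and scores it against labelled sequences. It is not the procedure of the record above. The function names, the toy sequences, and the choice of reporting precision alongside sensitivity are assumptions made for this example, and only the basic pattern syntax (x, [..], {..}, repeat counts, < and > anchors) is handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue part from an optional repeat such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":                      # x: any residue
            core = "."
        elif core.startswith("{"):           # {P}: any residue except P
            core = "[^" + core[1:-1] + "]"
        # [ST] and single letters are already valid regex fragments.
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)

def evaluate(pattern, true_members, non_members):
    """Return (sensitivity, precision) of a pattern on labelled sequences."""
    regex = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for s in true_members if regex.search(s))
    fn = len(true_members) - tp
    fp = sum(1 for s in non_members if regex.search(s))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision

# Hypothetical toy sequences, invented for this example only.
positives = ["AANGSAK", "MKNPTLL"]   # the second one is missed (false negative)
negatives = ["AAAAAAA", "QNVTQ"]     # the second one is hit (false positive)
sens, prec = evaluate("N-{P}-[ST]-{P}", positives, negatives)
print(prosite_to_regex("N-{P}-[ST]-{P}"), sens, prec)
```

    Extending a pattern with extra conserved positions, as the abstract describes, would correspond to appending further elements to the pattern string and re-running the evaluation.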

  5. Network motifs in music sequences

    CERN Document Server

    Zanette, Damian H

    2010-01-01

    In this note, I summarize ongoing research on motif distribution in networks built up out of symbolic sequences of Western musical origin. Their motif significance profiles exhibit remarkable consistency over different styles and periods, and define a class that cannot be identified with any of the four "superfamilies" to which most real networks seem to belong. Networks from music sequences possess an unusual abundance of bidirectional connections, due to the inherent reversibility of short musical note patterns. This property contributes to motif significance from both local and large-scale features of musical structure.

  6. seeMotif: exploring and visualizing sequence motifs in 3D structures

    OpenAIRE

    2009-01-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D st...

  7. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  8. seeMotif: exploring and visualizing sequence motifs in 3D structures

    Science.gov (United States)

    Chang, Darby Tien-Hao; Chien, Ting-Ying; Chen, Chien-Yu

    2009-01-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function-related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D structures that have also been accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Considering the fact that PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more and more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information gives good opportunities to facilitate understanding of protein functions in large-scale genome projects. Available at: http://seemotif.csie.ntu.edu.tw, http://seemotif.ee.ncku.edu.tw or http://seemotif.csbb.ntu.edu.tw. PMID:19477961

  9. seeMotif: exploring and visualizing sequence motifs in 3D structures.

    Science.gov (United States)

    Chang, Darby Tien-Hao; Chien, Ting-Ying; Chen, Chien-Yu

    2009-07-01

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function-related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D structures that have also been accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Considering the fact that PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more and more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information gives good opportunities to facilitate understanding of protein functions in large-scale genome projects. Available at: http://seemotif.csie.ntu.edu.tw, http://seemotif.ee.ncku.edu.tw or http://seemotif.csbb.ntu.edu.tw.

  10. Discovering novel sequence motifs with MEME.

    Science.gov (United States)

    Bailey, Timothy L

    2002-11-01

    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly in one or more sequences in the input group. MEME can be used to discover novel patterns because it bases its discoveries only on the input sequences, not on any prior knowledge (such as databases of known motifs). The input to MEME is a set of unaligned sequences of the same type (peptide or nucleotide). For each motif it discovers, MEME reports the occurrences (sites), consensus sequence, and the level of conservation (information content) at each position in the pattern. MEME also produces block diagrams showing where all of the discovered motifs occur in the training set sequences. MEME's hypertext (HTML) output also contains buttons that allow for the convenient use of the motifs in other searches.
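
    The quantities reported per motif (sites, consensus sequence, information content at each position) can be illustrated with a short Python sketch. This is not MEME's own computation, which among other things accounts for background frequencies; it is a simplified version assuming a uniform background and no pseudocounts. The function name and the example sites are hypothetical.

```python
import math
from collections import Counter

def consensus_and_information(sites, alphabet="ACGT"):
    """Return the consensus string and per-position information content (bits).

    Assumes aligned sites of equal length and a uniform background, so the
    information content of a column is log2(|alphabet|) minus its entropy.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))
    consensus, information = [], []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # Most frequent letter; ties go to the earlier letter in `alphabet`.
        consensus.append(max(alphabet, key=lambda letter: counts[letter]))
        information.append(max_bits - entropy)
    return "".join(consensus), information

# Hypothetical aligned occurrences of a six-letter DNA motif.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
consensus, info = consensus_and_information(sites)
print(consensus, [round(bits, 3) for bits in info])
```

    Fully conserved columns reach the 2-bit maximum for DNA, while columns with mixed letters score lower.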

  11. Parametric bootstrapping for biological sequence motifs.

    Science.gov (United States)

    O'Neill, Patrick K; Erill, Ivan

    2016-10-06

    Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif's positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. Through their use, we are able to characterize for the first time differences in global motif statistics
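
    The idea of measuring how evenly information is spread over a motif's positions can be sketched in Python with the standard Gini coefficient applied to per-position information content. Whether this matches the authors' exact definition of the informational Gini coefficient is an assumption; the function name and input values are hypothetical.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values.

    0 means the total is spread evenly; (n - 1) / n is the maximum for n
    values, reached when one value holds the entire total.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Hypothetical per-position information content (bits) for two 4-column motifs.
print(gini([1.0, 1.0, 1.0, 1.0]))  # information spread evenly
print(gini([0.0, 0.0, 0.0, 4.0]))  # information concentrated in one column
```

    Comparing such a statistic for an observed motif against motifs sampled from a reference distribution is the kind of test the abstract describes; the sampling step itself is not sketched here.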

  12. Functional importance of GGXG sequence motifs in putative reentrant loops of 2HCT and ESS transport proteins.

    Science.gov (United States)

    Dobrowolski, Adam; Lolkema, Juke S

    2009-08-11

    The 2HCT and ESS families are two families of secondary transporters. Members of the two families are unrelated in amino acid sequence but share similar hydropathy profiles, which suggest a similar folding of the proteins in membranes. Structural models show two homologous domains containing five transmembrane segments (TMSs) each, with a reentrant or pore loop between the fourth and fifth TMSs in each domain. Here we show that GGXG sequence motifs present in the putative reentrant loops are important for the activity of the transporters. Mutation of the conserved Gly residues to Cys in the motifs of the Na(+)-citrate transporter CitS in the 2HCT family and the Na(+)-glutamate transporter GltS in the ESS family resulted in strongly reduced transport activity. Similarly, mutation of the variable residue "X" to Cys in the N-terminal half of GltS essentially inactivated the transporter. The corresponding mutations in the N- and C-terminal halves of CitS reduced transport activity to 60 and 25% of that of the wild type, respectively. Residual activity of any of the mutants could be further reduced by treatment with the membrane permeable thiol reagent N-ethylmaleimide (NEM). The X to Cys mutation (S405C) in the cytoplasmic loop in the C-terminal half of CitS rendered the protein sensitive to the bulky, membrane impermeable thiol reagent 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AmdiS) added at the periplasmic side of the membrane, providing further evidence that this part of the loop is positioned between the transmembrane segments. The putative reentrant loop in the C-terminal half of the ESS family does not contain the GGXG motif, but a conserved stretch rich in Gly residues. Cysteine-scanning mutagenesis of a stretch of 18 residues in the GltS protein revealed two residues important for function. Mutant N356C was completely inactivated by treatment with NEM, and mutant P351C appeared to be the counterpart of mutant S405C of CitS; the mutant was

  13. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  14. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    Science.gov (United States)

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative extremes, and that semisupervised learning improves the performance when labeled sequences are limited.

  15. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    Science.gov (United States)

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

  16. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
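
    Relative abundance of short motifs against a background set can be sketched in Python as a ratio of k-mer frequencies. This is a generic illustration, not the authors' pipeline: the function names, the small pseudo-frequency used to avoid division by zero, and the toy sequences are assumptions.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Relative frequency of every overlapping k-mer across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def relative_abundance(foreground, background, k, pseudo=1e-6):
    """Foreground/background frequency ratio for each foreground k-mer.

    `pseudo` is a small constant (an assumption of this sketch) that keeps
    the ratio finite for k-mers absent from the background.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {kmer: freq / (bg.get(kmer, 0.0) + pseudo) for kmer, freq in fg.items()}

# Hypothetical toy data: the foreground is richer in "AA" than the background.
ratios = relative_abundance(["AAAA"], ["AAAT"], k=2)
print(ratios)
```

    Ratios above 1 indicate k-mers that are more frequent in the foreground set. As the abstract notes, such excesses can reflect overall nucleotide composition rather than specific functional motifs, which is why a composition-matched random background is one of the comparisons used.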

  17. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    Science.gov (United States)

    Ulises, Hernández-Chiñas; Tatiana, Gazarian; Karlen, Gazarian; Guillermo, Mendoza-Hernández; Juan, Xicohtencatl-Cortes; Carlos, Eslava

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic), are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which display the GDSGSG sequence or the serine motif. Our research was directed to localize functional sites in both proteins using the phage display method. From a 12mer linear and a 7mer cysteine-constrained (C7C) libraries displayed on the M13 phage pIII protein we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each one of these proteins. Children IgG selected a homologous group of sequences forming the consensus sequence, motif, PQPxK, and the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino terminal region of a panel of SPATEs showed the presence in all of them of sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both these motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and with synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, thus confirming that the peptide moiety of the selected phages induced the antibodies specific for the toxins. The antibodies induced in mice to the PGxI/LN and CxPDDSSxC mimotopes inhibited fodrin proteolysis and macrophage chemotaxis biological activities of Pet. Our results showed that we were able to generate, by a phage display procedure, mimotopes with sequence motifs PGxI/LN and CxPDDSSxC, and to identify them as functional motifs of the Pet, Pic and other SPATEs involved in their biological activities.

  18. Sublinear Time Motif Discovery from Multiple Sequences

    Directory of Open Access Journals (Sweden)

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. In each background sequence is implanted a probabilistically-generated approximate copy of G. For a probabilistically-generated approximate copy b1b2 ... bm of G, every character, bi, is probabilistically generated, such that the probability for bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
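
    The probabilistic model described above (random background sequences, each with one implanted approximate copy of G) can be sketched as a small Python test-data generator. It does not implement the paper's discovery algorithms. The function name, the uniform background, the uniformly random implant position, and mutating each character with probability exactly α (the abstract only requires at most α) are assumptions of this sketch.

```python
import random

def implant_motif(motif, k, length, alpha, alphabet="ACGT", seed=0):
    """Generate k random sequences, each with one approximate copy of `motif`.

    Each motif character is replaced by a different letter with probability
    `alpha`. Returns the sequences and the implant start positions.
    """
    if length < len(motif):
        raise ValueError("sequence length must be at least the motif length")
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        background = [rng.choice(alphabet) for _ in range(length)]
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([a for a in alphabet if a != g]))
            else:
                copy.append(g)
        start = rng.randrange(length - len(motif) + 1)
        background[start:start + len(motif)] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions

# With alpha = 0 every implanted copy is exact, which makes a handy sanity check.
seqs, starts = implant_motif("GATTACA", k=5, length=30, alpha=0.0)
for seq, start in zip(seqs, starts):
    print(start, seq)
```

    Data generated this way lets a motif finder be scored against known implant positions.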

  19. Detecting Motifs in System Call Sequences

    CERN Document Server

    Wilson, William O; Aickelin, Uwe

    2010-01-01

    The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A...

  20. Sequence Length Limits for Controlling False Positives in Discovering Nucleotide Sequence Motifs

    Institute of Scientific and Technical Information of China (English)

    CHEN Lei; QIAN Zi-liang

    2008-01-01

    In motif discovery, especially the discovery of transcription factor DNA binding sites, an overly long input sequence yields non-informative motifs rather than biologically functional ones. This paper gives theoretical analyses and computational experiments that suggest length limits for the input sequence. When the sequence length exceeds a certain critical point, the probability of discovering the motif decreases sharply. The work not only explains the unsatisfying results seen in existing motif discovery problems, namely that the input sequence may be too long and exceed this critical point, but also provides an estimate of the input sequence length that should be accepted to obtain more meaningful and reliable results in motif discovery.
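
    A back-of-the-envelope calculation shows why longer inputs invite false positives. It is not the analysis of the record above. The Python sketch below assumes an i.i.d. uniform random sequence, exact matches on a single strand, and a hypothetical function name; under those assumptions the expected number of chance occurrences of a fixed word of length m in a sequence of length L is (L - m + 1) / 4^m.

```python
def expected_chance_hits(seq_length, motif_length, alphabet_size=4):
    """Expected exact chance matches of one fixed word in a uniform random sequence."""
    positions = max(seq_length - motif_length + 1, 0)
    return positions * alphabet_size ** (-motif_length)

# For an 8-mer, one chance occurrence is expected once about 4**8 = 65,536
# start positions are available.
for length in (1_000, 65_543, 1_000_000):
    print(length, expected_chance_hits(length, 8))
```

    Once the expected number of chance hits approaches the number of genuine sites, random words become as prominent as real motifs, which is consistent with the qualitative point of the abstract.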

  1. The functional glycosyltransferase signature sequence of the human beta 1,3-glucuronosyltransferase is a XDD motif.

    Science.gov (United States)

    Gulberti, Sandrine; Fournel-Gigleux, Sylvie; Mulliert, Guillermo; Aubry, André; Netter, Patrick; Magdalou, Jacques; Ouzzine, Mohamed

    2003-08-22

    The human beta 1,3-glucuronosyltransferase I (GlcAT-I) is the key enzyme responsible for the completion of glycosaminoglycan-protein linkage tetrasaccharide of proteoglycans (GlcA beta 1,3Gal beta 1,3Gal beta 1,4Xyl beta 1-O-serine). We have investigated the role of aspartate residues Asp194-Asp195-Asp196 corresponding to the glycosyltransferase DXD signature motif, in GlcAT-I function by UDP binding experiments, kinetic analyses, and site-directed mutagenesis. We presented the first evidence that Mn2+ is not only essential for GlcAT-I activity but is also required for cosubstrate binding. In agreement, kinetic studies were consistent with a metal-activated enzyme model whereby activation probably occurs via binding of a Mn2+.UDP-GlcA complex to the enzyme. Mutational analysis showed that the Asp194-Asp195-Asp196 motif is a major element of the UDP/Mn2+ binding site. Furthermore, determination of the individual role of each aspartate showed that substitution of Asp195 as well as Asp196 to alanine strongly impaired GlcAT-I activity, whereas Asp194 replacement produced only a moderate alteration of the enzyme activity. These findings along with molecular modeling and three-dimensional structure comparison of the GlcAT-I catalytic center with that of the Bacillus subtilis glycosyltransferase SpsA provided evidence that the interactions of Asp195 with the ribose moiety of UDP and of Asp196 with the metal cation Mn2+ were crucial for GlcAT-I function. Altogether, these results indicated that, similarly to the SpsA enzyme, the nucleotide binding site of GlcAT-I contains a XDD motif rather than a DXD motif.

  2. Exploitation of peptide motif sequences and their use in nanobiotechnology.

    Science.gov (United States)

    Shiba, Kiyotaka

    2010-08-01

    Short amino acid sequences extracted from natural proteins or created using in vitro evolution systems are sometimes associated with particular biological functions. These peptides, called peptide motifs, can serve as functional units for the creation of various tools for nanobiotechnology. In particular, peptide motifs that have the ability to specifically recognize the surfaces of solid materials and to mineralize certain inorganic materials have been linking biological science to material science. Here, I review how these peptide motifs have been isolated from natural proteins or created using in vitro evolution systems, and how they have been used in the nanobiotechnology field.

  3. Functional characterization of variations on regulatory motifs.

    Directory of Open Access Journals (Sweden)

    Michal Lapidot

    2008-03-01

    Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. We have developed computational means for estimating the functional outcome of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. We predict the effects of nucleotide variations within motifs on gene expression patterns. In cases where such predictions could be compared to suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties that characterize nucleotide substitutions that are more likely to alter expression. We found that substitutions involving Adenine are more likely to retain the expression pattern and that substitutions involving Guanine are more likely to alter expression compared to the rest of the substitutions. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication is expected to be the ability to predict the phenotypic effect of variation in regulatory motifs in promoters.

  4. Alignment of capsid protein VP1 sequences of all human rhinovirus prototype strains: conserved motifs and functional domains.

    Science.gov (United States)

    Laine, Pia; Blomqvist, Soile; Savolainen, Carita; Andries, Koen; Hovi, Tapani

    2006-01-01

    An alignment was made of the deduced amino acid sequences of the entire capsid protein VP1 of all human rhinovirus (HRV) prototype strains to examine conserved motifs in the primary structure. A set of previously proposed crucially important amino acids in the footprints of the two known receptor molecules was not conserved in a receptor group-specific way. In contrast, VP1 and VP3 amino acids in the minor receptor-group strains corresponding to most of the predicted ICAM-1 footprint definitely differed from those of the ICAM-1-using major receptor-group strains. Previous antiviral-sensitivity classification showed an almost-complete agreement with the species classification and a fair correlation with amino acids aligning in the antiviral pocket. It was concluded that systematic alignment of sequences of related virus strains can be used to test hypotheses derived from molecular studies of individual model viruses and to generate ideas for future studies on virus structure and replication.

  5. Discovering motifs in ranked lists of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Eran Eden

    2007-03-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked
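
To make the idea of enrichment in a ranked list concrete, the following is a minimal Python sketch of one possible rank-imbalance score: a hypergeometric tail probability minimized over all cutoffs of the ranked list. The choice of this statistic, the function names and the toy input are assumptions made for illustration; the abstract does not spell out the formula, and the minimized score is not itself the exact p-value the authors refer to.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n items are drawn without replacement
    from N items of which B are hits."""
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / comb(N, n)


def min_hypergeom_score(hits):
    """hits: 0/1 flags ordered by rank (best-ranked sequence first),
    1 meaning the sequence contains the motif.
    Returns (smallest tail probability, cutoff where it occurs)."""
    N, B = len(hits), sum(hits)
    best_score, best_cutoff, seen = 1.0, 0, 0
    for n, flag in enumerate(hits, start=1):
        seen += flag  # motif occurrences among the top n sequences
        p = hypergeom_tail(N, B, n, seen)
        if p < best_score:
            best_score, best_cutoff = p, n
    return best_score, best_cutoff


# Toy ranked list (hypothetical): the motif occurs only in the top three sequences.
score, cutoff = min_hypergeom_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(score, cutoff)
```

Because the score is minimized over many cutoffs, it would still need a multiple-testing correction before being read as a significance level.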

  6. The distribution of RNA motifs in natural sequences.

    Science.gov (United States)

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  7. rMotifGen: random motif generator for DNA and protein sequences

    Directory of Open Access Journals (Sweden)

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
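
The kind of benchmark data described above can be sketched in a few lines of Python. This is not rMotifGen's code: the function names, the uniform background model and the uniform per-position substitution (in place of a substitution matrix or PSSM) are simplifying assumptions chosen for illustration.

```python
import random

DNA = "ACGT"


def mutate(consensus, rate, rng):
    """Copy the consensus, replacing each base with a different base
    with probability `rate` (uniform substitution, an assumption)."""
    out = []
    for base in consensus:
        if rng.random() < rate:
            out.append(rng.choice([b for b in DNA if b != base]))
        else:
            out.append(base)
    return "".join(out)


def implant(n_seqs, length, consensus, rate=0.1, seed=0):
    """Generate random DNA sequences, each carrying one mutated motif instance.
    Returns a list of (sequence, motif start position, implanted instance)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_seqs):
        background = [rng.choice(DNA) for _ in range(length)]
        instance = mutate(consensus, rate, rng)
        pos = rng.randrange(length - len(consensus) + 1)
        background[pos:pos + len(instance)] = instance  # overwrite, length unchanged
        records.append(("".join(background), pos, instance))
    return records


# Hypothetical consensus used only as an example.
for seq, pos, instance in implant(3, 40, "TTGACA", rate=0.2, seed=1):
    print(pos, instance, seq)
```

Keeping the implant position and instance alongside each sequence gives a ground truth against which a motif finder's predictions can be compared.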

  8. Protein functional-group 3D motif and its applications

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Representing and recognizing the sequence motif (1D motif) and structural motif (3D motif) of protein active sites is an important topic for predicting and designing protein function. Prevalent methods for extracting and searching 3D motifs always consider the residue as the minimal unit, which limits their sensitivity. Here we present a new spatial representation of protein active sites, called the "functional-group 3D motif", based on the fact that the functional groups inside a residue contribute most to its function. A relevant algorithm and computer program were developed, which could be widely used in function prediction and in the study of structure-function relationships of proteins. As a test, we defined a functional-group 3D motif of the catalytic triad and oxyanion hole with the structure of porcine trypsin (PDB code: 1mct) as the template. With our motif-searching program, we successfully found similar sub-structures in trypsins, subtilisins and α/β hydrolases, which show distinct folds but share a similar catalytic mechanism. Moreover, this motif can be used to elucidate the structural basis of other proteins with variant catalytic triads by comparing it to those proteins. Finally, we scanned this motif against a non-redundant protein structure database to find its matches, and the results demonstrated the potential application of the functional-group 3D motif in function prediction. Above all, compared with other 3D-motif representations based on residues, the functional-group 3D motif achieves a better representation of the protein active region and is more sensitive for protein function prediction.

  9. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
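
As an illustration of the mutual-information feature mentioned above, here is a small Python sketch that uses the textbook definition for two binary variables (a sequence matches the motif or not, and carries the GO term or not). The abstract does not give the authors' exact estimator, so the 2x2 count layout, the function name and the example counts are assumptions.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between 'sequence matches the motif'
    and 'sequence is annotated with the GO term'.
    n11: both, n10: motif only, n01: term only, n00: neither.
    Assumes at least one sequence was counted."""
    total = n11 + n10 + n01 + n00
    joint = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    p_motif = {1: (n11 + n10) / total, 0: (n01 + n00) / total}
    p_term = {1: (n11 + n01) / total, 0: (n10 + n00) / total}
    mi = 0.0
    for (m, t), count in joint.items():
        if count == 0:
            continue  # an empty cell contributes nothing
        p = count / total
        mi += p * log2(p / (p_motif[m] * p_term[t]))
    return mi


# Hypothetical counts: a perfectly associated pair versus an independent pair.
print(mutual_information(50, 0, 0, 50))
print(mutual_information(25, 25, 25, 25))
```

A value near zero suggests the motif and the term co-occur no more often than chance would predict, which is why such a score could serve as one input among several to a classifier.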

  10. The mammalian Rab family of small GTPases: definition of family and subfamily sequence motifs suggests a mechanism for functional specificity in the Ras superfamily.

    Science.gov (United States)

    Pereira-Leal, J B; Seabra, M C

    2000-08-25

    The Rab/Ypt/Sec4 family forms the largest branch of the Ras superfamily of GTPases, acting as essential regulators of vesicular transport pathways. We used the large amount of information in the databases to analyse the mammalian Rab family. We defined Rab-conserved sequences that we designate Rab family (RabF) motifs using the conserved PM and G motifs as "landmarks". The Rab-specific regions were used to identify new Rab proteins in the databases and suggest rules for nomenclature. Surprisingly, we find that RabF regions cluster in and around switch I and switch II regions, i.e. the regions that change conformation upon GDP or GTP binding. This finding suggests that specificity of Rab-effector interaction cannot be conferred solely through the switch regions as is usually inferred. Instead, we propose a model whereby an effector binds to RabF (switch) regions to discriminate between nucleotide-bound states and simultaneously to other regions that confer specificity to the interaction, possibly Rab subfamily (RabSF) specific regions that we also define here. We discuss structural and functional data that support this model and its general applicability to the Ras superfamily of proteins.

  11. A discriminative approach for unsupervised clustering of DNA sequence motifs.

    Directory of Open Access Journals (Sweden)

    Philip Stegmaier

    Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention during the last years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question which motif clusters to aim at has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.

  12. Motif Discovery in Tissue-Specific Regulatory Sequences Using Directed Information

    Directory of Open Access Journals (Sweden)

    States David

    2007-01-01

    Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often leads to discovery of novel motifs (including transcription factor sites) with previously uncharacterized function in gene expression. Coupled with the complexity underlying tissue-specific gene expression, there are several motifs that are putatively responsible for expression in a certain cell type. This has important implications in understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor sites) and examine its application to some questions in current bioinformatics research. These motifs are seen to discriminate tissue-specific gene promoter or regulatory regions from those that are not tissue-specific. There are two main contributions of this work. Firstly, we propose the use of directed information for such classification constrained motif discovery, and then use the selected features with a support vector machine (SVM) classifier to find the tissue specificity of any sequence of interest. Such analysis yields several novel interesting motifs that merit further experimental characterization. Furthermore, this approach leads to a principled framework for the prospective examination of any chosen motif to be discriminatory motif for a group of coexpressed/coregulated genes, thereby integrating sequence and expression perspectives. We hypothesize that the discovery of these motifs would enable the large-scale investigation for the tissue-specific regulatory role of any conserved sequence element identified from genome-wide studies.

  13. Identification of protein superfamily from structure- based sequence motif

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Using the protein tyrosine phosphatase (PTP) Ⅰ and Ⅱ superfamilies, which are evolutionarily distant, as an example, a structure-based sequence motif has been defined by structural comparison, structure-based sequence alignment and analysis of the substitution patterns of residues in commonly conserved sequence regions. Phosphatases Ⅰ and Ⅱ can be correctly identified together in the SWISS-PROT and TrEMBL databases by the structure-based PTP sequence motif. The results show that the rates of correct identification are over 98%. This is the first time that PTP Ⅰ and Ⅱ have been identified together by such a motif.

  14. MEME: discovering and analyzing DNA and protein sequence motifs.

    Science.gov (United States)

    Bailey, Timothy L; Williams, Nadya; Misleh, Chris; Li, Wilfred W

    2006-07-01

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.

  15. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  16. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    , selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes...... and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms...

  17. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches.

    Science.gov (United States)

    Romer, Katherine A; Kayombya, Guy-Richard; Fraenkel, Ernest

    2007-07-01

    WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs.

  18. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    Science.gov (United States)

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-12-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To understand the dynamics between these structures and how they can be affected by intervention with small molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences.

  19. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases.

    Directory of Open Access Journals (Sweden)

    Bryan M Zhao

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues, but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets.

  20. Identification of imine reductase-specific sequence motifs.

    Science.gov (United States)

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR] and the active site motif Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes. © 2016 Wiley Periodicals, Inc.
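
The two motifs quoted above read like PROSITE-style patterns, where `x` is any residue, `x(n)` is a run of n arbitrary residues, `[...]` lists allowed residues and `{...}` lists forbidden ones. Under that reading, which is an assumption about the notation, a motif can be translated into a regular expression for scanning sequences. The Python sketch below uses a hypothetical helper name and a synthetic peptide, not a real IRED sequence.

```python
import re

# One token per alternative: x(n), x, [allowed], {forbidden}, single residue.
TOKEN = re.compile(r"x\((\d+)\)|x|\[([A-Z]+)\]|\{([A-Z]+)\}|([A-Z])")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style motif into Python regex syntax."""
    compact = pattern.replace(" ", "").replace("-", "")
    parts, pos = [], 0
    for match in TOKEN.finditer(compact):
        if match.start() != pos:
            raise ValueError("unrecognised motif syntax at position %d" % pos)
        pos = match.end()
        repeat, allowed, forbidden, literal = match.groups()
        if repeat is not None:
            parts.append(".{%s}" % repeat)    # x(n): n arbitrary residues
        elif allowed is not None:
            parts.append("[%s]" % allowed)    # [ABC]: any listed residue
        elif forbidden is not None:
            parts.append("[^%s]" % forbidden)  # {K}: anything except K
        elif literal is not None:
            parts.append(literal)             # fixed residue
        else:
            parts.append(".")                 # bare x: one arbitrary residue
    if pos != len(compact):
        raise ValueError("unrecognised motif syntax at position %d" % pos)
    return "".join(parts)


active_site = prosite_to_regex("Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]")
print(active_site)
# Synthetic 15-residue peptide built to satisfy the pattern.
print(bool(re.search(active_site, "GADAG" + "A" * 9 + "L")))
```

Using `re.finditer` with the compiled pattern on a full protein sequence would report every candidate occurrence together with its position.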

  1. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    Science.gov (United States)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identifying a region in biological sequences, known as the motif extraction problem (MEP), is a task addressed in bioinformatics. However, the MEP is an NP-hard problem, so it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve the MEP. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  2. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    Science.gov (United States)

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
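
The scoring step described above, overrepresentation of a motif in proteins of interest relative to a background under the hypergeometric distribution, can be sketched as follows. This is not MotifHound's code: the function names, the use of regular expressions for wildcard positions and the toy sequences are assumptions, and the exhaustive motif enumeration is left out.

```python
import re
from math import comb


def count_with_motif(motif, proteins):
    """Number of proteins containing at least one match to the motif."""
    rx = re.compile(motif)
    return sum(1 for seq in proteins if rx.search(seq))


def overrepresentation_pvalue(motif, interest, background):
    """Hypergeometric P(X >= k): chance of seeing at least k motif-containing
    proteins in a random draw of len(interest) proteins from the background.
    Assumes every protein of interest is also part of the background."""
    N = len(background)
    K = count_with_motif(motif, background)
    n = len(interest)
    k = count_with_motif(motif, interest)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy data (hypothetical): three proline-rich peptides among ten background peptides.
background = ["APAAPA", "GPGGPG", "SPSSPS"] + ["AAAAAA"] * 7
interest = background[:3]
print(overrepresentation_pvalue("P.{2}P", interest, background))
```

In the regex, each `.` plays the role of a wildcard position, so `P.{2}P` stands for a PxxP-like motif with two degenerate positions.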

  3. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    Directory of Open Access Journals (Sweden)

    Aruga Jun

    2010-02-01

    Full Text Available Abstract Background The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions.

  4. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Full Text Available Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.

  5. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs.

    Science.gov (United States)

    Zhu, Christopher; Nigam, Kabir B; Date, Rishabh C; Bush, Kevin T; Springer, Stevan A; Saier, Milton H; Wu, Wei; Nigam, Sanjay K

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29-44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades) there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term "Oat-related" (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22.

  6. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...

  7. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  8. Targeting functional motifs of a protein family

    Science.gov (United States)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  9. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
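    The three-level partitioning described in this abstract (discovery, training, validation) can be sketched as follows. This is an illustration only, not the authors' pipeline: the function name, the split fractions, and the fixed seed are assumptions.

```python
import random

def three_way_split(items, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle items and split them into discovery, training and validation sets.

    The fractions are hypothetical defaults; the abstract does not state
    which proportions the authors used.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_discovery = int(len(shuffled) * fractions[0])
    n_training = int(len(shuffled) * fractions[1])
    discovery = shuffled[:n_discovery]
    training = shuffled[n_discovery:n_discovery + n_training]
    validation = shuffled[n_discovery + n_training:]  # remainder
    return discovery, training, validation

discovery, training, validation = three_way_split(range(100))
```

    Under this scheme a motif finder would see only the discovery partition, the classifier would be fitted on the training partition, and performance would be reported on the validation partition, so no sequence informs more than one stage.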

  10. Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences.

    Science.gov (United States)

    Stepančič, Ziva

    2014-10-01

    Finding short patterns with residue variation in a set of sequences is still an open problem in genetics, since motif-finding techniques on DNA and protein sequences are inconclusive on real data sets and their performance varies on different species. Hence, finding new algorithms and evolving established methods are vital to further understanding of genome properties and the mechanisms of protein development. In this work, we present an approach to finding functional motifs in DNA sequences in connection to Gibbs sampling method. Starting points in the search space are partly determined via graphical representation of input sequences opposed to completely random initial points with the standard Gibbs sampling. Our algorithm is evaluated on synthetic as well as on real data sets by using several statistics, such as sensitivity, positive predictive value, specificity, performance, and correlation coefficient. Additionally, a comparison between our algorithm and the basic standard Gibbs sampling algorithm is made to show improvement in accuracy, repeatability, and performance.
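    The evaluation statistics named in this abstract can be computed from a confusion matrix of predicted versus true motif positions. The sketch below uses the standard definitions, which the abstract does not spell out; in particular, reading "performance" as the performance coefficient TP / (TP + FP + FN) and "correlation coefficient" as the Matthews correlation coefficient is an assumption.

```python
from math import sqrt

def motif_metrics(tp, fp, fn, tn):
    """Return common motif-finding accuracy statistics from confusion counts."""
    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN rather than raising.
        return numerator / denominator if denominator else float("nan")

    cc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "performance": ratio(tp, tp + fp + fn),  # assumed definition
        "correlation": ratio(tp * tn - fp * fn, cc_denominator),
    }

# Hypothetical counts at the nucleotide level.
metrics = motif_metrics(tp=8, fp=2, fn=2, tn=88)
```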

  11. Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

    Science.gov (United States)

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478

  12. Analysis of the Sequence and Phenotype of Drosophila Sex combs reduced Alleles Reveals Potential Functions of Conserved Protein Motifs of the Sex combs reduced Protein

    OpenAIRE

    Sivanantharajah, Lovesha; Percival-Smith, Anthony

    2009-01-01

    The Drosophila Hox gene, Sex combs reduced (Scr), is required for patterning the larval and adult, labial and prothoracic segments. Fifteen Scr alleles were sequenced and the phenotypes analyzed in detail. Six null alleles were nonsense mutations (Scr2, Scr4, Scr11, Scr13, Scr13A, and Scr16) and one was an intragenic deletion (Scr17). Five hypomorphic alleles were missense mutations (Scr1, Scr3, Scr5, Scr6, and Scr8) and one was a small protein deletion (Scr15). Protein sequence changes were ...

  13. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, the various bioinformatics analyses performed shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  14. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    Science.gov (United States)

    Sharov, Alexei A.; Ko, Minoru S.H.

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences. PMID:19740934
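    For readers unfamiliar with position frequency matrices, the sketch below builds a PFM from a handful of already aligned sites of equal length. It is not CisFinder's algorithm, which estimates PFMs from n-mer word counts; the function name and the example sites are illustrative assumptions.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for site in sites:
        for position, base in enumerate(site.upper()):
            pfm[base][position] += 1
    return pfm

# Hypothetical aligned binding sites.
pfm = position_frequency_matrix(["ACG", "ACT", "AAG"])
```

    Each row of the result holds the counts of one base across the motif positions, so every column sums to the number of input sites.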

  15. An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences

    Institute of Scientific and Technical Information of China (English)

    Giulio Pavesi; Giancarlo Mauri; Graziano Pesole

    2004-01-01

    Several experiments and observations have revealed the fact that small local distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information concerning which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all the cases the structure of the molecules is unknown, has to be somehow predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on the preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but function less well when similarity is limited to small and local elements, like single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint, or a posteriori, by post-processing the output. The search for the regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. 
    Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in

  16. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.

  17. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed by assuming no dependency between binding site base positions. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all positions in binding site base positions. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from JASPAR and TRANSFAC databases, and compare the obtained results with two other well known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
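    Mutual information between two positions of a set of aligned binding sites, the dependency measure named in this abstract, can be estimated as in the sketch below. This shows only the textbook quantity, not the authors' scoring function; the function name and the toy sites are assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(sites, i, j):
    """Estimate mutual information (in bits) between columns i and j."""
    n = len(sites)
    pair_counts = Counter((site[i], site[j]) for site in sites)
    i_counts = Counter(site[i] for site in sites)
    j_counts = Counter(site[j] for site in sites)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi

# Toy data: the columns are fully dependent in the first set
# and independent in the second.
dependent = mutual_information(["AA", "CC", "AA", "CC"], 0, 1)
independent = mutual_information(["AA", "AC", "CA", "CC"], 0, 1)
```

    A value near zero is what an independence-based scoring function implicitly assumes; clearly positive values suggest that a model with position dependencies may describe the site better.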

  18. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    OpenAIRE

    Sharov, Alexei A; Minoru S.H. Ko

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cel...

  19. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  20. Counting of oligomers in sequences generated by Markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
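    A Z-score based on the number of sequences with hits can be sketched as below, assuming that the probability of at least one occurrence of the word in each sequence has already been computed and that sequences are independent. The exact form used in the paper is not given in this abstract, so the formula, the function name, and the example values are assumptions.

```python
from math import sqrt

def sequence_hit_zscore(observed_hits, hit_probabilities):
    """Z-score of the number of sequences containing the word at least once.

    hit_probabilities[i] is the (precomputed) probability that sequence i
    contains the word at least once under the background model.
    """
    expected = sum(hit_probabilities)
    # Variance of a sum of independent Bernoulli variables.
    variance = sum(p * (1.0 - p) for p in hit_probabilities)
    if variance == 0:
        raise ValueError("variance is zero; the Z-score is undefined")
    return (observed_hits - expected) / sqrt(variance)

# Hypothetical example: four sequences, each with a 50% chance of a hit,
# and the word is found in all four.
z = sequence_hit_zscore(4, [0.5, 0.5, 0.5, 0.5])
```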

  1. Functional importance of motif I of pseudouridine synthases: mutagenesis of aligned lysine and proline residues.

    Science.gov (United States)

    Spedaliere, C J; Hamilton, C S; Mueller, E G

    2000-08-01

    On the basis of sequence alignments, the pseudouridine synthases were grouped into four families that share no statistically significant global sequence similarity, though some common sequence motifs were discovered [Koonin, E. V. (1996) Nucleic Acids Res. 24, 2411-2415; Gustafsson, C., Reid, R., Greene, P. J., and Santi, D. V. (1996) Nucleic Acids Res. 24, 3756-3762]. We have investigated the functional significance of these alignments by substituting the nearly invariant lysine and proline residues in Motif I of RluA and TruB, pseudouridine synthases belonging to different families. Contrary to our expectations, the altered enzymes display only very mild kinetic impairment. Substitution of the aligned lysine and proline residues does, however, reduce structural stability, consistent with a temperature-sensitive phenotype that results from substitution of the cognate proline residue in Cbf5p, a yeast homologue of TruB [Zerbarjadian, Y., King, T., Fournier, M. J., Clarke, L., and Carbon, J. (1999) Mol. Cell. Biol. 19, 7461-7472]. Together, our data support a functional role for Motif I, as predicted by sequence alignments, though the effect of substituting the highly conserved residues was milder than we anticipated. By extrapolation, our findings also support the assignment of pseudouridine synthase function to certain physiologically important eukaryotic proteins that contain Motif I, including the human protein dyskerin, alteration of which leads to the disease dyskeratosis congenita.

  2. Mutational analysis of the SDD sequence motif of a PRRSV RNA-dependent RNA polymerase.

    Science.gov (United States)

    Zhou, Yan; Zheng, Haihong; Gao, Fei; Tian, Debin; Yuan, Shishan

    2011-09-01

    The subgenomic mRNA transcription and genomic replication of the porcine reproductive and respiratory syndrome virus (PRRSV) are directed by the viral replicase. The replicase is expressed in the form of two polyproteins and is subsequently processed into smaller nonstructural proteins (nsps). nsp9, containing the viral replicase, has characteristic sequence motifs conserved among the RNA-dependent RNA polymerases (RdRp) of positive-strand (PS) RNA viruses. To test whether the conserved SDD motif can tolerate other conserved motifs of RNA viruses and the influence of every residue on RdRp catalytic activity, many amino acid substitutions were introduced into it. Only one nsp9 substitution, of serine by glycine (S3050G), could rescue mutant viruses. The rescued virus was genetically stable. Alteration of either aspartate residue was not tolerated, destroyed the polymerase activity, and abolished virus transcription, but did not eliminate virus replication. We also found that the SDD motif was essentially invariant for the signature sequence of PRRSV RdRp. It could not accommodate other conserved motifs found in other RNA viral polymerases, except the GDD motif, which is conserved in all the other PS RNA viruses. These findings indicated that nidoviruses are evolutionarily related to other PS RNA viruses. Our studies support the idea that the two aspartate residues of the SDD motif are critical and essential for PRRSV transcription and represent a sequence variant of the GDD motif in PS RNA viruses.

  3. Importance of NPA motifs in the expression and function of water channel aquaporin-1

    Institute of Scientific and Technical Information of China (English)

    JIANG Yong; MA TongHui

    2007-01-01

    The asparagine-proline-alanine sequences (NPA motifs) are highly conserved in the aquaporin water channel family. Crystallographic studies of AQP1 structure demonstrated that the two NPA motifs are in the narrow central constriction of the channel, serving to bind water molecules for selective and efficient water passage. To investigate the importance of the two NPA motifs in the structure, function and biogenesis of aquaporin water channels, we generated AQP1 mutations with NPA1 deletion, NPA2 deletion and NPA1,2 double deletion. The coding sequences of the three mutated cDNAs were subcloned into the mammalian expression vector pcDNA3.1 to form expression plasmids. We established stably transfected CHO cell lines expressing these AQP1 mutants. Immunofluorescence indicated that all three mutated AQP1 proteins are expressed normally on the plasma membrane of stably transfected CHO cells, suggesting that deletion of NPA motifs does not influence the expression and intracellular processing of AQP1. Functional analysis demonstrated that NPA1 or NPA2 deletion reduced AQP1 water permeability by 49.6% and 46.7%, respectively, while NPA1,2 double deletion had little effect on AQP1 water permeability. These results provide evidence that NPA motifs are important for water permeation but not essential for the expression, intracellular processing and the basic structure of the AQP1 water channel.

  4. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
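
    The recognition sequence GRCGYC quoted above uses IUPAC degenerate codes (R = A or G, Y = C or T). As a minimal sketch, not taken from the paper, the degenerate site can be expanded into a regular expression and used to locate candidate sites in a DNA string. The function names `site_regex` and `find_sites` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_regex(site):
    """Translate a degenerate recognition site into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())

def find_sites(dna, site="GRCGYC"):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + site_regex(site) + "))")
    return [m.start() for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    print(find_sites("TTGACGTCAAGGCGCCTT"))
```

    Because GRCGYC equals its own reverse complement, scanning one strand should be enough for this particular site; a non-palindromic site would need a second pass over the reverse complement.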

  5. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    Science.gov (United States)

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.

  6. Correlating novel variable and conserved motifs in the Hemagglutinin protein with significant biological functions

    Directory of Open Access Journals (Sweden)

    Werner Mark

    2008-08-01

    Full Text Available Abstract Background: Variations in the influenza Hemagglutinin protein contribute to antigenic drift, resulting in decreased efficiency of seasonal influenza vaccines and escape from host immune response. We performed an in silico study to determine characteristics of novel variable and conserved motifs in the Hemagglutinin protein from previously reported H3N2 strains isolated from Hong Kong from 1968–1999 to predict viral motifs involved in significant biological functions. Results: 14 MEME blocks were generated and comparative analysis of the MEME blocks identified blocks 1, 2, 3 and 7 to correlate with several biological functions. Analysis of the different Hemagglutinin sequences elucidated that the single block 7 has the highest frequency of amino acid substitution and the highest number of co-mutating pairs. MEME 2 showed intermediate variability and MEME 1 was the most conserved. Interestingly, MEME blocks 2 and 7 had the highest incidence of potential post-translational modification sites including phosphorylation sites, ASN glycosylation motifs and N-myristylation sites. Similarly, these 2 blocks overlap with previously identified antigenic sites and receptor binding sites. Conclusion: Our study identifies motifs in the Hemagglutinin protein with different amino acid substitution frequencies over a 31-year period, and derives relevant functional characteristics by correlation of these motifs with potential post-translational modification sites, antigenic and receptor binding sites.

  7. Correlating novel variable and conserved motifs in the Hemagglutinin protein with significant biological functions

    Science.gov (United States)

    Gendoo, Deena MA; El-Hefnawi, Mahmoud M; Werner, Mark; Siam, Rania

    2008-01-01

    Background: Variations in the influenza Hemagglutinin protein contribute to antigenic drift, resulting in decreased efficiency of seasonal influenza vaccines and escape from host immune response. We performed an in silico study to determine characteristics of novel variable and conserved motifs in the Hemagglutinin protein from previously reported H3N2 strains isolated from Hong Kong from 1968–1999 to predict viral motifs involved in significant biological functions. Results: 14 MEME blocks were generated and comparative analysis of the MEME blocks identified blocks 1, 2, 3 and 7 to correlate with several biological functions. Analysis of the different Hemagglutinin sequences elucidated that the single block 7 has the highest frequency of amino acid substitution and the highest number of co-mutating pairs. MEME 2 showed intermediate variability and MEME 1 was the most conserved. Interestingly, MEME blocks 2 and 7 had the highest incidence of potential post-translational modification sites including phosphorylation sites, ASN glycosylation motifs and N-myristylation sites. Similarly, these 2 blocks overlap with previously identified antigenic sites and receptor binding sites. Conclusion: Our study identifies motifs in the Hemagglutinin protein with different amino acid substitution frequencies over a 31-year period, and derives relevant functional characteristics by correlation of these motifs with potential post-translational modification sites, antigenic and receptor binding sites. PMID:18681973

  8. Systematic reconstruction of RNA functional motifs with high-throughput microfluidics.

    Science.gov (United States)

    Martin, Lance; Meier, Matthias; Lyons, Shawn M; Sit, Rene V; Marzluff, William F; Quake, Stephen R; Chang, Howard Y

    2012-12-01

    We present RNA-mechanically induced trapping of molecular interactions (RNA-MITOMI), a microfluidic platform that allows integrated synthesis and functional assays for programmable RNA libraries. The interaction of a comprehensive library of RNA mutants with stem-loop-binding protein precisely defined the RNA structural and sequence features that govern affinity. The functional motif reconstructed in a single experiment on our platform uncovers new binding specificities and enriches interpretation of phylogenetic data.

  9. The Human Pendrin Promoter Contains two N4 GAS Motifs with Different Functional Relevance

    Directory of Open Access Journals (Sweden)

    Simone Vanoni

    2013-12-01

    Full Text Available Background: Pendrin, an anion exchanger associated with the inner ear, thyroid and kidney, plays a significant role in respiratory tissues and diseases, where its expression is increased following IL-4 and IL-13 exposure. The mechanism leading to increased pendrin expression is in part due to binding of STAT6 to a consensus sequence (N4 GAS motif) located in the pendrin promoter. As retrospective analyses of the 5' upstream sequence of the human pendrin promoter revealed an additional N4 GAS motif (1660 base pairs upstream of the one previously identified), we set out to define its contribution to IL-4 stimulated changes in pendrin promoter activity. Methods and Results: Electrophoretic mobility shift assays showed that STAT6 bound to oligonucleotides corresponding to both N4 GAS motifs in vitro, while dual luciferase promoter assays revealed that only one of the N4 GAS motifs was necessary for IL-4-stimulated increases in pendrin promoter activity in living cells. We then examined the ability of STAT6 to bind each of the N4 GAS motifs in vivo with a site-specific ChIP assay, the results of which showed that STAT6 interacted with only the N4 GAS motif that was functionally implicated in increasing the activity of the pendrin promoter following IL-4 treatment. Conclusions: Of the two N4 GAS motifs located in the human pendrin promoter region analyzed in this study (nucleotides -3906 to +7), only the one located nearest to the first coding ATG participates in IL-4 stimulated increases in promoter activity.
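
    The abstract does not spell out the N4 GAS consensus. A rough sketch of how such sites could be located in a promoter sequence follows, assuming the form TTC, four arbitrary bases, then GAA. That pattern is an assumption made here for illustration, and `find_n4_gas` is a hypothetical helper, not the authors' method.

```python
import re

# Assumed N4 GAS pattern (not stated in the abstract): TTC + any 4 bases + GAA.
# The lookahead reports overlapping candidates as well.
N4_GAS = re.compile(r"(?=(TTC[ACGT]{4}GAA))")

def find_n4_gas(promoter):
    """Return (0-based start, matched sequence) for each candidate N4 GAS site."""
    return [(m.start(), m.group(1)) for m in N4_GAS.finditer(promoter.upper())]

if __name__ == "__main__":
    for start, site in find_n4_gas("ggTTCatgcGAAcc"):
        print(start, site)
```

    A sequence match of this kind only nominates candidate sites. As the study itself reports, two motifs that both bind STAT6 in vitro can still differ in functional relevance in living cells.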

  10. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations (1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; (2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and (3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
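
    Screening a set of 3' UTRs for 8-mers can be pictured with a simple presence count. This is an illustrative sketch only: the sequences are made up, the function names are hypothetical, and the paper's actual screening and statistics are not reproduced here.

```python
from collections import Counter

def kmer_presence(sequences, k=8):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set ensures each sequence contributes at most once per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def fraction_with_motif(sequences, motif):
    """Fraction of sequences in which the given motif occurs."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if motif.upper() in seq.upper())
    return hits / len(sequences)

if __name__ == "__main__":
    utrs = ["AAAACCCCGG", "TTAAAACCCC", "GGGGGGGGGG"]  # toy data
    print(kmer_presence(utrs).most_common(3))
    print(fraction_with_motif(utrs, "AAAACCCC"))
```

    Comparing such a fraction between ribosomal and non-ribosomal genes is one simple way to ask whether a motif is distinctly enriched in a gene class.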

  11. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

    Directory of Open Access Journals (Sweden)

    Chong Chu

    Full Text Available Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

  12. Discovering sequence motifs in quantitative and qualitative peptide data

    DEFF Research Database (Denmark)

    Andreatta, Massimo

    -dimensional, as binding sites normally consist of a pocket or a groove on the protein surface. However, in many cases such interactions contain a linear component and can be more conveniently represented, or approximated, by a protein-peptide interaction. Whereas time-consuming structural studies are necessary in systems...... of interactions in a single experiment, with virtually unlimited choice of potential targets and variants of these targets. However, the amount and complexity of data produced by high-throughput techniques poses serious challenges to researchers of limited bioinformatics expertise who need to analyze...... with the presence of multiple motifs, due to the experimental setup or the actual poly-specificity of the receptor, in peptide data. A new algorithm, based on Gibbs sampling, identifies multiple specificities by performing two tasks simultaneously: alignment and clustering of peptide data. The method, available...

  13. LDSS-P: an advanced algorithm to extract functional short motifs associated with coordinated gene expression

    Science.gov (United States)

    Ichida, Hiroyuki; Long, Sharon R.

    2016-01-01

    Identifying functional elements in promoter sequences is a major goal in computational and experimental genome biology. Here, we describe an algorithm, Local Distribution of Short Sequences for Prokaryotes (LDSS-P), to identify conserved short motifs located at specific positions in the promoters of co-expressed prokaryotic genes. As a test case, we applied this algorithm to a symbiotic nitrogen-fixing bacterium, Sinorhizobium meliloti. The LDSS-P profiles that overlap with the 5′ section of the extracytoplasmic function RNA polymerase sigma factor RpoE2 consensus sequences displayed a sharp peak between -34 and -32 from TSS positions. The corresponding genes overlap significantly with RpoE2 targets identified from previous experiments. We further identified several groups of genes that are co-regulated with characterized marker genes. Our data indicate that in S. meliloti, and possibly in other Rhizobiaceae species, the master cell cycle regulator CtrA may recognize an expanded motif (AACCAT), which is positionally shifted from the previously reported CtrA consensus sequence in Caulobacter crescentus. Bacterial one-hybrid experiments showed that base substitutions in the expanded motif either increase or decrease the binding by CtrA. These results show the effectiveness of LDSS-P as a method to delineate functional promoter elements. PMID:27190233
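
    The idea of a motif peaking at a fixed distance from the TSS can be illustrated with a position histogram. The sketch below is a simplified stand-in, not the LDSS-P algorithm itself. It assumes promoters are already aligned so that the TSS sits at the same index in every sequence, and `positional_profile` is a hypothetical name.

```python
from collections import Counter

def positional_profile(promoters, motif, tss_index):
    """Histogram of motif start positions relative to the TSS.

    promoters: sequences sharing one layout, with the TSS at index tss_index.
    Returns a Counter mapping relative position (negative = upstream) to the
    number of promoters in which the motif starts at that position.
    """
    motif = motif.upper()
    profile = Counter()
    for seq in promoters:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            profile[start - tss_index] += 1
            # Advance by one so overlapping occurrences are also counted.
            start = seq.find(motif, start + 1)
    return profile

if __name__ == "__main__":
    toy = ["AAAACCATGGGG", "TTAACCATTTTT"]  # made-up promoters, TSS at index 10
    print(positional_profile(toy, "AACCAT", tss_index=10))
```

    A sharp peak in such a histogram for a set of co-expressed genes, compared with a flat background, is the kind of signal the abstract describes for the RpoE2-associated sequences.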

  14. Functional diversification of paralogous transcription factors via divergence in DNA binding site motif and in expression.

    Directory of Open Access Journals (Sweden)

    Larry N Singh

    Full Text Available BACKGROUND: Gene duplication is a major driver of evolutionary innovation as it allows for an organism to elaborate its existing biological functions via specialization or diversification of initially redundant gene paralogs. Gene function can diversify in several ways. Transcription factor gene paralogs in particular, can diversify either by changes in their tissue-specific expression pattern or by changes in the DNA binding site motif recognized by their protein product, which in turn alters their gene targets. The relationship between these two modes of functional diversification of transcription factor paralogs has not been previously investigated, and is essential for understanding adaptive evolution of transcription factor gene families. FINDINGS: Based on a large set of human paralogous transcription factor pairs, we show that when the DNA binding site motifs of transcription factor paralogs are similar, the expressions of the genes that encode the paralogs have diverged, so in general, at most one of the paralogs is highly expressed in a tissue. Moreover, paralogs with diverged DNA binding site motifs tend to be diverged in their function. Conversely, two paralogs that are highly expressed in a tissue tend to have dissimilar DNA binding site motifs. We have also found that in general, within a paralogous family, tissue-specific decrease in gene expression is more frequent than what is expected by chance. CONCLUSIONS: While previous investigations of paralogous gene diversification have only considered coding sequence divergence, by explicitly quantifying divergence in DNA binding site motif, our work presents a new paradigm for investigating functional diversification. 
    Consistent with evolutionary expectation, our quantitative analysis suggests that paralogous transcription factors have survived extinction in part, either through diversification of their DNA binding site motifs or through alterations in their tissue-specific expression.

  15. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    Science.gov (United States)

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-01

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  16. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments.

    Science.gov (United States)

    Dao, Phuong; Hoinka, Jan; Takahashi, Mayumi; Zhou, Jiehua; Ho, Michelle; Wang, Yijie; Costa, Fabrizio; Rossi, John J; Backofen, Rolf; Burnett, John; Przytycka, Teresa M

    2016-07-01

    Aptamers, short RNA or DNA molecules that bind distinct targets with high affinity and specificity, can be identified using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), but scalable analytic tools for understanding sequence-function relationships from diverse HT-SELEX data are not available. Here we present AptaTRACE, a computational approach that leverages the experimental design of the HT-SELEX protocol, RNA secondary structure, and the potential presence of many secondary motifs to identify sequence-structure motifs that show a signature of selection. We apply AptaTRACE to identify nine motifs in C-C chemokine receptor type 7 targeted by aptamers in an in vitro cell-SELEX experiment. We experimentally validate two aptamers whose binding required both sequence and structural features. AptaTRACE can identify low-abundance motifs, and we show through simulations that, because of this, it could lower HT-SELEX cost and time by reducing the number of selection cycles required. Published by Elsevier Inc.

  17. Conserved sequence motifs in the small subunit of human general transcription factor TFIIE.

    Science.gov (United States)

    Sumimoto, H; Ohkuma, Y; Sinn, E; Kato, H; Shimasaki, S; Horikoshi, M; Roeder, R G

    1991-12-05

    A general initiation factor, TFIIE, is essential for transcription initiation by RNA polymerase II in conjunction with other general factors. TFIIE is a heterotetramer containing two subunits of relative molecular mass 57,000 (TFIIE-alpha) and two of 34,000 (TFIIE-beta). TFIIE-beta is required in conjunction with TFIIE-alpha for transcription initiation. Here we report the cloning and expression of a complementary DNA encoding a functional human TFIIE-beta. Recombinant TFIIE-beta could replace the natural TFIIE-beta for transcription in conjunction with TFIIE-alpha. Amino-acid sequence comparisons reveal regions with sequence similarities to: subregion 3 of bacterial sigma factors; a region of RAP30 (the small subunit of TFIIF) with sequence similarity to a sigma-factor subregion implicated in binding to RNA polymerase; and a portion of the basic region-helix-loop-helix motif found in several enhancer-binding proteins. These potential homologies have implications for the role of TFIIE in preinitiation complex assembly and function.

  18. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    Science.gov (United States)

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Prediction of HIV-1 virus-host protein interactions using virus and host sequence motifs

    Directory of Open Access Journals (Sweden)

    Tozeren Aydin

    2009-05-01

    Full Text Available Abstract Background: Host protein-protein interaction networks are altered by invading virus proteins, which create new interactions, and modify or destroy others. The resulting network topology favors excessive amounts of virus production in a stressed host cell network. Short linear peptide motifs common to both virus and host provide the basis for host network modification. Methods: We focused our host-pathogen study on the binding and competing interactions of HIV-1 and human proteins. We showed that peptide motifs conserved across 70% of HIV-1 subtype B and C samples occurred in similar positions on HIV-1 proteins, and we documented protein domains that interact with these conserved motifs. We predicted which human proteins may be targeted by HIV-1 by taking pairs of human proteins that may interact via a motif conserved in HIV-1 and the corresponding interacting protein domain. Results: Our predictions were enriched with host proteins known to interact with HIV-1 proteins ENV, NEF, and TAT (p-value Conclusion: A list of host proteins highly enriched with those targeted by HIV-1 proteins can be obtained by searching for host protein motifs along virus protein sequences. The resulting set of host proteins predicted to be targeted by virus proteins will become more accurate with better annotations of motifs and domains. Nevertheless, our study validates the role of linear binding motifs shared by virus and host proteins as an important part of the crosstalk between virus and host.

  20. Functional protein clusters and regulatory motifs in Hypsibius dujardini und Milnesium tardigradum

    OpenAIRE

    Förster, Frank; Liang, Chunguang; Beisser, Daniela; Frohme, Marcus; Schill, Ralph O.; Dandekar, Thomas

    2009-01-01

    Functional protein clusters and regulatory motifs do not only mediate the unique adaptation of tardigrades against extreme temperature and other harsh environmental conditions, but are important markers to distinguish species and taxonomic units. We show here in detail results of our current comparison between Hypsibius dujardini and Milnesium tardigradum. We found 50 different clusters of sequence similar proteins between both tardigrades of which 10 are tardigrade specific. Proteins of othe...

  1. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    Science.gov (United States)

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the most well studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site.

  2. APTANI: a computational tool to select aptamers through sequence-structure motif analysis of HT-SELEX data.

    Science.gov (United States)

    Caroli, J; Taccioli, C; De La Fuente, A; Serafini, P; Bicciato, S

    2016-01-15

    Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX, Systematic Evolution of Ligands by EXponential enrichment, a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences. Here, we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm, originally implemented to analyze SELEX data; it extends the applicability of AptaMotif to HT-SELEX data and introduces new functionalities, such as the possibility to identify binding motifs, to cluster aptamer families or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results. APTANI is available at http://aptani.unimore.it. Contact: silvio.bicciato@unimore.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  3. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    Directory of Open Access Journals (Sweden)

    Xiao-Ping Zhang

    Full Text Available Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.
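
    The abstract's caution, that a perfect consensus match need not be a real PCNA interaction site, is easier to appreciate once one sees how little a pattern scan checks. The sketch below assumes the commonly cited PIP box consensus Q-x-x-[I/L/M]-x-x-[F/Y]-[F/Y]. That pattern is not given in the abstract and is an assumption here, and `find_pip_boxes` is a hypothetical helper.

```python
import re

# Assumed PIP box consensus (not stated in the abstract):
# Q, any 2 residues, I/L/M, any 2 residues, then two aromatic residues (F/Y).
PIP_BOX = re.compile(r"(?=(Q..[ILM]..[FY][FY]))")

def find_pip_boxes(protein):
    """Return (1-based start, matched peptide) for each consensus match."""
    return [(m.start() + 1, m.group(1))
            for m in PIP_BOX.finditer(protein.upper())]

if __name__ == "__main__":
    # Toy sequence, not Rad54.
    print(find_pip_boxes("MSAQKRLTDFFAA"))
```

    A hit from this scan reflects sequence similarity only. Whether the residues are accessible and actually contact PCNA has to be established experimentally, as the Rad54 result shows.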

  4. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  5. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Full Text Available Abstract Background: With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge, as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results: We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore, we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion: iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  6. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  7. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  8. Analysis of the Campylobacter jejuni genome by SMRT DNA sequencing identifies restriction-modification motifs.

    Directory of Open Access Journals (Sweden)

    Jason L O'Loughlin

    Full Text Available Campylobacter jejuni is a leading bacterial cause of human gastroenteritis. The goal of this study was to analyze the C. jejuni F38011 strain, recovered from an individual with severe enteritis, at a genomic and proteomic level to gain insight into microbial processes. The C. jejuni F38011 genome is comprised of 1,691,939 bp, with a mol.% (G+C) content of 30.5%. PacBio sequencing coupled with REBASE analysis was used to predict C. jejuni F38011 genomic sites and enzymes that may be involved in DNA restriction-modification. A total of five putative methylation motifs were identified as well as the C. jejuni enzymes that could be responsible for the modifications. Peptides corresponding to the deduced amino acid sequence of the C. jejuni enzymes were identified using proteomics. This work sets the stage for studies to dissect the precise functions of the C. jejuni putative restriction-modification enzymes. Taken together, the data generated in this study contribute to our knowledge of the genomic content, methylation profile, and encoding capacity of C. jejuni.

  9. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
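
    The string representation with ignored positions can be illustrated with a minimal sketch. It is not the PosMotif implementation: the function names and the use of "." as the ignore symbol are assumptions made here. It shows only why a motif of length l admits 2^l keep/ignore patterns and how a pattern with ignored positions is matched against a sequence.

```python
from itertools import product

def masks(l):
    """All keep/ignore patterns for a motif of length l (2**l of them)."""
    return list(product((True, False), repeat=l))

def matches(pattern, word, ignore="."):
    """True if word agrees with pattern at every non-ignored position."""
    return len(pattern) == len(word) and all(
        p == ignore or p == w for p, w in zip(pattern, word)
    )

def count_occurrences(pattern, sequence):
    """Count (possibly overlapping) windows of sequence that match pattern."""
    k = len(pattern)
    return sum(
        matches(pattern, sequence[i:i + k])
        for i in range(len(sequence) - k + 1)
    )
```

    For example, count_occurrences("AC.T", seq) counts every window that reads A, C, any base, T.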

  10. Sequence, structure, and cooperativity in folding of elementary protein structural motifs.

    Science.gov (United States)

    Lai, Jason K; Kubelka, Ginka S; Kubelka, Jan

    2015-08-11

    Residue-level unfolding of two helix-turn-helix proteins--one naturally occurring and one de novo designed--is reconstructed from multiple sets of site-specific ¹³C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa-Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako-Saitô-Muñoz-Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and ¹³C-amide I' bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for "experimental" reaction coordinates--namely, the degree of local folding as sensed by site-specific ¹³C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture.

  11. A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

    Science.gov (United States)

    Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

    2016-12-01

    The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence (1417)EENKVR(1422) and the terminal (1478)TRL(1480) (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics.

  12. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

    Science.gov (United States)

    May, Alex C W

    2002-12-01

    It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.
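
    The abstract does not give the algorithm, so the following is only a sketch of one standard way to obtain a globally optimal partition of linearly ordered data: dynamic programming over contiguous segments. The per-column variability scores, the squared-deviation cost, and the function name are assumptions made for illustration, not the author's method.

```python
from itertools import accumulate

def optimal_segments(scores, k):
    """Split scores into k contiguous segments minimising the total
    within-segment sum of squared deviations (an assumed cost)."""
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(scores)")
    # Prefix sums make the cost of any segment an O(1) lookup.
    s1 = [0.0] + list(accumulate(scores))
    s2 = [0.0] + list(accumulate(x * x for x in scores))

    def cost(i, j):
        """Squared deviation of scores[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting scores[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Walk the back-pointers to recover the segment boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1], best[k][n]
```

    Running the function for every k from 1 to the alignment length gives a partition at each level of granularity; each is optimal for the assumed cost because all boundary placements are considered.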

  13. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo Flores

    2012-06-01

    Full Text Available As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures, either global or local, determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  14. The neuronal nitric oxide synthase PDZ motif binds to -G(D,E)XV* carboxyterminal sequences

    NARCIS (Netherlands)

    Schepens, J.; Cuppen, E.; Wieringa, B.; Hendriks, W.

    1997-01-01

    PDZ motifs are small protein-protein interaction modules that are thought to play a role in the clustering of submembranous signalling molecules. The specificity and functional consequences of their associative actions is still largely unknown. Using two-hybrid methodology we here demonstrate that t

  15. Bacteria-mimicking nanoparticle surface functionalization with targeting motifs

    Science.gov (United States)

    Lai, Mei-Hsiu; Clay, Nicholas E.; Kim, Dong Hyun; Kong, Hyunjoon

    2015-04-01

    In recent years, surface modification of nanocarriers with targeting motifs has been explored to modulate delivery of various diagnostic, sensing and therapeutic molecular cargo to desired sites of interest in in vitro bioengineering platforms and in vivo pathologic tissue. However, most surface functionalization approaches are often plagued by complex chemical modifications and effortful purifications. To resolve such challenges, this study demonstrates a unique method to immobilize antibodies that can act as targeting motifs on the surfaces of nanocarriers, inspired by a process that bacteria use for immobilization of the host's antibodies. We hypothesized that alkylated Staphylococcus aureus protein A (SpA) would self-assemble with micelles and subsequently induce stable coupling of antibodies to the micelles. We examined this hypothesis by using poly(2-hydroxyethyl-co-octadecyl aspartamide) (PHEA-g-C18) as a model polymer to form micelles. The self-assembly between the micelles and alkylated SpA became more thermodynamically favorable by increasing the degree of substitution of octadecyl chains to PHEA-g-C18, due to a positive entropy change. Lastly, the mixing of SpA-PA-coupled micelles with antibodies resulted in the coating of micelles with antibodies, as confirmed with a fluorescence resonance energy transfer (FRET) assay. The micelles coated with antibodies to VCAM-1 or integrin αv displayed a higher binding affinity to substrates coated with VCAM-1 and integrin αvβ3, respectively, than other controls, as evaluated with surface plasmon resonance (SPR) spectroscopy and a circulation-simulating flow chamber. We envisage that this bacteria-inspired protein immobilization approach will be useful to improve the quality of targeted delivery of nanoparticles, and can be extended to modify the surface of a wide array of nanocarriers.

  16. A leucine zipper motif determines different functions in a DNA replication protein.

    Science.gov (United States)

    Garcia de Viedma, D; Giraldo, R; Rivas, G; Fernández-Tresguerres, E; Diaz-Orejas, R

    1996-01-01

    RepA is the replication initiator protein of the Pseudomonas plasmid pPS10 and is also able to autoregulate its own synthesis. Here we report a genetic and functional analysis of a leucine zipper-like (LZ) motif located at the N-terminus of RepA. It is shown that the LZ motif modulates the equilibrium between monomeric and dimeric forms of the protein and that monomers of RepA interact with sequences at the origin of replication, oriV, while dimers are required for interactions of RepA at the repA promoter. Further, different residues of the LZ motif are seen to have different functional roles. Leucines at the d positions of the putative alpha-helix are relevant in the formation of RepA dimers required for transcriptional autoregulation. They also modulate other RepA-RepA interactions that result in cooperative binding of protein monomers to the origin of replication. The residues at the b/f positions of the putative helix play no relevant role in RepA-RepA interactions. These residues do not affect RepA autoregulation but do influence replication, as demonstrated by mutants that, without affecting binding to oriV, either increase the host range of the plasmid or are inactive in replication. It is proposed that residues in b/f positions play a relevant role in interactions between RepA and host replication factors. PMID:8631313

  17. MicroRNA sequence motifs reveal asymmetry between the stem arms

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Havgaard, Jakob Hull; Ensterö, M.

    2006-01-01

    RNAs in their genomic contexts. We have compared profiles of mature miRNAs within their genomic context of the 5' and 3' stemloop precursor arms and we find asymmetry between mature sequences of the 5' and 3' stemloop precursor arms. The main observation is that vertebrate organisms have a characteristic motif on the 5......' arm which is in contrast to the 3' arm motif which mainly shows the conserved U at the position of the mature start. Also the vertebrate 5' arm motif shows a semi-conserved G 13 nucleotides upstream from the first position. We compared the 5' and 3' arm profiles using the average log likelihood ratio...... (ALLR) score, as defined by Wang and Stormo (2003) [Wang T., Stormo, G.D., 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2369-2380.] and computing a p-value we find that the two profiles differ significantly in their 3' end where the 5' arm...
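
    As an illustration of the ALLR score named above, the sketch below compares two profile columns using the form in which the score is usually stated: each column's counts are weighted by the log ratio of the other column's frequencies to the background, and the sum is divided by the total count. The pseudocount scheme, the function name, and the example inputs are assumptions made here; the cited paper should be consulted for the exact definition.

```python
import math

BASES = "ACGU"

def allr(counts_a, counts_b, background, pseudo=1.0):
    """Average log likelihood ratio between two profile columns.

    counts_a, counts_b: dicts mapping base -> observed count.
    background: dict mapping base -> background probability.
    pseudo: total pseudocount spread by background (an assumption).
    """
    def freqs(counts):
        total = sum(counts.get(b, 0) for b in BASES) + pseudo
        return {
            b: (counts.get(b, 0) + pseudo * background[b]) / total
            for b in BASES
        }

    fa, fb = freqs(counts_a), freqs(counts_b)
    numerator = sum(
        counts_b.get(b, 0) * math.log(fa[b] / background[b])
        + counts_a.get(b, 0) * math.log(fb[b] / background[b])
        for b in BASES
    )
    denominator = sum(counts_a.get(b, 0) + counts_b.get(b, 0) for b in BASES)
    return numerator / denominator
```

    Under this form, two columns dominated by the same base score above zero and columns dominated by different bases score below zero.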

  18. Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

    Directory of Open Access Journals (Sweden)

    István Miklós

    2012-02-01

    Full Text Available HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.

  19. Sequence determination and modeling of structural motifs for the smallest monomeric aminoacyl-tRNA synthetase.

    OpenAIRE

    Hou, Y M; Shiba, K; Mottes, C; Schimmel, P.

    1991-01-01

    Polypeptide chains of 19 previously studied Escherichia coli aminoacyl-tRNA synthetases are as large as 951 amino acids and, depending on the enzyme, have quaternary structures of alpha, alpha 2, alpha 2 beta 2, and alpha 4. These enzymes have been organized into two classes which are defined by sequence motifs that are associated with specific three-dimensional structures. We isolated, cloned, and sequenced the previously uncharacterized gene for E. coli cysteine-tRNA synthetase (EC 6.1.1.16...

  20. De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

    Science.gov (United States)

    Munusamy, Prabhakaran; Zolotarov, Yevgen; Meteignier, Louis-Valentin; Moffett, Peter; Strömvik, Martina V.

    2017-01-01

    Gene regulation at the transcriptional and translational level leads to diversity in phenotypes and function in organisms. Regulatory DNA or RNA sequence motifs adjacent to the gene coding sequence act as binding sites for proteins that in turn enable or disable expression of the gene. Whereas the known DNA and RNA binding proteins range in the thousands, only a few motifs have been examined. In this study, we have predicted putative regulatory motifs in groups of untranslated regions from genes regulated at the translational level in Arabidopsis thaliana under normal and stressed conditions. The test group of sequences was divided into random subgroups and subjected to three de novo motif finding algorithms (Seeder, Weeder and MEME). In addition to identifying sequence motifs, using an in silico tool we have predicted microRNA target sites in the 3′ UTRs of the translationally regulated genes, as well as identified upstream open reading frames located in the 5′ UTRs. Our bioinformatics strategy and the knowledge generated contribute to understanding gene regulation during stress, and can be applied to disease and stress resistant plant development. PMID:28276452
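
    The identification of upstream open reading frames can be sketched as follows. The abstract does not say which tool or definition was used, so the working definition here (an ATG followed by an in-frame stop codon, both inside the 5′ UTR) and the function name are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(utr5):
    """Return (start, end) pairs, 0-based with end exclusive, for every
    ATG that reaches an in-frame stop codon inside the 5' UTR."""
    seq = utr5.upper().replace("U", "T")  # accept RNA or DNA input
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Step codon by codon from the start until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

    An ATG with no in-frame stop before the end of the UTR is not reported, since such a frame would run on into the main coding sequence.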

  1. Sequence motifs associated with hepatotoxicity of locked nucleic acid--modified antisense oligonucleotides.

    Science.gov (United States)

    Burdick, Andrew D; Sciabola, Simone; Mantena, Srinivasa R; Hollingshead, Brett D; Stanton, Robert; Warneke, James A; Zeng, Ming; Martsen, Elena; Medvedev, Alexander; Makarov, Sergei S; Reed, Lori A; Davis, John W; Whiteley, Laurence O

    2014-04-01

    Fully phosphorothioate antisense oligonucleotides (ASOs) with locked nucleic acids (LNAs) improve target affinity, RNase H activation and stability. LNA modified ASOs can cause hepatotoxicity, and this risk is currently not fully understood. In vitro cytotoxicity screens have not been reliable predictors of hepatic toxicity in non-clinical testing; however, mice are considered to be a sensitive test species. To better understand the relationship between nucleotide sequence and hepatotoxicity, a structure-toxicity analysis was performed using results from 2-week repeated-dose-tolerability studies in mice administered LNA-modified ASOs. ASOs targeting human Apolipoprotein C3 (Apoc3), CREB (cAMP Response Element Binding Protein) Regulated Transcription Coactivator 2 (Crtc2) or Glucocorticoid Receptor (GR, NR3C1) were classified based upon the presence or absence of hepatotoxicity in mice. From these data, a random-decision forest-classification model generated from nucleotide sequence descriptors identified two trinucleotide motifs (TCC and TGC) that were present only in hepatotoxic sequences. We found that motif-containing sequences were more likely to bind to hepatocellular proteins in vitro and increased P53 and NRF2 stress pathway activity in vivo. These results suggest in silico approaches can be utilized to establish structure-toxicity relationships of LNA-modified ASOs and decrease the likelihood of hepatotoxicity in preclinical testing.
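
    The phrase "present only in hepatotoxic sequences" can be made concrete with a small sketch. This is not the random-forest model described in the abstract: it substitutes a plain set difference over trinucleotide descriptors, and the function names are hypothetical.

```python
def trinucleotides(seq):
    """Set of all overlapping 3-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + 3] for i in range(len(seq) - 2)}

def class_specific_motifs(toxic, nontoxic):
    """Trinucleotides found in at least one toxic sequence and in
    no non-toxic sequence."""
    in_toxic = set().union(*(trinucleotides(s) for s in toxic))
    in_nontoxic = set().union(*(trinucleotides(s) for s in nontoxic))
    return in_toxic - in_nontoxic
```

    A filter of this kind only lists candidates; a classifier such as the one in the abstract is still needed to judge how predictive each candidate is.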

  2. Sequence motifs associated with hepatotoxicity of locked nucleic acid—modified antisense oligonucleotides

    Science.gov (United States)

    Burdick, Andrew D.; Sciabola, Simone; Mantena, Srinivasa R.; Hollingshead, Brett D.; Stanton, Robert; Warneke, James A.; Zeng, Ming; Martsen, Elena; Medvedev, Alexander; Makarov, Sergei S.; Reed, Lori A.; Davis, John W.; Whiteley, Laurence O.

    2014-01-01

    Fully phosphorothioate antisense oligonucleotides (ASOs) with locked nucleic acids (LNAs) improve target affinity, RNase H activation and stability. LNA modified ASOs can cause hepatotoxicity, and this risk is currently not fully understood. In vitro cytotoxicity screens have not been reliable predictors of hepatic toxicity in non-clinical testing; however, mice are considered to be a sensitive test species. To better understand the relationship between nucleotide sequence and hepatotoxicity, a structure–toxicity analysis was performed using results from 2-week repeated-dose-tolerability studies in mice administered LNA-modified ASOs. ASOs targeting human Apolipoprotein C3 (Apoc3), CREB (cAMP Response Element Binding Protein) Regulated Transcription Coactivator 2 (Crtc2) or Glucocorticoid Receptor (GR, NR3C1) were classified based upon the presence or absence of hepatotoxicity in mice. From these data, a random-decision forest-classification model generated from nucleotide sequence descriptors identified two trinucleotide motifs (TCC and TGC) that were present only in hepatotoxic sequences. We found that motif-containing sequences were more likely to bind to hepatocellular proteins in vitro and increased P53 and NRF2 stress pathway activity in vivo. These results suggest in silico approaches can be utilized to establish structure–toxicity relationships of LNA-modified ASOs and decrease the likelihood of hepatotoxicity in preclinical testing. PMID:24550163

  3. angaGEDUCI: Anopheles gambiae gene expression database with integrated comparative algorithms for identifying conserved DNA motifs in promoter sequences

    Directory of Open Access Journals (Sweden)

    Ribeiro Jose Marcos C

    2006-05-01

    Full Text Available Abstract Background The completed sequence of the Anopheles gambiae genome has enabled genome-wide analyses of gene expression and regulation in this principal vector of human malaria. These investigations have created a demand for efficient methods of cataloguing and analyzing the large quantities of data that have been produced. The organization of genome-wide data into one unified database makes possible the efficient identification of spatial and temporal patterns of gene expression, and by pairing these findings with comparative algorithms, may offer a tool to gain insight into the molecular mechanisms that regulate these expression patterns. Description We provide a publicly-accessible database and integrated data-mining tool, angaGEDUCI, that unifies (1) stage- and tissue-specific microarray analyses of gene expression in An. gambiae at different developmental stages and temporal separations following a bloodmeal, (2) functional gene annotation, (3) genomic sequence data, and (4) promoter sequence comparison algorithms. The database can be used to study genes expressed in particular stages, tissues, and patterns of interest, and to identify conserved promoter sequence motifs that may play a role in the regulation of such expression. The database is accessible from the address http://www.angaged.bio.uci.edu. Conclusion By combining gene expression, function, and sequence data with integrated sequence comparison algorithms, angaGEDUCI streamlines spatial and temporal pattern-finding and produces a straightforward means of developing predictions and designing experiments to assess how gene expression may be controlled at the molecular level.

  4. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    Science.gov (United States)

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  5. Identification of sequence motifs involved in Dengue virus-host interactions.

    Science.gov (United States)

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne virus infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, the attempts on developing novel control strategies are underway. Viruses utilize the surface receptor proteins of host to enter into the cells. Though various proteins were said to be receptors of Dengue virus (DENV) using Virus Overlay Protein Binding Assay, the precise interaction between DENV and host is not explored. Understanding the structural features of domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain Yxxphi motif which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further, experimental evidence is warranted to confirm the virus-host interactions and also inhibitory activity of reported lead compounds.
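
    Short linear motifs such as NGR and Yxxphi are commonly located with regular expressions. The abstract does not state how the authors searched, so the sketch below is illustrative only: the reading of phi as a bulky hydrophobic residue (L, I, M, F or V), the pattern names, and the function name are assumptions.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
MOTIFS = {
    "NGR": re.compile(r"(?=(NGR))"),
    # phi taken here as L, I, M, F or V (an assumption).
    "Yxxphi": re.compile(r"(?=(Y..[LIMFV]))"),
}

def scan(protein):
    """Map each motif name to a list of (0-based position, matched text)."""
    seq = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }
```

    Calling scan on an envelope domain III sequence would list candidate sites; a match by itself does not show that the site is functional.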

  6. Sequence-dependent stability test of a left-handed β-helix motif.

    Science.gov (United States)

    Hayre, Natha R; Singh, Rajiv R P; Cox, Daniel L

    2012-03-21

    The left-handed β-helix (LHBH) is an intriguing, rare structural pattern in polypeptides that has been implicated in the formation of amyloid aggregates. We used accurate all-atom replica-exchange molecular dynamics (REMD) simulations to study the relative stability of diverse sequences in the LHBH conformation. Ensemble-average coordinates from REMD served as a scoring criterion to identify sequences and threadings optimally suited to the LHBH, as in a fold recognition paradigm. We examined the repeatability of our REMD simulations, finding that single simulations can be reliable to a quantifiable extent. We find expected behavior for the positive and negative control cases of a native LHBH and intrinsically disordered sequences, respectively. Polyglutamine and a designed hexapeptide repeat show remarkable affinity for the LHBH motif. A structural model for misfolded murine prion protein was also considered, and showed intermediate stability under the given conditions. Our technique is found to be an effective probe of LHBH stability, and promises to be scalable to broader studies of this and potentially other novel or rare motifs. The superstable character of the designed hexapeptide repeat suggests theoretical and experimental follow-ups.

  7. Identification of novel conserved functional motifs across most Influenza A viral strains

    Directory of Open Access Journals (Sweden)

    El-Azab Iman

    2011-01-01

    Background: Influenza A virus poses a continuous threat to global public health. Design of novel universal drugs and vaccines requires a careful analysis of different strains of the Influenza A viral genome from diverse hosts and subtypes. We performed a systematic in silico analysis of Influenza A viral segments of all available Influenza A viral strains and subtypes and grouped them based on host, subtype, and years isolated, and through multiple sequence alignments we extrapolated conserved regions, motifs, and accessible regions for functional mapping and annotation. Results: Across all species and strains, 87 highly conserved regions (conservation percentage ≥ 90%) and 19 functional motifs (conservation percentage = 100%) were found in PB2, PB1, PA, NP, M, and NS segments. The conservation percentage of these segments ranged between 94 - 98% in human strains (the most conserved), 85 - 93% in swine strains (the most variable), and 91 - 94% in avian strains. The most conserved segment was different in each host (PB1 for human strains, NS for avian strains, and M for swine strains). Target accessibility prediction yielded 324 accessible regions, with a single-stranded probability > 0.5, of which 78 coincided with conserved regions. Some of the interesting annotations in these regions included sites for protein-protein interactions, the RNA binding groove, and the proton ion channel. Conclusions: The influenza virus has evolved to adapt to its host through variations in the GC content and conservation percentage of the conserved regions. Nineteen universal conserved functional motifs were discovered, of which some were accessible regions with interesting biological functions. These regions will serve as a foundation for universal drug targets as well as universal vaccine design.
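How a per-column conservation percentage might be computed from a multiple sequence alignment is sketched below. The abstract does not define its measure, so this sketch assumes "conservation" means the share of sequences carrying the most common character in a column, with gaps counted like any other character. The function names and the `min_length` parameter are hypothetical.

```python
from collections import Counter

def column_conservation(alignment):
    """Percent of sequences sharing the most common character, per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    percentages = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        percentages.append(100.0 * top_count / len(column))
    return percentages

def conserved_regions(percentages, threshold=90.0, min_length=3):
    """Return (start, end) half-open runs in which every column meets the threshold."""
    regions, start = [], None
    for i, pct in enumerate(percentages + [-1.0]):  # sentinel closes a trailing run
        if pct >= threshold and start is None:
            start = i
        elif pct < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions

toy_alignment = ["AUGGCA", "AUGGCU", "AUGACA", "AUGGCA"]  # hypothetical data
scores = column_conservation(toy_alignment)
print(scores, conserved_regions(scores))
```

Setting `threshold=100.0` would correspond to the stricter criterion the abstract applies to its functional motifs.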

  8. Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    NARCIS (Netherlands)

    Dijk, van A.D.J.; Morabito, G.; Fiers, M.A.; Ham, van R.C.H.J.; Angenent, G.C.; Immink, R.G.H.

    2010-01-01

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein famil

  9. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS......, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other...

  10. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Background: Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM). In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results: Genome-wide analyses identified a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them translation initiation and elongation factors, and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters depending on the distance from the transcription start site. Conclusions: Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the LTSM-binding proteins NonO and SFPQ identified here were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  11. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    Science.gov (United States)

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  12. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    Science.gov (United States)

    Gautheret, D; Lambert, A

    2001-11-09

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. Copyright 2001 Academic Press.
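The "helical profile" idea, with 16 lod-scores per helix position, can be sketched as follows. This is an illustrative reconstruction and not ERPIN's code: the uniform background of 1/16 per base pair, the pseudocount, and the function names `helical_profile` and `score_helix` are all assumptions.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs

def helical_profile(paired_columns, pseudocount=1.0):
    """Build one 16-entry lod-score table per helix position.

    paired_columns: list of positions; each position is a list of 2-letter
    strings (5' base followed by its 3' partner), one per aligned sequence.
    Background frequency is assumed uniform (1/16 per pair).
    """
    background = 1.0 / len(PAIRS)
    profile = []
    for observed in paired_columns:
        counts = Counter(observed)
        total = len(observed) + pseudocount * len(PAIRS)
        profile.append({
            pair: math.log2(((counts[pair] + pseudocount) / total) / background)
            for pair in PAIRS
        })
    return profile

def score_helix(profile, strand5, strand3):
    """Sum lod-scores for a candidate helix; strand3[i] is the partner of strand5[i]."""
    return sum(pos[a + b] for pos, a, b in zip(profile, strand5, strand3))

# Hypothetical two-position helix observed in four aligned sequences.
profile = helical_profile([["GC", "GC", "GC", "AU"], ["CG", "CG", "CG", "CG"]])
print(score_helix(profile, "GC", "CG"))
```

Scoring pairs jointly rather than scoring each base on its own lets compensatory changes (for example G-C to A-U) score well even though neither base is individually conserved.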

  13. Novel Structural and Functional Motifs in cellulose synthase (CesA) Genes of Bread Wheat (Triticum aestivum, L.).

    Directory of Open Access Journals (Sweden)

    Simerjeet Kaur

    Cellulose is the primary determinant of mechanical strength in plant tissues. Late-season lodging is inversely related to the amount of cellulose in a unit length of the stem. Wheat is the most widely grown of all the crops globally, yet information on its CesA gene family is limited. We have identified 22 CesA genes from bread wheat, which include homoeologs from each of the three genomes, and named them as TaCesAXA, TaCesAXB or TaCesAXD, where X denotes the gene number and the last suffix stands for the respective genome. Sequence analyses of the CESA proteins from wheat and their orthologs from barley, maize, rice, and several dicot species (Arabidopsis, beet, cotton, poplar, potato, rose gum and soybean) revealed motifs unique to monocots (Poales) or dicots. Novel structural motifs CQIC and SVICEXWFA were identified, which distinguished the CESAs involved in the formation of primary and secondary cell wall (PCW and SCW) in all the species. We also identified several new motifs specific to monocots or dicots. The conserved motifs identified in this study possibly play functional roles specific to PCW or SCW formation. The new insights from this study advance our knowledge about the structure, function and evolution of the CesA family in plants in general and wheat in particular. This information will be useful in improving culm strength to reduce lodging or alter wall composition to improve biofuel production.

  14. Novel Structural and Functional Motifs in cellulose synthase (CesA) Genes of Bread Wheat (Triticum aestivum, L.).

    Science.gov (United States)

    Kaur, Simerjeet; Dhugga, Kanwarpal S; Gill, Kulvinder; Singh, Jaswinder

    2016-01-01

    Cellulose is the primary determinant of mechanical strength in plant tissues. Late-season lodging is inversely related to the amount of cellulose in a unit length of the stem. Wheat is the most widely grown of all the crops globally, yet information on its CesA gene family is limited. We have identified 22 CesA genes from bread wheat, which include homoeologs from each of the three genomes, and named them as TaCesAXA, TaCesAXB or TaCesAXD, where X denotes the gene number and the last suffix stands for the respective genome. Sequence analyses of the CESA proteins from wheat and their orthologs from barley, maize, rice, and several dicot species (Arabidopsis, beet, cotton, poplar, potato, rose gum and soybean) revealed motifs unique to monocots (Poales) or dicots. Novel structural motifs CQIC and SVICEXWFA were identified, which distinguished the CESAs involved in the formation of primary and secondary cell wall (PCW and SCW) in all the species. We also identified several new motifs specific to monocots or dicots. The conserved motifs identified in this study possibly play functional roles specific to PCW or SCW formation. The new insights from this study advance our knowledge about the structure, function and evolution of the CesA family in plants in general and wheat in particular. This information will be useful in improving culm strength to reduce lodging or alter wall composition to improve biofuel production.

  15. Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs.

    Science.gov (United States)

    Bandyopadhyay, Deepak; Huan, Jun; Liu, Jinze; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2010-09-01

    We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented by robust graph representations, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e., amino acid residue-packing patterns shared by most family members but infrequent in other proteins. The function of a protein is inferred by identifying in it motifs characteristic of a known family. We employ these family-specific motifs to elucidate functional relationships between families in the GO and SCOP hierarchies. Specifically, we postulate that two families are functionally related if one family is statistically enriched by motifs characteristic of another family, i.e., if the number of proteins in a family containing a motif from another family is greater than expected by chance. This function-inference method can help annotate proteins of unknown function, establish functional neighbors of existing families, and help specify alternate functions for known proteins.

  16. Correlating CpG islands, motifs, and sequence variants in human chromosome 21

    Directory of Open Access Journals (Sweden)

    Cercone Nick

    2011-07-01

    Background: CpG islands are important regions in DNA. They usually appear at the 5’ end of genes containing GC-rich dinucleotides. When DNA methylation occurs, gene regulation is affected and it sometimes leads to carcinogenesis. We propose a new detection program using a hidden Markov model alongside the Viterbi algorithm. Methods: Our solution provides a graphical user interface not seen in many of the other CGI detection programs, and we unify the detection and analysis under one program to allow researchers to scan a genetic sequence, detect the significant CGIs, and analyze the sequence once the scan is complete for any noteworthy findings. Results: Using human chromosome 21, we show that our algorithm finds a significant number of CGIs. An analysis of a dataset of promoters showed that the characteristics of methylated and unmethylated CGIs are significantly different. Finally, we detected significantly different motifs between methylated and unmethylated CGI promoters using MEME and MAST. Conclusions: Developing this new tool for the community using powerful algorithms has shown that combining analysis with CGI detection will improve the continued research within the field of epigenetics.
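Decoding with a hidden Markov model and the Viterbi algorithm can be illustrated with a deliberately simplified sketch. It is not the program described in this record: it uses only two states (island-like "I" and background "B"), emits single nucleotides rather than modelling CpG dinucleotides, and every probability below is a toy value chosen for illustration.

```python
import math

STATES = ("I", "B")  # I = island-like (GC-rich), B = background
# Toy parameters for illustration only; they are not estimated from data.
START = {"I": 0.5, "B": 0.5}
TRANS = {"I": {"I": 0.9, "B": 0.1}, "B": {"I": 0.1, "B": 0.9}}
EMIT = {
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "B": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(sequence):
    """Return the most probable state path for a DNA string as a string of 'I'/'B'."""
    if not sequence:
        return ""
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][sequence[0]]) for s in STATES}
    backpointers = []
    for symbol in sequence[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("ATATATAT" + "CGCGCGCGCG" + "ATATATAT"))
```

Working in log space avoids numerical underflow on chromosome-scale input, which matters far more for real data than for this toy sequence.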

  17. Engineering Proteins with Enhanced Mechanical Stability by Force Specific Sequence Motifs

    Science.gov (United States)

    Lu, Wenzhe; Negi, Surendra; Oberhauser, Andres F.; Braun, Werner

    2012-01-01

    Use of atomic force microscopy (AFM) has recently led to a better understanding of the molecular mechanisms of the unfolding process by mechanical forces; however, the rational design of novel proteins with specific mechanical strength remains challenging. We have approached this problem from a new perspective that generates linear physical-chemical properties (PCP) motifs from a limited AFM data set. Guided by our linear sequence analysis we designed and analyzed four new mutants of the titin I1 domain with the goal of increasing the domain's mechanical strength. All four mutants could be cloned and expressed as soluble proteins. AFM data indicate that at least two of the mutants have increased molecular mechanical strength. This observation suggests that the PCP method is useful to graft sequences specific for high mechanical stability to weak proteins to increase their mechanical stability, and represents an additional tool in the design of novel proteins besides steered molecular dynamics calculations, coarse grained simulations and phi-value analysis of the transition state. PMID:22274941

  18. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  19. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

    Science.gov (United States)

    Tong, Hao; Schliekelman, Paul; Mrázek, Jan

    2017-01-05

    DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by unusually high number of copies of the motif in the analyzed sequences. We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. 
    We conclude...
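The core bookkeeping behind spaced-motif detection, tabulating the spacer lengths between two pentamers, might look like the sketch below. It covers only the counting step; the significance test and the filter for similar spacer sequences that the abstract describes are not implemented. The function names, the `max_spacer` cutoff and the toy sequence are hypothetical.

```python
from collections import Counter

def kmer_positions(sequence, kmer):
    """Return all 0-based start positions of kmer in sequence (overlaps allowed)."""
    positions, start = [], sequence.find(kmer)
    while start != -1:
        positions.append(start)
        start = sequence.find(kmer, start + 1)
    return positions

def spacer_length_histogram(sequence, left, right, max_spacer=30):
    """Count spacer lengths between each `left` occurrence and downstream `right` occurrences."""
    histogram = Counter()
    right_starts = kmer_positions(sequence, right)
    for l_start in kmer_positions(sequence, left):
        l_end = l_start + len(left)
        for r_start in right_starts:
            spacer = r_start - l_end
            if 0 <= spacer <= max_spacer:
                histogram[spacer] += 1
    return histogram

# Hypothetical sequence in which TTGAC and TATAA recur 4 nt apart.
toy = "TTGACCCCCTATAAGGTTGACGGGGTATAA"
print(spacer_length_histogram(toy, "TTGAC", "TATAA"))
```

A pronounced peak at one spacer length, relative to the other lengths, is the kind of signal such a method would then test for significance.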

  20. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  1. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  2. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site, sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  3. Identification of putative regulatory motifs in the upstream regions of co-expressed functional groups of genes in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Joshi NV

    2009-01-01

    Background: Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results: The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs – the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs – were identified, and zero to all four of these occur in the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion: The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.

  4. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences ( 7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine [Noblis

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  5. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

      FLOW CYTOMETRY-ASSISTED CLONING OF SPECIFIC SEQUENCE MOTIFS FROM COMPLEX 16S RRNA GENE LIBRARIES Jeppe L. Nielsen,1 Andreas Schramm,1,2 Anne E. Bernhard,1 Gerrit J. van den Engh,3 and David A. Stahl1* Department of Civil and Environmental Engineering, University of Washington,1 and Institute...... for Systems Biology,3 Seattle, Washington, and Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany2 A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting......-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.  ...

  6. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Background: Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results: We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion: The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
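One way an exact binomial comparison of two motif counts can work is sketched below. It rests on a standard property that is assumed here rather than taken from the paper: if the two counts are independent Poisson variables, then conditional on their sum the first count is binomial. The sketch ignores the correction for overlapping motifs that the abstract mentions, and the function names are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def compare_exceptionality(count1, count2, expected1, expected2):
    """One-sided p-value that the motif is more exceptional in sequence 1 than in sequence 2.

    Assumption: counts are independent Poisson variables, so conditional on the
    total, count1 ~ Binomial(count1 + count2, expected1 / (expected1 + expected2))
    when both sequences share the same observed/expected ratio.
    """
    total = count1 + count2
    p_null = expected1 / (expected1 + expected2)
    return binomial_upper_tail(count1, total, p_null)

# Hypothetical counts: 3 vs 1 occurrences with equal expected counts.
print(compare_exceptionality(3, 1, 10.0, 10.0))
```

The expected counts would come from a background model of each sequence's composition, which is how such a test can take sequence composition into account.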

  7. Feedback through graph motifs relates structure and function in complex networks

    CERN Document Server

    Hu, Yu; Cain, Nicholas; Mihalas, Stefan; Kutz, J Nathan; Shea-Brown, Eric

    2016-01-01

    How does the connectivity of a network system combine with the behavior of its individual components to determine its collective function? We approach this question by relating the internal network feedback to the statistical prevalence of connectivity motifs, a set of surprisingly simple and local statistics on the network topology. The resulting motif description provides a reduced order model of the network input-output dynamics and it relates the overall network function to feedback control theory. For example, this new formulation dramatically simplifies the classic Erdos-Renyi graph, reducing the overall graph behavior to a simple proportional feedback wrapped around the dynamics of a single node. Higher-order motifs systematically provide further layers and types of feedback to regulate the network response. Thus, the local connectivity shapes temporal and spectral processing by the network as a whole, and we show how this enables robust, yet tunable, functionality such as extending the time constant w...

  8. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    KAUST Repository

    Wong, Aloysius Tze

    2015-06-09

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches include amino acid residues that are conserved across species and many of which have been assigned functional roles based on experimental evidence. Molecules that were identified in this manner seeking cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.

  9. Conserved functional motifs and homology modelling to predict hidden moonlighting functional sites

    Directory of Open Access Journals (Sweden)

    Helen R Irving

    2015-06-01

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. Functional motifs used in such searches include amino acid residues that are conserved across species and many of which have been assigned functional roles based on experimental evidence. Molecules that were identified in this manner seeking cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method, and highlights testable targets from a potentially large pool of candidates for subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers.

  10. DNA consensus sequence motif for binding response regulator PhoP, a virulence regulator of Mycobacterium tuberculosis.

    Science.gov (United States)

    He, Xiaoyuan; Wang, Shuishu

    2014-12-30

    Tuberculosis has reemerged as a serious threat to human health because of the increasing prevalence of drug-resistant strains and synergetic infection with HIV, prompting an urgent need for new and more efficient treatments. The PhoP-PhoR two-component system of Mycobacterium tuberculosis plays an important role in the virulence of the pathogen and thus represents a potential drug target. To study the mechanism of gene transcription regulation by response regulator PhoP, we identified a high-affinity DNA sequence for PhoP binding using systematic evolution of ligands by exponential enrichment. The sequence contains a direct repeat of two 7 bp motifs separated by a 4 bp spacer, TCACAGC(N4)TCACAGC. The specificity of the direct-repeat sequence for PhoP binding was confirmed by isothermal titration calorimetry and electrophoretic mobility shift assays. PhoP binds to the direct repeat as a dimer in a highly cooperative manner. We found many genes previously identified to be regulated by PhoP that contain the direct-repeat motif in their promoter sequences. Synthetic DNA fragments at the putative promoter-binding sites bind PhoP with variable affinity, which is related to the number of mismatches in the 7 bp motifs, the positions of the mismatches, and the spacer and flanking sequences. Phosphorylation of PhoP increases the affinity but does not change the specificity of DNA binding. Overall, our results confirm the direct-repeat sequence as the consensus motif for PhoP binding and thus pave the way for identification of PhoP directly regulated genes in different mycobacterial genomes.
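
    The direct-repeat consensus described above, TCACAGC(N4)TCACAGC, lends itself to a simple promoter scan. The following is a minimal sketch, not the authors' method: the function names and the `max_mismatches` threshold are hypothetical, and the scoring simply counts mismatches in the two 7 bp half-sites while ignoring the spacer and flanking sequences, which the abstract notes also influence affinity.

```python
# Hypothetical scan for PhoP-like direct repeats: two 7 bp half-sites
# separated by a 4 bp spacer of any sequence (consensus from the abstract).
CONSENSUS_HALF = "TCACAGC"
SPACER_LEN = 4


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_direct_repeats(seq, max_mismatches=2):
    """Return (start, site, mismatches) for every window whose two
    half-sites together differ from the consensus at no more than
    max_mismatches positions. The threshold is an illustrative choice."""
    seq = seq.upper()
    k = len(CONSENSUS_HALF)
    site_len = 2 * k + SPACER_LEN
    hits = []
    for i in range(len(seq) - site_len + 1):
        left = seq[i:i + k]
        right = seq[i + k + SPACER_LEN:i + site_len]
        mm = (count_mismatches(left, CONSENSUS_HALF)
              + count_mismatches(right, CONSENSUS_HALF))
        if mm <= max_mismatches:
            hits.append((i, seq[i:i + site_len], mm))
    return hits


if __name__ == "__main__":
    promoter = "GG" + "TCACAGC" + "ATAT" + "TCACAGC" + "CC"
    print(find_direct_repeats(promoter, max_mismatches=0))
```

    A scan of this kind only flags candidate sites; binding would still need to be confirmed experimentally, as the authors did with calorimetry and mobility shift assays.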

  11. One motif to bind them: A small-XXX-small motif affects transmembrane domain 1 oligomerization, function, localization, and cross-talk between two yeast GPCRs.

    Science.gov (United States)

    Lock, Antonia; Forfar, Rachel; Weston, Cathryn; Bowsher, Leo; Upton, Graham J G; Reynolds, Christopher A; Ladds, Graham; Dixon, Ann M

    2014-12-01

    G protein-coupled receptors (GPCRs) are the largest family of cell-surface receptors in mammals and facilitate a range of physiological responses triggered by a variety of ligands. GPCRs were thought to function as monomers, however it is now accepted that GPCR homo- and hetero-oligomers also exist and influence receptor properties. The Schizosaccharomyces pombe GPCR Mam2 is a pheromone-sensing receptor involved in mating and has previously been shown to form oligomers in vivo. The first transmembrane domain (TMD) of Mam2 contains a small-XXX-small motif, overrepresented in membrane proteins and well-known for promoting helix-helix interactions. An ortholog of Mam2 in Saccharomyces cerevisiae, Ste2, contains an analogous small-XXX-small motif which has been shown to contribute to receptor homo-oligomerization, localization and function. Here we have used experimental and computational techniques to characterize the role of the small-XXX-small motif in function and assembly of Mam2 for the first time. We find that disruption of the motif via mutagenesis leads to reduction of Mam2 TMD1 homo-oligomerization and pheromone-responsive cellular signaling of the full-length protein. It also impairs correct targeting to the plasma membrane. Mutation of the analogous motif in Ste2 yielded similar results, suggesting a conserved mechanism for assembly. Using co-expression of the two fungal receptors in conjunction with computational models, we demonstrate a functional change in G protein specificity and propose that this is brought about through hetero-dimeric interactions of Mam2 with Ste2 via the complementary small-XXX-small motifs. This highlights the potential of these motifs to affect a range of properties that can be investigated in other GPCRs.

  12. MEME SUITE: tools for motif discovery and searching.

    Science.gov (United States)

    Bailey, Timothy L; Boden, Mikael; Buske, Fabian A; Frith, Martin; Grant, Charles E; Clementi, Luca; Ren, Jingyuan; Li, Wilfred W; Noble, William S

    2009-07-01

    The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.

  13. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    Energy Technology Data Exchange (ETDEWEB)

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ...

  14. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira).

    Science.gov (United States)

    Wubben, Martin J; Gavilano, Lily; Baum, Thomas J; Davis, Eric L

    2015-06-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females.

  15. Functional roles of benzothiazole motif in antiepileptic drug research.

    Science.gov (United States)

    Amir, Mohammad; Hassan, Mohd Zaheen

    2013-12-01

    Benzothiazoles are promising candidates for the design of novel antiepileptic drugs. The endocyclic sulphur and nitrogen functions present in this heterocyclic nucleus have been shown to be critical for the anticonvulsant activity. The present review outlines the rational design and anticonvulsant potential of promising benzothiazole lead molecules. Particular focus has been placed on the structure-activity relationships of different benzothiazole derivatives, with selected examples of molecules showing significant activity, since these molecules may serve as prototypes for the development of more active antiepileptic drugs.

  16. Introducing tetraCys motifs at two different sites results in a functional dopamine transporter

    DEFF Research Database (Denmark)

    Orun, Oya; Rasmussen, S; Gether, U

    2009-01-01

    We have introduced tetracysteine motifs into different positions of the dopamine transporter (DAT) for specific FlAsH labeling. Two of the constructs expressed at the cell surface and were functional as determined by [3H] dopamine uptake experiments. The N-terminally modified transporter showed...

  17. Design of a biochemical circuit motif for learning linear functions.

    Science.gov (United States)

    Lakin, Matthew R; Minnich, Amanda; Lane, Terran; Stefanovic, Darko

    2014-12-06

    Learning and adaptive behaviour are fundamental biological processes. A key goal in the field of bioengineering is to develop biochemical circuit architectures with the ability to adapt to dynamic chemical environments. Here, we present a novel design for a biomolecular circuit capable of supervised learning of linear functions, using a model based on chemical reactions catalysed by DNAzymes. To achieve this, we propose a novel mechanism of maintaining and modifying internal state in biochemical systems, thereby advancing the state of the art in biomolecular circuit architecture. We use simulations to demonstrate that the circuit is capable of learning behaviour and assess its asymptotic learning performance, scalability and robustness to noise. Such circuits show great potential for building autonomous in vivo nanomedical devices. While such a biochemical system can tell us a great deal about the fundamentals of learning in living systems and may have broad applications in biomedicine (e.g. autonomous and adaptive drugs), it also offers some intriguing challenges and surprising behaviours from a machine learning perspective.

  18. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    Directory of Open Access Journals (Sweden)

    Rodrigo S Lacruz

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates.

  19. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    Science.gov (United States)

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primates species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  20. MPN+, a putative catalytic motif found in a subset of MPN domain proteins from eukaryotes and prokaryotes, is critical for Rpn11 function

    Directory of Open Access Journals (Sweden)

    Hofmann Kay

    2002-09-01

    Background: Three macromolecular assemblages, the lid complex of the proteasome, the COP9-Signalosome (CSN) and the eIF3 complex, all consist of multiple proteins harboring MPN and PCI domains. Up to now, no specific function for any of these proteins has been defined, nor has the importance of these motifs been elucidated. In particular Rpn11, a lid subunit, serves as the paradigm for MPN-containing proteins as it is highly conserved and important for proteasome function. Results: We have identified a sequence motif, termed the MPN+ motif, which is highly conserved in a subset of MPN domain proteins such as Rpn11 and Csn5/Jab1, but is not present outside of this subfamily. The MPN+ motif consists of five polar residues that resemble the active site residues of hydrolytic enzyme classes, particularly that of metalloproteases. By using site-directed mutagenesis, we show that the MPN+ residues are important for the function of Rpn11, while a highly conserved Cys residue outside of the MPN+ motif is not essential. Single amino acid substitutions in MPN+ residues all show similar phenotypes, including slow growth, sensitivity to temperature and amino acid analogs, and general proteasome-dependent proteolysis defects. Conclusions: The MPN+ motif is abundant in certain MPN-domain proteins, including newly identified proteins of eukaryotes, bacteria and archaea thought to act outside of the traditional large PCI/MPN complexes. The putative catalytic nature of the MPN+ motif makes it a good candidate for a pivotal enzymatic function, possibly a proteasome-associated deubiquitinating activity and a CSN-associated Nedd8/Rub1-removing activity.

  1. MIDDAS-M: motif-independent de novo detection of secondary metabolite gene clusters through the integration of genome sequencing and transcriptome data.

    Science.gov (United States)

    Umemura, Myco; Koike, Hideaki; Nagano, Nozomi; Ishii, Tomoko; Kawano, Jin; Yamane, Noriko; Kozone, Ikuko; Horimoto, Katsuhisa; Shin-ya, Kazuo; Asai, Kiyoshi; Yu, Jiujiang; Bennett, Joan W; Machida, Masayuki

    2013-01-01

    Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi revolutionized the pharmaceutical industry, for example, penicillin, lovastatin, and cyclosporine. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no detection method exists for SMB gene clusters that are functional and do not include core SMB genes at present. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.

  2. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Background: This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  3. Localization of Proteins to the 1,2-Propanediol Utilization Microcompartment by Non-native Signal Sequences Is Mediated by a Common Hydrophobic Motif

    Science.gov (United States)

    Jakobson, Christopher M.; Kim, Edward Y.; Slininger, Marilyn F.; Chien, Alex; Tullman-Ercek, Danielle

    2015-01-01

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs. PMID:26283792

  4. Localization of proteins to the 1,2-propanediol utilization microcompartment by non-native signal sequences is mediated by a common hydrophobic motif.

    Science.gov (United States)

    Jakobson, Christopher M; Kim, Edward Y; Slininger, Marilyn F; Chien, Alex; Tullman-Ercek, Danielle

    2015-10-02

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. Proteome-wide search for functional motifs altered in tumors: Prediction of nuclear export signals inactivated by cancer-related mutations.

    Science.gov (United States)

    Prieto, Gorka; Fullaondo, Asier; Rodríguez, Jose A

    2016-05-12

    Large-scale sequencing projects are uncovering a growing number of missense mutations in human tumors. Understanding the phenotypic consequences of these alterations represents a formidable challenge. In silico prediction of functionally relevant amino acid motifs disrupted by cancer mutations could provide insight into the potential impact of a mutation, and guide functional tests. We have previously described Wregex, a tool for the identification of potential functional motifs, such as nuclear export signals (NESs), in proteins. Here, we present an improved version that allows motif prediction to be combined with data from large repositories, such as the Catalogue of Somatic Mutations in Cancer (COSMIC), and to be applied to a whole proteome scale. As an example, we have searched the human proteome for candidate NES motifs that could be altered by cancer-related mutations included in the COSMIC database. A subset of the candidate NESs identified was experimentally tested using an in vivo nuclear export assay. A significant proportion of the selected motifs exhibited nuclear export activity, which was abrogated by the COSMIC mutations. In addition, our search identified a cancer mutation that inactivates the NES of the human deubiquitinase USP21, and leads to the aberrant accumulation of this protein in the nucleus.

  6. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Background: Protein structures have conserved features – motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique by combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  7. Matching of structural motifs using hashing on residue labels and geometric filtering for protein function prediction.

    Science.gov (United States)

    Moll, Mark; Kavraki, Lydia E

    2008-01-01

    There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Our focus is on methods that determine binding site similarity. Although several such methods exist, it still remains a challenging problem to quickly find all functionally-related matches for structural motifs in large data sets with high specificity. In this context, a structural motif is a set of 3D points annotated with physicochemical information that characterize a molecular function. We propose a new method called LabelHash that creates hash tables of n-tuples of residues for a set of targets. Using these hash tables, we can quickly look up partial matches to a motif and expand those matches to complete matches. We show that by applying only very mild geometric constraints we can find statistically significant matches with extremely high specificity in very large data sets and for very general structural motifs. We demonstrate that our method requires a reasonable amount of storage when employing a simple geometric filter and further improves on the specificity of our previous work while maintaining very high sensitivity. Our algorithm is evaluated on 20 homolog classes and a non-redundant version of the Protein Data Bank as our background data set. We use cluster analysis to analyze why certain classes of homologs are more difficult to classify than others. The LabelHash algorithm is implemented on a web server at http://kavrakilab.org/labelhash/.
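
    The core idea described above, hash tables keyed on n-tuples of residue labels with a mild geometric filter, can be illustrated with a toy example. This is only a sketch under stated assumptions and not the LabelHash implementation: the data layout, the function names `build_table` and `lookup`, the pairwise distance cutoff `max_dist`, and the use of a sorted label tuple as the key are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def build_table(targets, n=3, max_dist=10.0):
    """Index every n-tuple of residues whose pairwise distances are all
    within max_dist (a simple geometric filter, assumed here).

    targets maps a target name to a list of (label, (x, y, z)) residues.
    Keys are sorted tuples of residue labels; values list (target, indices).
    """
    table = defaultdict(list)
    for name, residues in targets.items():
        for combo in combinations(range(len(residues)), n):
            points = [residues[i][1] for i in combo]
            if all(dist(p, q) <= max_dist for p, q in combinations(points, 2)):
                key = tuple(sorted(residues[i][0] for i in combo))
                table[key].append((name, combo))
    return table


def lookup(table, motif_labels):
    """Return all indexed residue tuples whose labels match the motif labels."""
    return table.get(tuple(sorted(motif_labels)), [])


if __name__ == "__main__":
    # Toy target: a compact His/Asp/Ser triad plus one distant glycine.
    targets = {
        "toy_target": [
            ("H", (0.0, 0.0, 0.0)),
            ("D", (3.0, 0.0, 0.0)),
            ("S", (0.0, 4.0, 0.0)),
            ("G", (50.0, 50.0, 50.0)),
        ]
    }
    table = build_table(targets)
    print(lookup(table, ["S", "H", "D"]))
```

    In this sketch a lookup returns only label-level partial matches; a real matcher would go on to expand and geometrically verify each candidate against the full motif, as the abstract describes.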

  8. Application of PCR amplicon sequencing using a single primer pair in PCR amplification to assess variations in Helicobacter pylori CagA EPIYA tyrosine phosphorylation motifs

    OpenAIRE

    Karlsson Anneli; Monstein Hans-Jürg; Ryberg Anna; Borch Kurt

    2010-01-01

    Background The presence of various EPIYA tyrosine phosphorylation motifs in the CagA protein of Helicobacter pylori has been suggested to contribute to pathogenesis in adults. In this study, a unique PCR assay and sequencing strategy was developed to establish the number and variation of cagA EPIYA motifs. Findings MDA-DNA derived from gastric biopsy specimens from eleven subjects with gastritis was used with M13- and T7- sequence-tagged primers for amplification of the cagA EPIYA motif regio...

  9. New structural and functional contexts of the Dx[DN]xDG linear motif: insights into evolution of calcium-binding proteins.

    Science.gov (United States)

    Rigden, Daniel J; Woodhead, Duncan D; Wong, Prudence W H; Galperin, Michael Y

    2011-01-01

    Binding of calcium ions (Ca²⁺) to proteins can have profound effects on their structure and function. Common roles of calcium binding include structure stabilization and regulation of activity. It is known that diverse families--EF-hands being one of at least twelve--use a Dx[DN]xDG linear motif to bind calcium in near-identical fashion. Here, four novel structural contexts for the motif are described. Existing experimental data for one of them, a thermophilic archaeal subtilisin, demonstrate for the first time a role for Dx[DN]xDG-bound calcium in protein folding. An integrin-like embedding of the motif in the blade of a β-propeller fold--here named the calcium blade--is discovered in structures of bacterial and fungal proteins. Furthermore, sensitive database searches suggest a common origin for the calcium blade in β-propeller structures of different sizes and a pan-kingdom distribution of these proteins. Factors favouring the multiple convergent evolution of the motif appear to include its general Asp-richness, the regular spacing of the Asp residues and the fact that change of Asp into Gly and vice versa can occur through a single nucleotide change. Among the known structural contexts for the Dx[DN]xDG motif, only the calcium blade and the EF-hand are currently found intracellularly in large numbers, perhaps because the higher extracellular concentration of Ca²⁺ allows for easier fixing of newly evolved motifs that have acquired useful functions. The analysis presented here will inform ongoing efforts toward prediction of similar calcium-binding motifs from sequence information alone.
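
    As an illustration of how a linear motif such as Dx[DN]xDG can be located from sequence alone, the following Python sketch translates it into a regular expression (x becomes "any residue"). The function name `find_motif` and the toy sequence are assumptions for this example; a sequence match by itself does not show that a site binds calcium.

```python
import re

# D, any residue, D or N, any residue, D, G.
# The lookahead lets overlapping occurrences be reported as well.
DX_DN_X_DG = re.compile(r"(?=(D.[DN].DG))")


def find_motif(protein_seq):
    """Return (0-based start, matched residues) for each Dx[DN]xDG hit."""
    return [(m.start(), m.group(1)) for m in DX_DN_X_DG.finditer(protein_seq)]


# Toy sequence, invented for illustration only.
hits = find_motif("AAADKDGDGAA")
```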

  10. Conserved sequence motifs upstream from the co-ordinately expressed vitellogenin and apoVLDLII genes of chicken.

    Science.gov (United States)

    van het Schip, F; Strijker, R; Samallo, J; Gruber, M; Geert, A B

    1986-11-11

    The vitellogenin and apoVLDLII yolk protein genes of chicken are transcribed in the liver upon estrogenization. To get information on putative regulatory elements, we compared more than 2 kb of their 5' flanking DNA sequences. Common sequence motifs were found in regions exhibiting estrogen-induced changes in chromatin structure. Stretches of alternating pyrimidines and purines about 30 nucleotides long are present at roughly similar positions. A distinct box of sequence homology in the chicken genes also appears to be present at a similar position in front of the vitellogenin genes of Xenopus laevis, but is absent from the estrogen-responsive egg-white protein genes expressed in the oviduct. In front of the vitellogenin (position -595) and the VLDLII gene (position -548), a DNA element of about 300 base-pairs was found, which possesses structural characteristics of a mobile genetic element and bears homology to the transposon-like Vi element of Xenopus laevis.

  11. Sevoflurane Alters Spatiotemporal Functional Connectivity Motifs That Link Resting-State Networks during Wakefulness

    Science.gov (United States)

    Kafashan, MohammadMehdi; Ching, ShiNung; Palanca, Ben J. A.

    2016-01-01

    Background: The spatiotemporal patterns of correlated neural activity during the transition from wakefulness to general anesthesia have not been fully characterized. Correlation analysis of blood-oxygen-level dependent (BOLD) functional magnetic resonance imaging (fMRI) allows segmentation of the brain into resting-state networks (RSNs), with functional connectivity referring to the covarying activity that suggests shared functional specialization. We quantified the persistence of these correlations following the induction of general anesthesia in healthy volunteers and assessed for a dynamic nature over time. Methods: We analyzed human fMRI data acquired at 0 and 1.2% vol sevoflurane. The covariance in the correlated activity among different brain regions was calculated over time using bounded Kalman filtering. These time series were then clustered into eight orthogonal motifs using a K-means algorithm, where the structure of correlated activity throughout the brain at any time is the weighted sum of all motifs. Results: Across time scales and under anesthesia, the reorganization of interactions between RSNs is related to the strength of dynamic connections between member pairs. The covariance of correlated activity between RSNs persists compared to that linking individual member pairs of different RSNs. Conclusions: Accounting for the spatiotemporal structure of correlated BOLD signals, anesthetic-induced loss of consciousness is mainly associated with the disruption of motifs with intermediate strength within and between members of different RSNs. In contrast, motifs with higher strength of connections, predominantly with regions-pairs from within-RSN interactions, are conserved among states of wakefulness and sevoflurane general anesthesia. PMID:28082871

  12. Interpreting the functional role of a novel interaction motif in prokaryotic sodium channels.

    Science.gov (United States)

    Sula, Altin; Wallace, B A

    2017-06-05

    Voltage-gated sodium channels enable the translocation of sodium ions across cell membranes and play crucial roles in electrical signaling by initiating the action potential. In humans, mutations in sodium channels give rise to several neurological and cardiovascular diseases, and hence they are targets for pharmaceutical drug developments. Prokaryotic sodium channel crystal structures have provided detailed views of sodium channels, which by homology have suggested potentially important functionally related structural features in human sodium channels. A new crystal structure of a full-length prokaryotic channel, NavMs, in a conformation we proposed to represent the open, activated state, has revealed a novel interaction motif associated with channel opening. This motif is associated with disease when mutated in human sodium channels and plays an important and dynamic role in our new model for channel activation. © 2017 Sula and Wallace.

  13. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Although...... another AQP selectivity filter site, the aromatic/Arg (ar/R) constriction, has been well characterized by mutational analysis, experimental data concerning the NPA region--in particular, the Asn position--is missing. Here, we report on the cloning and mutational analysis of a novel aquaglyceroporin...

  14. Algebraic divisibility sequences over function fields

    CERN Document Server

    Ingram, Patrick; Silverman, Joseph H; Stange, Katherine E; Streng, Marco

    2011-01-01

    We study the existence of primes and of primitive divisors in classical divisibility sequences defined over function fields. Under various hypotheses, we prove that Lucas sequences and elliptic divisibility sequences over function fields defined over number fields contain infinitely many irreducible elements. We also prove that an elliptic divisibility sequence over a function field has only finitely many terms lacking a primitive divisor.

  15. Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.

    Science.gov (United States)

    Jiang, F; Wisén, S; Widersten, M; Bergman, B; Mannervik, B

    2000-08-25

    A recursive in vitro selection among random DNA sequences was used for analysis of the cyanobacterial transcription factor NtcA-binding motifs. An eight-base palindromic sequence, TGTA-(N(8))-TACA, was found to be the optimal NtcA-binding sequence. The more divergent the binding sequences, compared to this consensus sequence, the lower the NtcA affinity. The second and third bases in each four-nucleotide half of the consensus sequence were crucial for NtcA binding, and they were in general highly conserved. The most frequently occurring sequence in the middle weakly conserved region was similar to that of the NtcA-binding motif of the Anabaena sp. strain PCC 7120 glnA gene, previously known to have high affinity for NtcA. This indicates that the middle sequences were selected for high NtcA affinity. Analysis of natural NtcA-binding motifs showed that these could be classified into two groups based on differences in recognition consensus sequences. It is suggested that NtcA naturally recognizes different DNA-binding motifs, or has differential affinities to these sequences under different physiological conditions.
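
    The consensus TGTA-(N(8))-TACA reported above lends itself to a simple scan. The Python sketch below searches a DNA string for the consensus and checks that the two conserved halves are reverse complements of each other, which is what makes the site palindromic. The names `find_ntca_sites` and `revcomp` and the toy sequence are assumptions for illustration; the sketch does not model the affinity differences discussed in the abstract.

```python
import re

# TGTA, eight arbitrary bases, TACA; lookahead reports overlapping sites too.
NTCA_CONSENSUS = re.compile(r"(?=(TGTA[ACGT]{8}TACA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_ntca_sites(dna):
    """Return (0-based start, site) for each match to TGTA-N8-TACA."""
    return [(m.start(), m.group(1)) for m in NTCA_CONSENSUS.finditer(dna)]


# Toy sequence built around one consensus site (spacer chosen arbitrarily).
site = "TGTA" + "A" * 8 + "TACA"
sites = find_ntca_sites("CC" + site + "GG")
halves_are_palindromic = revcomp("TGTA") == "TACA"
```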

  16. A novel role for the fibrinogen Asn-Gly-Arg (NGR) motif in platelet function.

    Science.gov (United States)

    Moriarty, Róisín; McManus, Ciara A; Lambert, Matthew; Tilley, Thea; Devocelle, Marc; Brennan, Marian; Kerrigan, Steven W; Cox, Dermot

    2015-02-01

    The integrin αIIbβ3 on resting platelets can bind to immobilised fibrinogen resulting in platelet spreading and activation but requires activation to bind to soluble fibrinogen. αIIbβ3 is known to interact with the general integrin-recognition motif RGD (arginine-glycine-aspartate) as well as the fibrinogen-specific γ-chain dodecapeptide; however, it is not known how fibrinogen binding triggers platelet activation. NGR (asparagine-glycine-arginine) is another integrin-recognition sequence present in fibrinogen and this study aims to determine if it plays a role in the interaction between fibrinogen and αIIbβ3. NGR-containing peptides inhibited resting platelet adhesion to fibrinogen with an IC50 of 175 µM but failed to inhibit the adhesion of activated platelets to fibrinogen (IC50 > 500 µM). Resting platelet adhesion to mutant fibrinogens lacking the NGR sequences was reduced compared to normal fibrinogen under both static and shear conditions (200 s⁻¹). However, pre-activated platelets were able to fully spread on all types of fibrinogen. Thus, the NGR motif in fibrinogen is the site that is primarily responsible for the interaction with resting αIIbβ3 and is responsible for triggering platelet activation.

  17. Isolation of a Δ5 desaturase gene from Euglena gracilis and functional dissection of its HPGG and HDASH motifs.

    Science.gov (United States)

    Walters Pollak, Dana; Bostick, Michael W; Yoon, Hyeryoung; Wang, Jamie; Hollerbach, Dieter H; He, Hongxian; Damude, Howard G; Zhang, Hongxiang; Yadav, Narendra S; Hong, Seung-Pyo; Sharpe, Pamela; Xue, Zhixiong; Zhu, Quinn

    2012-09-01

    Delta (Δ) 5 desaturase is a key enzyme for the biosynthesis of health-beneficial long chain polyunsaturated fatty acids such as arachidonic acid (ARA, C20:4n-6), eicosapentaenoic acid (C20:5n-3) and docosahexaenoic acid (C22:6n-3) via the "desaturation and elongation" pathways. A full length Δ5 desaturase gene from Euglena gracilis (EgΔ5D) was isolated by cloning the products of polymerase chain reaction with degenerate oligonucleotides as primers, followed by 5' and 3' rapid amplification of cDNA ends. The whole coding region of EgΔ5D was 1,350 nucleotides in length and encoded a polypeptide of 449 amino acids. BlastP search showed that EgΔ5D has about 39 % identity with a Δ5 desaturase of Phaeodactylum tricornutum. In a genetically modified dihomo-gamma-linoleic acid (DGLA, C20:3n-6) producing Yarrowia lipolytica strain, EgΔ5D had strong Δ5 desaturase activity with DGLA to ARA conversion of more than 24 %. Functional dissection of its HPGG and HDASH motifs demonstrated that both motifs were important, but not necessary in the exact form as encoded for the enzyme activity of EgΔ5D. A double mutant EgΔ5D-34G158G with altered sequences within both HPGG and HDASH motifs was generated and exhibited Δ5 desaturase activity similar to the wild type EgΔ5D. Codon optimization of the N-terminal region of EgΔ5D-34G158G and substitution of the arginine with serine at residue 347 improved substrate conversion to 27.6 %.

  18. Cancer bioinformatics: detection of chromatin states, SNP-containing motifs, and functional enrichment modules

    Institute of Scientific and Technical Information of China (English)

    Xiaobo Zhou

    2013-01-01

    In this editorial preface, I briefly review cancer bioinformatics and introduce the four articles in this special issue highlighting important applications of the field: detection of chromatin states; detection of SNP-containing motifs and association with transcription factor-binding sites; improvements in functional enrichment modules; and gene association studies on aging and cancer. We expect this issue to provide bioinformatics scientists, cancer biologists, and clinical doctors with a better understanding of how cancer bioinformatics can be used to identify candidate biomarkers and targets and to conduct functional analysis.

  19. A Novel Function for the Conserved Glutamate Residue in the Walker B Motif of Replication Factor C

    Directory of Open Access Journals (Sweden)

    Linda B. Bloom

    2013-03-01

    In all domains of life, sliding clamps tether DNA polymerases to DNA to increase the processivity of synthesis. Clamp loaders load clamps onto DNA in a multi-step process that requires ATP binding and hydrolysis. Like other AAA+ proteins, clamp loaders contain conserved Walker A and Walker B sequence motifs, which participate in ATP binding and hydrolysis, respectively. Mutation of the glutamate residue in Walker B motifs (or DExx-boxes in AAA+ proteins) typically reduces ATP hydrolysis by as much as a couple of orders of magnitude, but has no effect on ATP binding. Here, the Walker B Glu in each of the four active ATP sites of the eukaryotic clamp loader, RFC, was mutated to Gln and Ala separately, and ATP binding- and hydrolysis-dependent activities of the quadruple mutant clamp loaders were characterized. Fluorescence-based assays were used to measure individual reaction steps required for clamp loading including clamp binding, clamp opening, DNA binding and ATP hydrolysis. Our results show that the Walker B mutations affect ATP-binding-dependent interactions of RFC with the clamp and DNA in addition to reducing ligand-dependent ATP hydrolysis activity. Here, we show that the Walker B glutamate is required for ATP-dependent ligand binding activity, a previously unknown function for this conserved Glu residue in RFC.

  20. The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site.

    Science.gov (United States)

    Tharakaraman, Kannan; Bodenreider, Olivier; Landsman, David; Spouge, John L; Mariño-Ramírez, Leonardo

    2008-05-01

    A number of previous studies have predicted transcription factor binding sites (TFBSs) by exploiting the position of genomic landmarks like the transcriptional start site (TSS). The studies' methods are generally too computationally intensive for genome-scale investigation, so the full potential of 'positional regulomics' to discover TFBSs and determine their function remains unknown. Because databases often annotate the genomic landmarks in DNA sequences, the methodical exploitation of positional regulomics has become increasingly urgent. Accordingly, we examined a set of 7914 human putative promoter regions (PPRs) with a known TSS. Our methods identified 1226 eight-letter DNA words with significant positional preferences with respect to the TSS, of which only 608 of the 1226 words matched known TFBSs. Many groups of genes whose PPRs contained a common word displayed similar expression profiles and related biological functions, however. Most interestingly, our results included 78 words, each of which clustered significantly in two or three different positions relative to the TSS. Often, the gene groups corresponding to different positional clusters of the same word corresponded to diverse functions, e.g. activation or repression in different tissues. Thus, different clusters of the same word likely reflect the phenomenon of 'positional regulation', i.e. a word's regulatory function can vary with its position relative to a genomic landmark, a conclusion inaccessible to methods based purely on sequence. Further integrative analysis of words co-occurring in PPRs also yielded 24 different groups of genes, likely identifying cis-regulatory modules de novo. Whereas comparative genomics requires precise sequence alignments, positional regulomics exploits genomic landmarks to provide a 'poor man's alignment'. 
    By exploiting the phenomenon of positional regulation, it uses position to differentiate the biological functions of subsets of TFBSs sharing a common sequence motif.
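
    The first step of such a positional analysis, recording where each eight-letter word falls relative to the TSS, can be sketched in Python as follows. The function name `word_offsets`, the input layout (sequence plus TSS index), and the toy promoter are assumptions for illustration; the statistical test for positional preference used in the study is not reproduced here.

```python
from collections import defaultdict


def word_offsets(promoters, k=8):
    """Map each k-letter word to its offsets relative to the TSS.

    promoters: list of (sequence, tss_index) pairs, where tss_index is the
    0-based position of the TSS within the sequence. Negative offsets are
    upstream of the TSS.
    """
    offsets = defaultdict(list)
    for seq, tss in promoters:
        for i in range(len(seq) - k + 1):
            offsets[seq[i:i + k]].append(i - tss)
    return offsets


# Toy promoter of 10 bases with the TSS at index 8 (invented data).
offsets = word_offsets([("ACGTACGTAC", 8)])
```

    Words whose offsets pile up in a narrow window across many promoters would then be candidates for a positional preference, subject to an appropriate significance test.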

  1. Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases

    Directory of Open Access Journals (Sweden)

    De Las Rivas Javier

    2007-03-01

    Background: The pyridine nucleotide disulfide reductase (PNDR) is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all the PNDR members fit into these categories and this suggests the need for further studies to achieve a more comprehensive classification of this complex family. Results: A workflow to improve the clusterization of protein families based on the array of linear conserved motifs is designed. The method is applied to the PNDR large family finding two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). In this way, two novel PNDR classes III and IV for NAOX/NAPE and NDH-2 respectively are proposed. By knowledge-driven biochemical and functional data analyses done on the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity is detected in a specific subset of NDH-2. Conclusion: The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclusterization into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2.

  2. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs...

  3. Type 2 diabetes mellitus: phylogenetic motifs for predicting protein functional sites

    Indian Academy of Sciences (India)

    Ashok Sharma; Tanuja Rastogi; Meenakshi Bhartiya; A K Shasany; S P S Khanuja

    2007-08-01

    Diabetes mellitus, commonly referred to as diabetes, is a medical condition associated with abnormally high levels of glucose (or sugar) in the blood. With this in view, we demonstrate the identification of phylogenetic motifs (PMs) in type 2 diabetes mellitus that very likely correspond to protein functional sites. In this article, we have identified PMs for all the candidate genes for type 2 diabetes mellitus. Glycine 310 remains conserved for glucokinase and potassium channel KCNJ11. Isoleucine 137 was conserved for insulin receptor and regulatory subunit of a phosphorylating enzyme, whereas the residues valine, leucine and methionine were highly conserved for insulin receptor. Occurrence of proline was very high for calpain 10 gene and glucose transporter

  4. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    Science.gov (United States)

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  5. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    Few plant peptides involved in intercellular communication have been experimentally isolated. Sequence analysis of the Arabidopsis thaliana genome has revealed numerous transmembrane receptors predicted to bind proteinacious ligands, emphasizing the importance of identifying peptides with signali...

  6. Functional significance for a heterogenous ribonucleoprotein A18 signature RNA motif in the 3'-untranslated region of ataxia telangiectasia mutated and Rad3-related (ATR) transcript.

    Science.gov (United States)

    Yang, Ruiqing; Zhan, Ming; Nalabothula, Narasimha Rao; Yang, Qingyuan; Indig, Fred E; Carrier, France

    2010-03-19

    The predominantly nuclear heterogenous ribonucleoprotein A18 (hnRNP A18) translocates to the cytosol in response to cellular stress and increases translation by specifically binding to the 3'-untranslated region (UTR) of several mRNA transcripts and the eukaryotic initiation factor 4G. Here, we identified a 51-nucleotide motif that is present 11.49 times more often in the 3'-UTR of hnRNP A18 mRNA targets than in the UniGene data base. This motif was identified by computational analysis of primary sequences and secondary structures of hnRNP A18 mRNA targets against the unaligned sequences. Band shift analyses indicate that the motif is sufficient to confer binding to hnRNP A18. A search of the entire UniGene data base indicates that the hnRNP A18 motif is also present in the 3'-UTR of the ataxia telangiectasia mutated and Rad3-related (ATR) mRNA. Validation of the predicted hnRNP A18 motif is provided by amplification of endogenous ATR transcript on polysomal fractions immunoprecipitated with hnRNP A18. Moreover, overexpression of hnRNP A18 results in increased ATR protein levels and increased phosphorylation of Chk1, a preferred ATR substrate, in response to UV radiation. In addition, our data indicate that inhibition of casein kinase II or GSK3beta significantly reduced hnRNP A18 cytosolic translocation in response to UV radiation. To our knowledge, this constitutes the first demonstration of a post-transcriptional regulatory mechanism for ATR activity. hnRNP A18 could thus become a new target to trigger ATR activity as back-up stress response mechanisms to functionally compensate for absent or defective responders.

  7. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  8. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) has become an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery.

  9. Thio-sugar motif of functional CARB-pharmacophore for antineoplastic activity. Part 2.

    Science.gov (United States)

    Witczak, Zbigniew J; Sarnik, Joanna; Czubatka, Anna; Forma, Ewa; Poplawski, Tomasz

    2014-12-15

    Diverse functionalized representatives of (1-4)-S-thiodisaccharides, 6-9 were synthesized and assessed for cytotoxicity and apoptosis against human cancer cell lines (A549, LoVo, MCF-7 and HeLa). The FCP 6 was more active against MCF-7 cells (i.e., an estrogen-dependent breast cancer line), whereas other (1-4)-S-thiodisaccharides showed strongest activity against A549 cells (i.e., a lung adenocarcinoma line). We propose to use a concept of functional 'CARB-pharmacophores' when evaluating a potential for the compounds' general antineoplastic activity. Future studies will determine the reasons for cell-type specificity of these compounds. The thio-sugar motif appears to be a promising lead for future developments. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. TdIF1 recognizes a specific DNA sequence through its Helix-Turn-Helix and AT-hook motifs to regulate gene transcription.

    Directory of Open Access Journals (Sweden)

    Takashi Kubota

    TdIF1 was originally identified as a protein that directly binds to DNA polymerase TdT. TdIF1 is also thought to function in transcription regulation, because it binds directly to the transcriptional factor TReP-132, and to histone deacetylases HDAC1 and HDAC2. Here we show that TdIF1 recognizes a specific DNA sequence and regulates gene transcription. By constructing TdIF1 mutants, we identify amino acid residues essential for its interaction with DNA. An in vitro DNA selection assay, SELEX, reveals that TdIF1 preferentially binds to the sequence 5'-GNTGCATG-3' following an AT-tract, through its Helix-Turn-Helix and AT-hook motifs. We show that four repeats of this recognition sequence allow TdIF1 to regulate gene transcription in a plasmid-based luciferase reporter assay. We demonstrate that TdIF1 associates with the RAB20 promoter, and RAB20 gene transcription is reduced in TdIF1-knockdown cells, suggesting that TdIF1 stimulates RAB20 gene transcription.

  11. i-motif structures in long cytosine-rich sequences found upstream of the promoter region of the SMARCA4 gene.

    Science.gov (United States)

    Benabou, Sanae; Aviñó, Anna; Lyonnais, S; González, C; Eritja, Ramon; De Juan, Anna; Gargallo, Raimundo

    2017-09-01

    Cytosine-rich oligonucleotides are capable of forming complex structures known as i-motifs, whose biological properties are increasingly studied. The study of sequences prone to form i-motifs located near the promoter region of genes may be difficult because these sequences not only contain repeats of cytosine tracts of disparate length, but these tracts may also be separated by loops of varied nature and length. In this work, the formation of intramolecular i-motif structures by a long sequence located upstream of the promoter region of the SMARCA4 gene has been demonstrated. Nuclear Magnetic Resonance, Circular Dichroism, Gel Electrophoresis, Size-Exclusion Chromatography, and multivariate analysis have been used. Not only the wild sequence (5'-TC3T2GCTATC3TGTC2TGC2TCGC3T2G2TCATGA2C4-3') has been studied but also several other truncated and mutated sequences. Despite the apparent complexity of the sequence, the results showed that the wild sequence may form a relatively stable and homogeneous unimolecular i-motif structure, both in terms of pH and temperature. The model ligand TMPyP4 destabilizes the structure, whereas the presence of 20% (w/v) PEG200 stabilized it slightly. This finding opens the door to the study of the interaction of this kind of i-motif structure with stabilizing ligands or proteins. Copyright © 2017 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
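
    The wild sequence above is written in a condensed form. Assuming that a digit after a base is a repeat count (for example, C3 stands for CCC, as is usual when subscripts are flattened to plain text), a short Python sketch can expand it to the full oligonucleotide and list its cytosine tracts. The function name `expand_condensed` is hypothetical and the reading of the notation is an assumption, not something stated in the record.

```python
import re


def expand_condensed(seq):
    """Expand run-length notation such as 'TC3T2' into 'TCCCTT'."""
    return re.sub(r"([ACGT])(\d+)", lambda m: m.group(1) * int(m.group(2)), seq)


condensed = "TC3T2GCTATC3TGTC2TGC2TCGC3T2G2TCATGA2C4"
full = expand_condensed(condensed)

# Cytosine tracts of two or more bases, the usual building blocks of i-motifs.
c_tracts = re.findall(r"C{2,}", full)
```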

  12. Identification of E-cadherin signature motifs functioning as cleavage sites for Helicobacter pylori HtrA

    Science.gov (United States)

    Schmidt, Thomas P.; Perna, Anna M.; Fugmann, Tim; Böhm, Manja; Hiss, Jan; Haller, Sarah; Götz, Camilla; Tegtmeyer, Nicole; Hoy, Benjamin; Rau, Tilman T.; Neri, Dario; Backert, Steffen; Schneider, Gisbert; Wessler, Silja

    2016-03-01

    The cell adhesion protein and tumour suppressor E-cadherin exhibits important functions in the prevention of gastric cancer. As a class-I carcinogen, Helicobacter pylori (H. pylori) has developed a unique strategy to interfere with E-cadherin functions. In previous studies, we have demonstrated that H. pylori secretes the protease high temperature requirement A (HtrA) which cleaves off the E-cadherin ectodomain (NTF) on epithelial cells. This opens cell-to-cell junctions, allowing bacterial transmigration across the polarised epithelium. Here, we investigated the molecular mechanism of the HtrA-E-cadherin interaction and identified E-cadherin cleavage sites for HtrA. Mass-spectrometry-based proteomics and Edman degradation revealed three signature motifs containing the [VITA]-[VITA]-x-x-D-[DN] sequence pattern, which were preferentially cleaved by HtrA. Based on these sites, we developed a substrate-derived peptide inhibitor that selectively bound and inhibited HtrA, thereby blocking transmigration of H. pylori. The discovery of HtrA-targeted signature sites might further explain why we detected a stable 90 kDa NTF fragment during H. pylori infection, but also additional E-cadherin fragments ranging from 105 kDa to 48 kDa in in vitro cleavage experiments. In conclusion, HtrA targets E-cadherin signature sites that are accessible in in vitro reactions, but might be partially masked on epithelial cells through functional homophilic E-cadherin interactions.
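The signature pattern [VITA]-[VITA]-x-x-D-[DN] uses a dash-separated notation in which x stands for any residue and brackets list the allowed residues. A sketch of how such a pattern could be turned into a regular expression and scanned against a protein sequence is given below. The helper names and the toy peptide are hypothetical, and the code does not reproduce the proteomics workflow of the study.

```python
import re

def pattern_to_regex(pattern):
    """Turn a dash-separated pattern ('x' = any residue) into a regex string."""
    return "".join("." if el.lower() == "x" else el for el in pattern.split("-"))

def find_matches(pattern, protein):
    """Return (start, matched_text) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]

SIGNATURE = "[VITA]-[VITA]-x-x-D-[DN]"

# Hypothetical toy peptide, not an E-cadherin fragment
toy = "GGAVKKDNGG"
print(pattern_to_regex(SIGNATURE))
print(find_matches(SIGNATURE, toy))
```

The lookahead wrapper is a design choice: it lets overlapping candidates be reported, which a plain `finditer` would skip.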

  13. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    Science.gov (United States)

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.

  14. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    Science.gov (United States)

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
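The consensus UUAGUU[U/A][U/G][A/U/G]U is degenerate at three positions, so it covers a small family of concrete 10-mers. The sketch below enumerates them, assuming the bracket notation means "any one of the listed bases". The function name `expand_consensus` is introduced here for illustration only.

```python
import itertools
import re

CONSENSUS = "UUAGUU[U/A][U/G][A/U/G]U"

def expand_consensus(consensus):
    """List every concrete sequence covered by a bracketed consensus."""
    # Split into bracketed groups such as '[U/A]' and single fixed bases
    tokens = re.findall(r"\[[^\]]+\]|.", consensus)
    choices = [t[1:-1].split("/") if t.startswith("[") else [t] for t in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]

variants = expand_consensus(CONSENSUS)
print(len(variants))   # 2 * 2 * 3 = 12 distinct sequences
print(variants[0])
```

An explicit list like this can be convenient when exact-match lookups against 3'-UTR sequences are preferred over pattern matching.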

  15. Multilocus sequence evaluation for differentiating species of the trematode Family Gastrothylacidae, with a note on the utility of mitochondrial COI motifs in species identification.

    Science.gov (United States)

    Ghatani, Sudeep; Shylla, Jollin Andrea; Roy, Bishnupada; Tandon, Veena

    2014-09-15

    Amphistomiasis, a neglected trematode infectious disease of ruminants, is caused by numerous species of amphistomes belonging to six families under the Superfamily Paramphistomoidea. In the present study, four frequently used DNA markers, viz. nuclear ribosomal 28S (D1-D3 regions), 18S and ITS2 and mitochondrial COI genes, as well as sequence motifs from these genes were evaluated for their utility in species characterization of members of the amphistomes' Family Gastrothylacidae commonly prevailing in Northeast India. In sequence and phylogenetic analyses the COI gene turned out to be the most useful marker in identifying the gastrothylacid species, with the exception of Gastrothylax crumenifer, which showed a high degree of intraspecific variations among its isolates. The sequence analysis data also showed the ITS2 region to be effective for interspecies characterization, though the 28S and 18S genes were found unsuitable for the purpose. On the other hand, sequence motif analysis data revealed the motifs from the COI gene to be highly conserved and specific for their target species which allowed accurate in silico identification of the gastrothylacid species irrespective of their intraspecific differences. We propose the use of COI motifs generated in the study as a potential tool for identification of these species.

  16. Mutations in the catalytic loop HRD motif alter the activity and function of Drosophila Src64.

    Directory of Open Access Journals (Sweden)

    Taylor C Strong

    Full Text Available The catalytic loop HRD motif is found in most protein kinases and these amino acids are predicted to perform functions in catalysis, transition to, and stabilization of the active conformation of the kinase domain. We have identified mutations in a Drosophila src gene, src64, that alter the three HRD amino acids. We have analyzed the mutants for both biochemical activity and biological function during development. Mutation of the aspartate to asparagine eliminates biological function in cytoskeletal processes and severely reduces fertility, supporting the amino acid's critical role in enzymatic activity. The arginine to cysteine mutation has little to no effect on kinase activity or cytoskeletal reorganization, suggesting that the HRD arginine may not be critical for coordinating phosphotyrosine in the active conformation. The histidine to leucine mutant retains some kinase activity and biological function, suggesting that this amino acid may have a biochemical function in the active kinase that is independent of its side chain hydrogen bonding interactions in the active site. We also describe the phenotypic effects of other mutations in the SH2 and tyrosine kinase domains of src64, and we compare them to the phenotypic effects of the src64 null allele.

  17. Modification of cyclic NGR tumor neovasculature-homing motif sequence to human plasminogen kringle 5 improves inhibition of tumor growth.

    Directory of Open Access Journals (Sweden)

    Weiwei Jiang

    Full Text Available BACKGROUND: Blood vessels in tumors express higher levels of aminopeptidase N (APN) than normal tissues. Evidence suggests that the CNGRC motif is an APN ligand which targets tumor vasculature. Increased expression of APN in tumor vascular endothelium, therefore, offers an opportunity for targeted delivery of NGR peptide-linked drugs to tumors. METHODS/PRINCIPAL FINDINGS: To determine whether an additional cyclic CNGRC sequence could improve endothelial cell homing and antitumor effect, human plasminogen kringle 5 (hPK5) was modified genetically to introduce a CNGRC motif (NGR-hPK5) and was subsequently expressed in yeast. The biological activity of NGR-hPK5 was assessed and compared with that of wild-type hPK5, in vitro and in vivo. NGR-hPK5 showed more potent antiangiogenic activity than wild-type hPK5: the former had a stronger inhibitory effect on proliferation, migration and cord formation of vascular endothelial cells, and produced a stronger antiangiogenic response in the CAM assay. To evaluate the tumor-targeting ability, both wild-type hPK5 and NGR-hPK5 were (99m)Tc-labeled, for tracking biodistribution in the in vivo tumor model. By planar imaging and biodistribution analyses of major organs, NGR-hPK5 was found localized to tumor tissues at a higher level than wild-type hPK5 (approximately 3-fold). Finally, the effects of wild-type hPK5 and NGR-modified hPK5 on tumor growth were investigated in two tumor model systems. NGR modification improved tumor localization and, as a consequence, effectively inhibited the growth of mouse Lewis lung carcinoma (LLC) and human colorectal adenocarcinoma (Colo 205) cells in tumor-bearing mice. CONCLUSIONS/SIGNIFICANCE: These studies indicated that the addition of an APN targeting peptide NGR sequence could improve the ability of hPK5 to inhibit angiogenesis and tumor growth.

  18. Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

    Directory of Open Access Journals (Sweden)

    Silengo Lorenzo

    2004-05-01

    Full Text Available Background: Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. Results: To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. Conclusions: The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network.
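The idea that a motif occurs more often in an upstream region than expected by chance can be made concrete with a simple tail probability. The sketch below uses a binomial model with independent, equiprobable bases. That null model, the function names and the example numbers are all assumptions for illustration, and the statistic actually used in the paper may differ.

```python
from math import comb

def motif_background_prob(motif_len, base_prob=0.25):
    """Chance that a fixed motif starts at a given position.

    Assumption: bases are independent and equiprobable.
    """
    return base_prob ** motif_len

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

# Hypothetical example: a 6-mer seen 3 times in a 500 bp upstream region
region_len, motif_len, observed = 500, 6, 3
positions = region_len - motif_len + 1
p_value = binomial_tail(observed, positions, motif_background_prob(motif_len))
print(p_value)
```

Genes whose upstream regions give a small tail probability for the same motif would then be grouped into one set for the Gene Ontology analysis described in the abstract.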

  19. New bioactive motifs and their use in functionalized self-assembling peptides for NSC differentiation and neural tissue engineering

    Science.gov (United States)

    Gelain, F.; Cigognini, D.; Caprini, A.; Silva, D.; Colleoni, B.; Donegá, M.; Antonini, S.; Cohen, B. E.; Vescovi, A.

    2012-04-01

    Developing functionalized biomaterials for enhancing transplanted cell engraftment in vivo and stimulating the regeneration of injured tissues requires a multi-disciplinary approach customized for the tissue to be regenerated. In particular, nervous tissue engineering may take a great advantage from the discovery of novel functional motifs fostering transplanted stem cell engraftment and nervous fiber regeneration. Using phage display technology we have discovered new peptide sequences that bind to murine neural stem cell (NSC)-derived neural precursor cells (NPCs), and promote their viability and differentiation in vitro when linked to LDLK12 self-assembling peptide (SAPeptide). We characterized the newly functionalized LDLK12 SAPeptides via atomic force microscopy, circular dichroism and rheology, obtaining nanostructured hydrogels that support human and murine NSC proliferation and differentiation in vitro. One functionalized SAPeptide (Ac-FAQ), showing the highest stem cell viability and neural differentiation in vitro, was finally tested in acute contusive spinal cord injury in rats, where it fostered nervous tissue regrowth and improved locomotor recovery. Interestingly, animals treated with the non-functionalized LDLK12 had an axon sprouting/regeneration intermediate between Ac-FAQ-treated animals and controls. These results suggest that hydrogels functionalized with phage-derived peptides may constitute promising biomimetic scaffolds for in vitro NSC differentiation, as well as regenerative therapy of the injured nervous system. Moreover, this multi-disciplinary approach can be used to customize SAPeptides for other specific tissue engineering applications.

  20. Pyrimidine motif triple helix in the Kluyveromyces lactis telomerase RNA pseudoknot is essential for function in vivo.

    Science.gov (United States)

    Cash, Darian D; Cohen-Zontag, Osnat; Kim, Nak-Kyoon; Shefer, Kinneret; Brown, Yogev; Ulyanov, Nikolai B; Tzfati, Yehuda; Feigon, Juli

    2013-07-02

    Telomerase is a ribonucleoprotein complex that extends the 3' ends of linear chromosomes. The specialized telomerase reverse transcriptase requires a multidomain RNA (telomerase RNA, TER), which includes an integral RNA template and functionally important template-adjacent pseudoknot. The structure of the human TER pseudoknot revealed that the loops interact with the stems to form a triple helix shown to be important for activity in vitro. A similar triple helix has been predicted to form in diverse fungi TER pseudoknots. The solution NMR structure of the Kluyveromyces lactis pseudoknot, presented here, reveals that it contains a long pyrimidine motif triple helix with unexpected features that include three individual bulge nucleotides and a C(+)•G-C triple adjacent to a stem 2-loop 2 junction. Despite significant differences in sequence and base triples, the 3D shape of the human and K. lactis TER pseudoknots are remarkably similar. Analysis of the effects of nucleotide substitutions on cell growth and telomere lengths provides evidence that this conserved structure forms in endogenously assembled telomerase and is essential for telomerase function in vivo.

  1. Mouse transgenesis identifies conserved functional enhancers and cis-regulatory motif in the vertebrate LIM homeobox gene Lhx2 locus.

    Directory of Open Access Journals (Sweden)

    Alison P Lee

    Full Text Available The vertebrate Lhx2 is a member of the LIM homeobox family of transcription factors. It is essential for the normal development of the forebrain, eye, olfactory system and liver as well as for the differentiation of lymphoid cells. However, despite the highly restricted spatio-temporal expression pattern of Lhx2, nothing is known about its transcriptional regulation. In mammals and chicken, Crb2, Dennd1a and Lhx2 constitute a conserved linkage block, while the intervening Dennd1a is lost in the fugu Lhx2 locus. To identify functional enhancers of Lhx2, we predicted conserved noncoding elements (CNEs) in the human, mouse and fugu Crb2-Lhx2 loci and assayed their function in transgenic mouse at E11.5. Four of the eight CNE constructs tested functioned as tissue-specific enhancers in specific regions of the central nervous system and the dorsal root ganglia (DRG), recapitulating partial and overlapping expression patterns of Lhx2 and Crb2 genes. There was considerable overlap in the expression domains of the CNEs, which suggests that the CNEs are either redundant enhancers or regulating different genes in the locus. Using a large set of CNEs (810 CNEs) associated with transcription factor-encoding genes that express predominantly in the central nervous system, we predicted four over-represented 8-mer motifs that are likely to be associated with expression in the central nervous system. Mutation of one of them in a CNE that drove reporter expression in the neural tube and DRG abolished expression in both domains, indicating that this motif is essential for expression in these domains. The failure of the four functional enhancers to recapitulate the complete expression pattern of Lhx2 at E11.5 indicates that there must be other Lhx2 enhancers that are either located outside the region investigated or divergent in mammals and fishes. Other approaches such as sequence comparison between multiple mammals are required to identify and characterize such enhancers.

  2. Localization and trafficking of an isoform of the AtPRA1 family to the Golgi apparatus depend on both N- and C-terminal sequence motifs.

    Science.gov (United States)

    Jung, Chan Jin; Lee, Myoung Hui; Min, Myung Ki; Hwang, Inhwan

    2011-02-01

    Prenylated Rab acceptors (PRAs) bind to prenylated Rab proteins and possibly aid in targeting Rabs to their respective compartments. In Arabidopsis, 19 isoforms of PRA1 have been identified and, depending upon the isoforms, they localize to the endoplasmic reticulum (ER), Golgi apparatus and endosomes. Here, we investigated the localization and trafficking of AtPRA1.B6, an isoform of the Arabidopsis PRA1 family. In colocalization experiments with various organellar markers, AtPRA1.B6 tagged with hemagglutinin (HA) at the N-terminus localized to the Golgi apparatus in protoplasts and transgenic plants. The valine residue at the C-terminal end and an EEE motif in the C-terminal cytoplasmic domain were critical for anterograde trafficking from the ER to the Golgi apparatus. The N-terminal region contained a sequence motif for retention of AtPRA1.B6 at the Golgi apparatus. In addition, anterograde trafficking of AtPRA1.B6 from the ER to the Golgi apparatus was highly sensitive to the HA:AtPRA1.B6 level. The region that contains the sequence motif for Golgi retention also conferred the abundance-dependent trafficking inhibition. On the basis of these results, we propose that AtPRA1.B6 localizes to the Golgi apparatus and its ER-to-Golgi trafficking and localization to the Golgi apparatus are regulated by multiple sequence motifs in both the C- and N-terminal cytoplasmic domains.

  3. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  4. ELM 2016—data update and new functionality of the eukaryotic linear motif resource

    Science.gov (United States)

    Dinkel, Holger; Van Roey, Kim; Michael, Sushama; Kumar, Manjeet; Uyar, Bora; Altenberg, Brigitte; Milchevskaya, Vladislava; Schneider, Melanie; Kühn, Helen; Behrendt, Annika; Dahl, Sophie Luise; Damerell, Victoria; Diebel, Sandra; Kalman, Sara; Klein, Steffen; Knudsen, Arne C.; Mäder, Christina; Merrill, Sabina; Staudt, Angelina; Thiel, Vera; Welti, Lukas; Davey, Norman E.; Diella, Francesca; Gibson, Toby J.

    2016-01-01

    The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org) is a manually curated database of short linear motifs (SLiMs). In this update, we present the latest additions to this resource, along with more improvements to the web interface. ELM 2016 contains more than 240 different motif classes with over 2700 experimentally validated instances, manually curated from more than 2400 scientific publications. In addition, more data have been made available as individually searchable pages and are downloadable in various formats. PMID:26615199

  5. Discovery of a functional immunoreceptor tyrosine-based switch motif in a 7-transmembrane-spanning receptor: role in the orexin receptor OX1R-driven apoptosis.

    Science.gov (United States)

    El Firar, Aadil; Voisin, Thierry; Rouyer-Fessard, Christiane; Ostuni, Mariano A; Couvineau, Alain; Laburthe, Marc

    2009-12-01

    The orexin neuropeptides promote robust apoptosis in cancer cells. We have recently shown that the 7-transmembrane-spanning orexin receptor OX1R mediates apoptosis through an original mechanism. OX1R is equipped with a tyrosine-based inhibitory motif ITIM, which is tyrosine-phosphorylated on receptor activation, allowing the recruitment and activation of the tyrosine phosphatase SHP-2, leading to apoptosis. We show here that another motif, immunoreceptor tyrosine-based switch motif (ITSM), is present in OX1R and is mandatory for OX1R-mediated apoptosis. This conclusion is based on the following observations: 1) a canonical ITSM sequence is present in the first intracellular loop of OX1R; 2) mutation of Y(83) to F within ITSM abolished OX1R-mediated apoptosis but did not alter orexin-induced inositol phosphate formation or calcium transient via coupling of OX1R to G(q) protein; 3) mutation of Y(83) to F further abolished orexin-induced tyrosine phosphorylation in ITSM and subsequent recruitment of SHP-2 by the receptor. Finally, we developed a structural model of OX1R showing that the spatial localization of phosphotyrosines in ITSM and ITIM in OX1R is compatible with their interaction with the two SH2 domains of SHP-2. These data represent the first evidence for a functional role of an ITSM in a 7-transmembrane-spanning receptor.

  6. Efficient α, β-motif finder for identification of phenotype-related functional modules

    Directory of Open Access Journals (Sweden)

    Schmidt Matthew C

    2011-11-01

    Full Text Available Background: Microbial communities in their natural environments exhibit phenotypes that can directly cause particular diseases, convert biomass or wastewater to energy, or degrade various environmental contaminants. Understanding how these communities realize specific phenotypic traits (e.g., carbon fixation, hydrogen production) is critical for addressing health, bioremediation, or bioenergy problems. Results: In this paper, we describe a graph-theoretical method for in silico prediction of the cellular subsystems that are related to the expression of a target phenotype. The proposed (α, β)-motif finder approach allows for identification of these phenotype-related subsystems that, in addition to metabolic subsystems, could include their regulators, sensors, transporters, and even uncharacterized proteins. By comparing dozens of genome-scale networks of functionally associated proteins, our method efficiently identifies those statistically significant functional modules that are in at least α networks of phenotype-expressing organisms but appear in no more than β networks of organisms that do not exhibit the target phenotype. It has been shown via various experiments that the enumerated modules are indeed related to phenotype-expression when tested with different target phenotypes like hydrogen production, motility, aerobic respiration, and acid-tolerance. Conclusion: Thus, we have proposed a methodology that can identify potential statistically significant phenotype-related functional modules. The functional module is modeled as an (α, β)-clique, where α and β are two criteria introduced in this work. We also propose a novel network model, called the two-typed, divided network. The new network model and the criteria make the problem tractable even while very large networks are being compared. The code can be downloaded from http://www.freescience.org/cs/ABClique/
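The two criteria described above (a module must occur in at least α networks of phenotype-expressing organisms and in no more than β networks of non-expressing organisms) can be sketched as a simple filter. The code below only checks a given candidate module and does not reproduce the paper's enumeration algorithm or its two-typed, divided network model. The function names and the toy networks are hypothetical.

```python
from itertools import combinations

def make_net(*pairs):
    """Build an undirected network as a set of frozenset edges."""
    return {frozenset(pair) for pair in pairs}

def is_clique(module, edges):
    """True if every pair of proteins in the module is connected."""
    return all(frozenset(pair) in edges
               for pair in combinations(sorted(module), 2))

def passes_alpha_beta(module, positive_nets, negative_nets, alpha, beta):
    """Check the (alpha, beta) criterion for one candidate module."""
    in_positive = sum(is_clique(module, net) for net in positive_nets)
    in_negative = sum(is_clique(module, net) for net in negative_nets)
    return in_positive >= alpha and in_negative <= beta

# Toy data: two phenotype-expressing organisms, one non-expressing organism
positive = [make_net(("A", "B"), ("A", "C"), ("B", "C")),
            make_net(("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"))]
negative = [make_net(("A", "B"), ("C", "D"))]

print(passes_alpha_beta({"A", "B", "C"}, positive, negative, alpha=2, beta=0))
print(passes_alpha_beta({"A", "B"}, positive, negative, alpha=2, beta=0))
```

In the toy data the three-protein module is kept, while the pair A-B is rejected because it also appears in the non-expressing organism.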

  7. Functional annotation from the genome sequence of the giant panda.

    Science.gov (United States)

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  8. Cloning, Sequence Analysis and Expression Patterns during Seed Germination of a Rapeseed (Brassica napus L.) G-x-S-x-G-motif Lipase Gene

    Directory of Open Access Journals (Sweden)

    Imen GLAIED GHRAM

    2016-12-01

    Full Text Available Lipases catalyze the hydrolysis of ester bonds in triacylglycerides, generating glycerol and free fatty acids. These enzymes are encoded by extremely complex gene families, and appear to fulfil many different biological functions. Although they are present in all types of organisms, available information on plant lipases is still very limited, as compared to their bacterial and animal counterparts. A full-length clone, BnLIP, encoding a putative lipase, has been isolated by PCR amplification of Brassica napus genomic DNA, with oligonucleotide primers derived from the sequence of an Arabidopsis thaliana homologue. The clone included an open reading frame of 1581 bp encoding a polypeptide of 526 amino acids, with a calculated molecular mass of 59.5 kDa. Analysis of the deduced protein sequence, sequence alignment with homologous proteins from related plant species, and a phylogenetic analysis revealed that the BnLIP protein belongs to the ‘classical’ GxSxG-motif lipase family. RT-PCR assays indicated that the BnLIP gene is expressed specifically, but only transiently, during seed germination: the lipase mRNA was not present at detectable levels in ungerminated seeds, was detected only three days after seed imbibition, but its levels decreased rapidly afterwards. No expression was observed in roots, stems or leaves of adult plants. This expression pattern suggests that BnLIP is one of the lipases involved in the hydrolysis of triacylglycerides stored in rapeseed seeds, ultimately providing nutrients and energy to sustain seedling growth until photosynthesis is activated.
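The figures quoted for BnLIP are mutually consistent, which a short calculation shows. It assumes that the 1581 bp open reading frame includes the stop codon and uses the reported 59.5 kDa mass to estimate the mean mass per residue. The function name `orf_to_protein_length` is introduced here for illustration.

```python
ORF_BP = 1581        # open reading frame length from the abstract
PROTEIN_AA = 526     # reported polypeptide length
MASS_KDA = 59.5      # reported calculated molecular mass

def orf_to_protein_length(orf_bp):
    """Residues encoded by an ORF, assuming its last codon is the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3 - 1

predicted_aa = orf_to_protein_length(ORF_BP)           # 1581 / 3 = 527 codons, minus stop
mean_residue_mass = MASS_KDA * 1000 / PROTEIN_AA       # daltons per residue

print(predicted_aa, round(mean_residue_mass, 1))
```

The predicted length matches the reported 526 amino acids, and the mean residue mass of roughly 113 Da is in the range usually seen for proteins.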

  9. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Full Text Available Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. 
Sequences matching PERCAL were detected in all kingdoms of
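
    The consensus quoted in this abstract, G-x-D-G-x-x-[GN]-[TN]-x-D-D, is written in a PROSITE-like notation and can be scanned for with an ordinary regular expression. The following is a minimal sketch, not the authors' method: the helper names `consensus_to_regex` and `find_motif` and the toy peptide are assumptions made for illustration only.

```python
import re

# Consensus as quoted in the abstract; 'x' means any residue,
# a bracketed group such as [GN] means one residue from that set.
PERCAL_CONSENSUS = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"


def consensus_to_regex(consensus: str) -> str:
    """Translate a dash-separated PROSITE-like pattern into a regex string."""
    parts = []
    for element in consensus.split("-"):
        # 'x' becomes the regex wildcard; literals and [..] sets pass through.
        parts.append("." if element == "x" else element)
    return "".join(parts)


def find_motif(sequence: str, consensus: str = PERCAL_CONSENSUS):
    """Return (0-based start, matched text) pairs, including overlapping hits."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical peptide, invented for this example (not taken from PepA).
toy_peptide = "MKAGSDGAKGTLDDPQ"
print(consensus_to_regex(PERCAL_CONSENSUS))  # G.DG..[GN][TN].DD
print(find_motif(toy_peptide))               # [(3, 'GSDGAKGTLDD')]
```

    A plain regex only reports exact matches to the consensus; profile-based searches, such as the PS50292 profile mentioned in the abstract, score degenerate matches and are a different technique.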

  10. Engineering multiple biological functional motifs into a blank collagen-like protein template from Streptococcus pyogenes.

    Science.gov (United States)

    Peng, Yong Y; Stoichevska, Violet; Schacht, Kristin; Werkmeister, Jerome A; Ramshaw, John A M

    2014-07-01

    Bacterially derived triple-helical, collagen-like proteins are attractive as potential biomedical materials. The collagen-like domain of the Scl2 protein from S. pyogenes lacks any specific binding sites for mammalian cells yet possesses the inherent structural integrity of the collagen triple-helix of animal collagens. It can, therefore, be considered as a structurally-stable "blank slate" into which various defined, biological sequences, derived from animal collagens, can be added by substitutions or insertions, to enable production of novel designed materials to fit specific functional requirements. In the present study, we have used site directed mutagenesis to substitute two functional sequences, one for heparin binding and the other for integrin binding, into different locations in the triple-helical structure. This provided three new constructs, two containing the single substitutions and one containing both substitutions. The stability of these constructs was marginally reduced when compared to the unmodified sequence. When compared to the unmodified bacterial collagen, both the modified collagens that contain the heparin binding site showed marked binding of fluorescently labeled heparin. Similarly, the modified collagens from both constructs containing the integrin binding site showed significant adhesion of L929 cells that are known to possess the appropriate integrin receptor. C2C12 cells that lack any appropriate integrins did not bind. These data show that bacterial collagen-like sequences can be modified to act like natural extracellular matrix collagens by inserting one or more unique biological domains with defined function.

  11. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs.

    Science.gov (United States)

    Richard, Patricia; Darzacq, Xavier; Bertrand, Edouard; Jády, Beáta E; Verheggen, Céline; Kiss, Tamás

    2003-08-15

    Post-transcriptional synthesis of 2'-O-methylated nucleotides and pseudouridines in Sm spliceosomal small nuclear RNAs takes place in the nucleoplasmic Cajal bodies and it is directed by guide RNAs (scaRNAs) that are structurally and functionally indistinguishable from small nucleolar RNAs (snoRNAs) directing rRNA modification in the nucleolus. The scaRNAs are synthesized in the nucleoplasm and specifically targeted to Cajal bodies. Here, mutational analysis of the human U85 box C/D-H/ACA scaRNA, followed by in situ localization, demonstrates that box H/ACA scaRNAs share a common Cajal body-specific localization signal, the CAB box. Two copies of the evolutionarily conserved CAB consensus (UGAG) are located in the terminal loops of the 5' and 3' hairpins of the box H/ACA domains of mammalian, Drosophila and plant scaRNAs. Upon alteration of the CAB boxes, mutant scaRNAs accumulate in the nucleolus. In turn, authentic snoRNAs can be targeted into Cajal bodies by addition of exogenous CAB box motifs. Our results indicate that scaRNAs represent an ancient group of small nuclear RNAs which are localized to Cajal bodies by an evolutionarily conserved mechanism.

  12. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    Science.gov (United States)

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  13. Predicting Contextual Sequences via Submodular Function Maximization

    CERN Document Server

    Dey, Debadeepta; Hebert, Martial; Bagnell, J Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: ...

  14. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers' genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures successfully discriminate a) TrEn from random controls, a proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  15. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active...... sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally...... valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects...
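
    To make the notion of "information content" in a sequence logo concrete, the sketch below computes the classical Shannon information per alignment column, with an optional uniform pseudocount. It is an illustrative assumption, not Seq2Logo's implementation: the function names, the uniform pseudocount scheme and the toy alignment are invented here, and sequence weighting and depletion-aware logo types are not modelled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column: str, pseudocount: float = 0.0) -> float:
    """Shannon information (bits) of one column: log2(20) minus its entropy.

    Characters outside the 20 standard residues (e.g. gaps) are ignored.
    The pseudocount is added uniformly to every residue (an assumption).
    """
    observed = [c for c in column.upper() if c in AMINO_ACIDS]
    counts = Counter(observed)
    total = len(observed) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        return 0.0
    entropy = 0.0
    for aa in AMINO_ACIDS:
        p = (counts.get(aa, 0) + pseudocount) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(AMINO_ACIDS)) - entropy


def logo_information(alignment, pseudocount: float = 0.0):
    """Per-position information for equal-length aligned sequences."""
    return [
        column_information("".join(col), pseudocount)
        for col in zip(*alignment)
    ]


# Made-up alignment for illustration only.
toy_alignment = ["GYPSY", "GYPAY", "GFPSY", "GYPTY"]
for position, bits in enumerate(logo_information(toy_alignment, 0.5), start=1):
    print(position, round(bits, 3))
```

    With few sequences, a fully conserved column scores the maximum log2(20) bits unless pseudocounts are used, which is one way to see why the abstract stresses correcting for a low number of observations.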

  16. Gamma and Related Functions Generalized for Sequences

    Science.gov (United States)

    Ollerton, R. L.

    2008-01-01

    Given a sequence $g_k > 0$, the "g-factorial" product $\prod_{i=1}^{k} g_i$ is extended from integer $k$ to real $x$ by generalizing properties of the gamma function $\Gamma(x)$. The Euler-Mascheroni constant $\gamma$ and the beta and zeta functions are also generalized. Specific examples include…
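
    As a point of reference for the construction described in this abstract, the classical case is recovered by choosing $g_i = i$: the g-factorial is then the ordinary factorial, which the gamma function extends to real arguments. The display below records only this standard special case; the general construction in the paper is not reproduced here.

```latex
g_i = i \;\Longrightarrow\; \prod_{i=1}^{k} g_i = k! = \Gamma(k+1),
\qquad \Gamma(x+1) = x\,\Gamma(x) \quad (x > 0).
```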

  17. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformation of the space l(F, Ƒ, p, u) are determined.

  18. Microbial genomics: from sequence to function.

    OpenAIRE

    Schwartz, I

    2000-01-01

    The era of genomics (the study of genes and their function) began a scant dozen years ago with a suggestion by James Watson that the complete DNA sequence of the human genome be determined. Since that time, the human genome project has attracted a great deal of attention in the scientific world and the general media; the scope of the sequencing effort, and the extraordinary value that it will provide, has served to mask the enormous progress in sequencing other genomes. Microbial genome seque...

  19. The N-Terminal GYPSY Motif Is Required for Pilin-Specific Sortase SrtC1 Functionality in Lactobacillus rhamnosus Strain GG

    Science.gov (United States)

    Douillard, François P.; Rasinkangas, Pia; Bhattacharjee, Arnab; Palva, Airi; de Vos, Willem M.

    2016-01-01

    Predominantly identified in pathogenic Gram-positive bacteria, sortase-dependent pili are also found in commensal species, such as the probiotic-marketed strain Lactobacillus rhamnosus strain GG. Pili are typically associated with host colonization, immune signalling and biofilm formation. Comparative analysis of the N-terminal domains of pilin-specific sortases from various piliated Gram-positive bacteria identified a conserved motif, called GYPSY, within the signal sequence. We investigated the function and role of the GYPSY residues by directed mutagenesis in homologous (rod-shaped) and heterologous (coccoid-shaped) expression systems for pilus formation. Substitutions of some of the GYPSY residues, and more specifically the proline residue, were found to have a direct impact on the degree of piliation of Lb. rhamnosus GG. The present findings uncover a new signalling element involved in the functionality of pilin-specific sortases controlling the pilus biogenesis of Lb. rhamnosus GG and related piliated Gram-positive species. PMID:27070897

  1. Motif content comparison between monocot and dicot species

    Directory of Open Access Journals (Sweden)

    Matyas Cserhati

    2015-03-01

    While a number of DNA sequence motifs have been functionally characterized, the full repertoire of motifs in an organism (the motifome) is yet to be characterized. The present study aims to widen the scope of motif content analysis in different monocot and dicot species that include both rice species, Brachypodium, corn, and wheat as monocots and Arabidopsis, Lotus japonica, Medicago truncatula, and Populus tremula as dicots. All possible motifs were analyzed in different sets of sequences from the genomes of these species: the whole genome, core proximal and distal promoters, 5′ and 3′ UTRs, and the 1st introns. Due to the increased number of species involved in this study compared to previous works, species relationships were analyzed based on the similarity of common motif content. Certain secondary structure elements were inferred in the genomes of these species as well as new unknown motifs. Twenty motifs common to the studied species were found to occur significantly more often within the promoters and 3′ UTRs of genes, both being regulatory regions. Motifs common to the promoter regions of japonica rice, Brachypodium, and corn were also found in a number of orthologous and paralogous genes. Some of our motifs were found to be complementary to miRNA elements in Brachypodium distachyon and japonica rice.

  2. The NS1 polypeptide of the murine parvovirus minute virus of mice binds to DNA sequences containing the motif [ACCA]2-3.

    Science.gov (United States)

    Cotmore, S F; Christensen, J; Nüesch, J P; Tattersall, P

    1995-03-01

    A DNA fragment containing the minute virus of mice 3' replication origin was specifically coprecipitated in immune complexes containing the virally coded NS1, but not the NS2, polypeptide. Antibodies directed against the amino- or carboxy-terminal regions of NS1 precipitated the NS1-origin complexes, but antibodies directed against NS1 amino acids 284 to 459 blocked complex formation. Using affinity-purified histidine-tagged NS1 preparations, we have shown that the specific protein-DNA interaction is of moderate affinity, being stable in 0.1 M salt but rapidly lost at higher salt concentrations. In contrast, generalized (or nonspecific) DNA binding by NS1 could be demonstrated only in low salt. Addition of ATP or gamma S-ATP enhanced specific DNA binding by wild-type NS1 severalfold, but binding was lost under conditions which favored ATP hydrolysis. NS1 molecules with mutations in a critical lysine residue (amino acid 405) in the consensus ATP-binding site bound to the origin, but this binding could not be enhanced by ATP addition. DNase I protection assays carried out with wild-type NS1 in the presence of gamma S-ATP gave footprints which extended over 43 nucleotides on both DNA strands, from the middle of the origin bubble sequence to a position some 14 bp beyond the nick site. The DNA-binding site for NS1 was mapped to a 22-bp fragment from the middle of the 3' replication origin which contains the sequence ACCAACCA. This conforms to a reiterated motif (ACCA)2-3, which occurs, in more or less degenerate form, at many sites throughout the minute virus of mice genome (J. W. Bodner, Virus Genes 2:167-182, 1989). Insertion of a single copy of the sequence (ACCA)3 was shown to be sufficient to confer NS1 binding on an otherwise unrecognized plasmid fragment. The functions of NS1 in the viral life cycle are reevaluated in the light of this result.
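
    The reiterated motif (ACCA)2-3 described in this abstract, i.e. two or three tandem copies of ACCA, maps directly onto a bounded regex repeat. The sketch below scans both strands of a DNA string; it is an illustration only, the function names are hypothetical, and the degenerate forms of the motif mentioned in the abstract are not handled.

```python
import re

# Two or three tandem copies of ACCA; the match is greedy, so a run of
# three copies is reported as one hit rather than as two overlapping ones.
ACCA_REPEAT = re.compile(r"(?:ACCA){2,3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_acca_repeats(dna: str):
    """Return (strand, start, match) tuples for exact (ACCA)2-3 hits.

    For the '-' strand, start is a 0-based coordinate on the
    reverse-complemented sequence, not on the input string.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in ACCA_REPEAT.finditer(seq):
            hits.append((strand, m.start(), m.group()))
    return hits


# Toy fragment built around the ACCAACCA sequence quoted in the abstract;
# the flanking bases are made up.
print(find_acca_repeats("GGACCAACCATT"))  # [('+', 2, 'ACCAACCA')]
```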

  3. Hitchcock's Motifs

    NARCIS (Netherlands)

    Walker, Michael

    2005-01-01

    Among the abundant Alfred Hitchcock literature, Hitchcock's Motifs has found a fresh angle. Starting from recurring objects, settings, character-types and events, Michael Walker tracks some forty motifs, themes and clusters across the whole of Hitchcock's oeuvre, including not only all his 52 extant

  4. Completely irreducible sequences of meromorphic functions

    Institute of Scientific and Technical Information of China (English)

    Nevo, Shahar; Zalcman, Lawrence

    2010-01-01

    Given a sequence $\{f_n\}$ of meromorphic functions on a plane domain $D$, there exists a (possibly empty) open set $U \subset D$ and a subsequence $\{f_{n_k}\}$ which converges uniformly (with respect to the spherical metric on $\hat{\mathbb{C}}$) on compact subsets of $U$, while no subsequence of $\{f_{n_k}\}$ converges uniformly on compact subsets of any larger open subset of $D$.

  5. Detecting functional divergence after gene duplication through evolutionary changes in posttranslational regulatory sequences.

    Science.gov (United States)

    Nguyen Ba, Alex N; Strome, Bob; Hua, Jun Jie; Desmond, Jonathan; Gagnon-Arsenault, Isabelle; Weiss, Eric L; Landry, Christian R; Moses, Alan M

    2014-12-01

    Gene duplication is an important evolutionary mechanism that can result in functional divergence in paralogs due to neo-functionalization or sub-functionalization. Consistent with functional divergence after gene duplication, recent studies have shown accelerated evolution in retained paralogs. However, little is known in general about the impact of this accelerated evolution on the molecular functions of retained paralogs. For example, do new functions typically involve changes in enzymatic activities, or changes in protein regulation? Here we study the evolution of posttranslational regulation by examining the evolution of important regulatory sequences (short linear motifs) in retained duplicates created by the whole-genome duplication in budding yeast. To do so, we identified short linear motifs whose evolutionary constraint has relaxed after gene duplication with a likelihood-ratio test that can account for heterogeneity in the evolutionary process by using a non-central chi-squared null distribution. We find that short linear motifs are more likely to show changes in evolutionary constraints in retained duplicates compared to single-copy genes. We examine changes in constraints on known regulatory sequences and show that for the Rck1/Rck2, Fkh1/Fkh2, Ace2/Swi5 paralogs, they are associated with previously characterized differences in posttranslational regulation. Finally, we experimentally confirm our prediction that for the Ace2/Swi5 paralogs, Cbk1 regulated localization was lost along the lineage leading to SWI5 after gene duplication. Our analysis suggests that changes in posttranslational regulation mediated by short regulatory motifs systematically contribute to functional divergence after gene duplication.

  6. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia;

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na+ instead of K...

  7. MINER: software for phylogenetic motif identification

    OpenAIRE

    La, David; Livesay, Dennis R.

    2005-01-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at . ...

  8. Sequence and structural analysis of the Asp-box motif and Asp-box beta-propellers; a widespread propeller-type characteristic of the Vps10 domain family and several glycoside hydrolase families

    Directory of Open Access Journals (Sweden)

    Quistgaard Esben M

    2009-07-01

    Background: The Asp-box is a short sequence and structure motif that folds as a well-defined β-hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp-box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and the s-Asp-box, a related shorter variant, and provide a comprehensive classification and computational analysis of the Asp-box β-propeller family. Results: We find that all conserved residues of the Asp-box support its structure, whereas the residues in variable positions are generally used for other purposes. The Asp-box clearly has a structural role in β-propellers and is highly unlikely to be involved in ligand binding. Sequence analysis of the Asp-box β-propeller family reveals it to be very widespread, especially in bacteria, and suggests a wide functional range. Disregarding the Asp-boxes, sequence conservation of the propeller blades is very low, but a distinct pattern of residues with specific properties has been identified. Interestingly, Asp-boxes are occasionally found very close to other propeller-associated repeats in extensive mixed-motif stretches, which strongly suggests the existence of a novel class of hybrid β-propellers. Structural analysis reveals that the top and bottom faces of Asp-box β-propellers have striking and consistently different loop properties; the bottom is structurally conserved whereas the top shows great structural variation. Interestingly, only the top face is used for functional purposes in known structures. A structural analysis of the 10-bladed β-propeller fold, which has so far only been observed in the Asp-box family, reveals that the inner strands of the

  9. EXTREME: an online EM algorithm for motif discovery

    Science.gov (United States)

    Quang, Daniel; Xie, Xiaohui

    2014-01-01

    Motivation: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery—the task of identifying the sequence preference of transcription factor proteins, which bind to these elements—is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME’s running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq are providing a rich amount of information on the binding preference of transcription factors. MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding a majority of the sequences. Results: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization algorithm for motif discovery, EXTREME uses the online expectation-maximization algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. Conservation analysis of one of these novel infrequent motifs confirms that it is evolutionarily conserved and possibly functional. Availability and implementation: All source code is available at the Github repository http://github.com/uci-cbcl/EXTREME. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24532725

  10. Evolution of the ferric reductase domain (FRD) superfamily: modularity, functional diversification, and signature motifs.

    Science.gov (United States)

    Zhang, Xuezhi; Krause, Karl-Heinz; Xenarios, Ioannis; Soldati, Thierry; Boeckmann, Brigitte

    2013-01-01

    A heme-containing transmembrane ferric reductase domain (FRD) is found in bacterial and eukaryotic protein families, including ferric reductases (FRE), and NADPH oxidases (NOX). The aim of this study was to understand the phylogeny of the FRD superfamily. Bacteria contain FRD proteins consisting only of the ferric reductase domain, such as YedZ and short bFRE proteins. Full length FRE and NOX enzymes are mostly found in eukaryotic cells and all possess a dehydrogenase domain, allowing them to catalyze electron transfer from cytosolic NADPH to extracellular metal ions (FRE) or oxygen (NOX). Metazoa possess YedZ-related STEAP proteins, possibly derived from bacteria through horizontal gene transfer. Phylogenetic analyses suggest that FRE enzymes appeared early in evolution, followed by a transition towards EF-hand containing NOX enzymes (NOX5- and DUOX-like). An ancestral gene of the NOX(1-4) family probably lost the EF-hands and new regulatory mechanisms of increasing complexity evolved in this clade. Two signature motifs were identified: NOX enzymes are distinguished from FRE enzymes through a four amino acid motif spanning from transmembrane domain 3 (TM3) to TM4, and YedZ/STEAP proteins are identified by the replacement of the first canonical heme-spanning histidine by a highly conserved arginine. The FRD superfamily most likely originated in bacteria.

  11. Analysis of sequences involved in IE2 transactivation of a baculovirus immediate-early gene promoter and identification of a new regulatory motif.

    Science.gov (United States)

    Shippam-Brett, C E; Willis, L G; Theilmann, D A

    2001-05-01

    Opep-2 is a unique baculovirus early gene that has only been identified in the Orgyia pseudotsugata multiple capsid nucleopolyhedrovirus (OpMNPV). Previous analyses have shown this gene is expressed at very early times post-infection (p.i.) but is shut down by 36-48 h p.i. The promoter of opep-2, therefore, represents a class of early genes that is temporally regulated. In this study, a detailed analysis of the opep-2 promoter is performed to analyze the role individual motifs play in early gene expression. A new 13 base pair regulatory element was identified and shown to be essential in controlling high-level expression of this gene. In addition, mutational analysis revealed that GATA and CACGTG motifs, which have been shown to bind cellular factors in Sf9 and Ld652Y cells, played minor roles in influencing opep-2 expression in the absence of other viral factors. The OpMNPV transactivator IE2 causes a significant activation of the opep-2 promoter. Cotransfection of an extensive number of promoter deletions and mutations did not show any sequence specificity for IE2 transactivation. This is the first detailed analysis of the sequence requirements for IE2 transactivation, and these results suggest that IE2 does not bind directly to specific elements in the opep-2 promoter.

  12. Associations of homologous RNA-binding motif gene on the X chromosome (RBMX) and its like sequence on chromosome 9(RBMXL9) with non-obstructive azoospermia

    Institute of Scientific and Technical Information of China (English)

    Akira Tsujimura; Masao Ota; Akihiko Okuyama; Kazutoshi Fujita; Kazuhiko Komori; Phanu Tanjapatkul; Yasushi Miyagawa; Shingo Takada; Kiyomi Matsumiya; Masaharu Sada; Yoshihiko Katsuyama

    2006-01-01

    Aim: To investigate the associations of autosomal and X-chromosome homologs of the RNA-binding-motif (RNA-binding-motif on the Y chromosome, RBMY) gene with non-obstructive azoospermia (NOA), as genetic factors for NOA may map to chromosomes other than the Y chromosome. Methods: Genomic DNA was extracted using a salting-out procedure after treatment of peripheral blood leukocytes with proteinase K from Japanese patients with NOA (n = 67) and normal fertile volunteers (n = 105). The DNA samples were analyzed for RBMX by expressed sequence tag (EST) deletion and for the like sequence on chromosome 9 (RBMXL9) by microsatellite polymorphism. Results: We examined six ESTs in and around RBMX and found a deletion of SHGC31764 in one patient with NOA and a deletion of DXS7491 in one other patient with NOA. No deletions were detected in control subjects. The association study with nine microsatellite markers near RBMXL9 revealed that D9S319 was less prevalent in patients than in control subjects, whereas D9S1853 was detected more frequently in patients than in control subjects. Conclusion: We provide evidence that deletions in or around RBMX may be involved in NOA. In addition, analyses of markers in the vicinity of RBMXL9 on chromosome 9 suggest the possibility that variants of this gene may be associated with NOA. Although further studies are necessary, this is the first report of the association of RBMX and RBMXL9 with NOA.

  13. Complex Predicates and the Functional Sequence

    Directory of Open Access Journals (Sweden)

    Peter Svenonius

    2008-12-01

    In this paper I argue that a fine-grained functional hierarchy of semantically contentful categories such as Tense, Aspect, Initiation, and Process has explanatory power in understanding the crosslinguistic distribution of complex predicates. Complex predicates may involve adjunction, control, or raising, and show other variables as well. In a Minimalist framework, specific parameters cannot be invoked to allow or disallow different kinds of serial verbs, light verbs, resultatives, and so on. Instead, what variation is observed must come from the specifications of lexical items. This places a great burden on the learner, a burden which, I argue, is partly alleviated by the functional sequence.

  14. Diversifying microRNA sequence and function.

    Science.gov (United States)

    Ameres, Stefan L; Zamore, Phillip D

    2013-08-01

    MicroRNAs (miRNAs) regulate the expression of most genes in animals, but we are only now beginning to understand how they are generated, assembled into functional complexes and destroyed. Various mechanisms have now been identified that regulate miRNA stability and that diversify miRNA sequences to create distinct isoforms. The production of different isoforms of individual miRNAs in specific cells and tissues may have broader implications for miRNA-mediated gene expression control. Rigorously testing the many discrepant models for how miRNAs function using quantitative biochemical measurements made in vivo and in vitro remains a major challenge for the future.

  15. The κB transcriptional enhancer motif and signal sequences of V(D)J recombination are targets for the zinc finger protein HIVEP3/KRC: a site selection amplification binding study

    Directory of Open Access Journals (Sweden)

    Wu Lai-Chu

    2002-08-01

    Full Text Available Abstract Background The ZAS family is composed of proteins that regulate transcription via specific gene regulatory elements. The amino-DNA binding domain (ZAS-N) and the carboxyl-DNA binding domain (ZAS-C) of a representative family member, named κB DNA binding and recognition component (KRC), were expressed as fusion proteins and their target DNA sequences were elucidated by site selection amplification binding assays, followed by cloning and DNA sequencing. The DNA sequences selected by the fusion proteins were analyzed by the MEME and MAST computer programs to obtain consensus motifs and DNA elements bound by the ZAS domains. Results Both fusion proteins selected sequences that were similar to the κB motif or the canonical elements of the V(D)J recombination signal sequences (RSS) from a pool of degenerate oligonucleotides. Specifically, the ZAS-N domain selected sequences similar to the canonical RSS nonamer, while the ZAS-C domain selected sequences similar to the canonical RSS heptamer. In addition, both KRC fusion proteins selected oligonucleotides with sequences identical to heptamer and nonamer sequences within endogenous RSS. Conclusions The RSS are cis-acting DNA motifs which are essential for V(D)J recombination of antigen receptor genes. Due to its specific binding affinity for RSS and κB-like transcription enhancer motifs, we hypothesize that KRC may be involved in the regulation of V(D)J recombination.

  16. Functional divergence of APETALA1 and FRUITFULL is due to changes in both regulation and coding sequence

    Directory of Open Access Journals (Sweden)

    Elizabeth W. McCarthy

    2015-12-01

    Full Text Available Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1 and FRUITFULL (FUL, which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed.

  17. NestedMICA as an ab initio protein motif discovery tool

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2008-01-01

    Full Text Available Abstract Background Discovering overrepresented patterns in amino acid sequences is an important step in protein functional element identification. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription binding site motifs, to find short protein signals, and compared its performance with another popular protein motif finder, MEME. NestedMICA, an open source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In order to assess NestedMICA as a protein motif finder, we have tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested using a biologically-authentic test set, where we evaluated its performance with respect to varying sequence length. Results Generally NestedMICA recovered most of the short (3–9 amino acid long) test protein motifs spiked into a test set of sequences at different frequencies. We showed that it can be used to find multiple motifs at the same time, too. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME. Conclusion NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences. Availability NestedMICA is available under the Lesser GPL open-source license from: http://www.sanger.ac.uk/Software/analysis/nmica/
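
The synthetic benchmark described in this record (known motif instances spiked into a set of protein sequences) can be illustrated with a minimal Python sketch. It is not NestedMICA's own procedure: the function names, the motif "RGDSP", the sequence length, and the choice to overwrite a window rather than insert are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, rng):
    """Draw a uniformly random amino-acid string (a stand-in for real background sequences)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def spike_motif(sequences, motif, fraction, rng):
    """Overwrite one random window with `motif` in roughly `fraction` of the sequences.

    Sequence lengths are preserved; sequences shorter than the motif are left unchanged.
    """
    spiked = []
    for seq in sequences:
        if len(seq) >= len(motif) and rng.random() < fraction:
            start = rng.randrange(len(seq) - len(motif) + 1)
            seq = seq[:start] + motif + seq[start + len(motif):]
        spiked.append(seq)
    return spiked


rng = random.Random(0)  # fixed seed so the toy dataset is reproducible
background = [random_protein(60, rng) for _ in range(100)]
dataset = spike_motif(background, "RGDSP", fraction=0.2, rng=rng)
```

Varying `fraction` mimics spiking motifs "at different frequencies"; a motif finder is then scored on whether it recovers the spiked pattern.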

  18. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes.

    Science.gov (United States)

    Trollope, Kim M; van Wyk, Niël; Kotjomela, Momo A; Volschenk, Heinrich

    2015-12-01

    Sucrolytic enzymes catalyse sucrose hydrolysis or the synthesis of fructooligosaccharides (FOSs), a prebiotic in human and animal nutrition. FOS synthesis capacity differs between sucrolytic enzymes. Amino-acid-sequence-based classification of FOS synthesizing enzymes would greatly facilitate the in silico identification of novel catalysts, as large amounts of sequence data lie untapped. The development of a bioinformatics tool to rapidly distinguish between high-level FOS-synthesizing and predominantly sucrose-hydrolysing enzymes from fungal genomic data is presented. Sequence comparison of functionally characterized enzymes displaying low- and high-level FOS synthesis revealed conserved motifs unique to each group. New light is shed on the sequence context of active site residues in three previously identified conserved motifs. We characterized two enzymes predicted to possess low- and high-level FOS synthesis activities based on their conserved motif sequences. FOS data for the enzymes confirmed our successful prediction of their FOS synthesis capacity. Structural comparison of enzymes displaying low- and high-level FOS synthesis identified steric hindrance between nystose and a long loop region present only in low-level FOS synthesizers. This loop is proposed to limit the synthesis of FOS species with higher degrees of polymerization, a phenomenon observed among enzymes displaying low-level FOS synthesis. Conserved sequence motifs surrounding catalytic residues and a distant structural determinant were identifiers of FOS synthesis capacity and allow for functional annotation of sucrolytic enzymes directly from amino acid sequence. The tool presented may also be useful to study the structure-function relationships of β-fructofuranosidases by identifying mutations present in a group of closely related enzymes displaying similar function.

  19. Universal structure motifs in biominerals: a lesson from nature for the efficient design of bioinspired functional materials.

    Science.gov (United States)

    Harris, Joe; Böhm, Corinna F; Wolf, Stephan E

    2017-08-06

    Biominerals are typically indispensable structures for their host organism in which they serve varying functions, such as mechanical support and protection, mineral storage, detoxification site, or as a sensor or optical guide. In this perspective article, we highlight the occurrence of both structural diversity and uniformity within these biogenic ceramics. For the first time, we demonstrate that the universality-diversity paradigm, which was initially introduced for proteins by Buehler et al. (Cranford & Buehler 2012 Biomateriomics; Cranford et al. 2013 Adv. Mater. 25, 802-824 (doi:10.1002/adma.201202553); Ackbarow & Buehler 2008 J. Comput. Theor. Nanosci. 5, 1193-1204 (doi:10.1166/jctn.2008.001); Buehler & Yung 2009 Nat. Mater. 8, 175-188 (doi:10.1038/nmat2387)), is also valid in the realm of biomineralization. A nanogranular composite structure is shared by most biominerals which rests on a common, non-classical crystal growth mechanism. The nanogranular composite structure affects various properties of the macroscale biogenic ceramic, a phenomenon we attribute to emergence. Emergence, in turn, is typical for hierarchically organized materials. This is a clear call to renew comparative studies of even distantly related biomineralizing organisms to identify further universal design motifs and their associated emergent properties. Such universal motifs with emergent macro-scale properties may represent an unparalleled toolbox for the efficient design of bioinspired functional materials.

  20. Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases

    Directory of Open Access Journals (Sweden)

    Braun Werner

    2002-11-01

    Full Text Available Abstract Background Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily that catalyzes metal ion based phosphorolysis, but recognize different substrates. Results MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1. Conclusion A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.

  1. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif...... viewer, that allows the display of the likely binding motif for all human class I proteins of the loci HLA A, B, C, and E and for MHC class I molecules from chimpanzee (Pan troglodytes), rhesus monkey (Macaca mulatta), and mouse (Mus musculus). Furthermore, it covers all HLA-DR protein sequences...

  2. Evidence for the concerted evolution between short linear protein motifs and their flanking regions.

    Directory of Open Access Journals (Sweden)

    Claudia Chica

    Full Text Available BACKGROUND: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein-protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. RESULTS: The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co-evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif-mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. CONCLUSION: The results suggest that flanking regions are relevant for linear motif-mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network.

  3. Functional Role of Histidine in the Conserved His-x-Asp Motif in the Catalytic Core of Protein Kinases.

    Science.gov (United States)

    Zhang, Lun; Wang, Jian-Chuan; Hou, Li; Cao, Peng-Rong; Wu, Li; Zhang, Qian-Sen; Yang, Huai-Yu; Zang, Yi; Ding, Jian-Ping; Li, Jia

    2015-05-11

    The His-x-Asp (HxD) motif is one of the most conserved structural components of the catalytic core of protein kinases; however, the functional role of the conserved histidine is unclear. Here we report that replacement of the HxD-histidine with arginine or phenylalanine in Aurora A abolishes both the catalytic activity and auto-phosphorylation, whereas the histidine-to-tyrosine substitution impairs the catalytic activity without affecting its auto-phosphorylation. Comparisons of the crystal structures of wild-type (WT) and mutant Aurora A demonstrate that the impairment of the kinase activity is accounted for by (1) disruption of the regulatory spine in the His-to-Arg mutant, and (2) change in the geometry of backbones of the Asp-Phe-Gly (DFG) motif and the DFG-1 residue in the His-to-Tyr mutant. In addition, bioinformatics analyses show that the HxD-histidine is a mutational hotspot in tumor tissues. Moreover, the H174R mutation of the HxD-histidine in the tumor suppressor LKB1 abrogates the inhibition of anchorage-independent growth of A549 cells by WT LKB1. Based on these data, we propose that the HxD-histidine is involved in a conserved inflexible organization of the catalytic core that is required for the kinase activity. Mutation of the HxD-histidine may also be involved in the pathogenesis of some diseases including cancer.

  4. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  5. Use of Limited Proteolysis and Mutagenesis To Identify Folding Domains and Sequence Motifs Critical for Wax Ester Synthase/Acyl Coenzyme A:Diacylglycerol Acyltransferase Activity

    Science.gov (United States)

    Villa, Juan A.; Cabezas, Matilde; de la Cruz, Fernando

    2014-01-01

    Triacylglycerols and wax esters are synthesized as energy storage molecules by some proteobacteria and actinobacteria under stress. The enzyme responsible for neutral lipid accumulation is the bifunctional wax ester synthase/acyl-coenzyme A (CoA):diacylglycerol acyltransferase (WS/DGAT). Structural modeling of WS/DGAT suggests that it can adopt an acyl-CoA-dependent acyltransferase fold with the N-terminal and C-terminal domains connected by a helical linker, an architecture demonstrated experimentally by limited proteolysis. Moreover, we found that both domains form an active complex when coexpressed as independent polypeptides. The structural prediction and sequence alignment of different WS/DGAT proteins indicated catalytically important motifs in the enzyme. Their role was probed by measuring the activities of a series of alanine scanning mutants. Our study underscores the structural understanding of this protein family and paves the way for their modification to improve the production of neutral lipids. PMID:24296496

  6. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps.

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-07-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable gap representation, in addition to analysis of results using energy minimization methods. The RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site.
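
The idea of a gap whose length has a user-specified upper bound can be sketched with ordinary regular expressions. The Python fragment below covers only the sequence part of such a query (no secondary structure, no energy minimization) and does not reproduce RNAPattMatch's query syntax or algorithm; `gapped_pattern`, `find_matches`, and the example segments are hypothetical.

```python
import re


def gapped_pattern(segments, max_gaps):
    """Compile fixed segments separated by gaps of 0..max_gaps[i] arbitrary nucleotides."""
    if len(max_gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap bound between each pair of segments")
    parts = [re.escape(segments[0])]
    for bound, segment in zip(max_gaps, segments[1:]):
        parts.append("[ACGU]{0,%d}" % bound)  # bounded-length gap
        parts.append(re.escape(segment))
    return re.compile("".join(parts))


def find_matches(pattern, rna):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]


# Two 4-nt segments that may be separated by up to 4 nucleotides.
pattern = gapped_pattern(["GGAC", "GUCC"], [4])
hits = find_matches(pattern, "AAGGACUUGUCCAA")
```

Each gap carries its own bound, so a different upper limit can be set at every position between segments.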

  7. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    Science.gov (United States)

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq.
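
For readers unfamiliar with how G4 motifs are usually located in sequence data, here is a minimal Python sketch. It uses a widely used heuristic (four runs of at least three guanines separated by loops of 1 to 7 nucleotides); the abstract does not state which G4 definition the authors applied, so the pattern and the function name are assumptions.

```python
import re

# Heuristic pattern: G-run, loop, G-run, loop, G-run, loop, G-run.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_g4_candidates(dna):
    """Return (start, end) spans of putative G4 motifs on the given strand."""
    dna = dna.upper()
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(dna)]


spans = find_g4_candidates("ttGGGAGGGTGGGAGGGtt")
```

Only the supplied strand is scanned; motifs on the opposite strand would need the reverse complement (or an equivalent C-run pattern).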

  8. Effects of chemokine (C–C motif) ligand 1 on microglial function

    Energy Technology Data Exchange (ETDEWEB)

    Akimoto, Nozomi [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Ifuku, Masataka [Laboratory of Integrative Physiology, Graduate School of Medicine, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Mori, Yuki [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan); Noda, Mami, E-mail: noda@phar.kyushu-u.ac.jp [Laboratory of Pathophysiology, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582 (Japan)

    2013-07-05

    Highlights: •CCR8, a specific receptor for CCL-1, was expressed on primary cultured microglia. •Expression of CCR-8 in microglia was upregulated in the presence of CCL-1. •CCL-1 increased motility, proliferation and phagocytosis of cultured microglia. •CCL-1 promoted BDNF and IL-6 mRNA, and the release of NO from microglia. •CCL-1 activates microglia and may contribute to the development of neuropathic pain. -- Abstract: Microglia, which constitute the resident macrophages of the central nervous system (CNS), are generally considered as the primary immune cells in the brain and spinal cord. Microglial cells respond to various factors which are produced following nerve injury of multiple aetiologies and contribute to the development of neuronal disease. Chemokine (C–C motif) ligand 1 (CCL-1), a well-characterized chemokine secreted by activated T cells, has been shown to play an important role in neuropathic pain induced by nerve injury and is also produced in various cell types in the CNS, especially in dorsal root ganglia (DRG). However, the role of CCL-1 in the CNS and its effects on microglia remain unclear. Here we showed the multiple effects of CCL-1 on microglia. We first showed that CCR-8, a specific receptor for CCL-1, was expressed on primary cultured microglia, as well as on astrocytes and neurons, and was upregulated in the presence of CCL-1. CCL-1 at a concentration of 1 ng/ml induced chemotaxis, increased motility at a higher concentration (100 ng/ml), and increased proliferation and phagocytosis of cultured microglia. CCL-1 also activated microglia morphologically, promoted mRNA levels for brain-derived neurotrophic factor (BDNF) and IL-6, and increased the release of nitrite from microglia. These results indicate that CCL-1 has a role as a mediator in neuron-glia interaction, which may contribute to the development of neurological diseases, especially neuropathic pain.

  9. Functional stabilization of an RNA recognition motif by a noncanonical N-terminal expansion.

    Science.gov (United States)

    Netter, Catharina; Weber, Gert; Benecke, Heike; Wahl, Markus C

    2009-07-01

    RNA recognition motifs (RRMs) constitute versatile macromolecular interaction platforms. They are found in many components of spliceosomes, in which they mediate RNA and protein interactions by diverse molecular strategies. The human U11/U12-65K protein of the minor spliceosome employs a C-terminal RRM to bind hairpin III of the U12 small nuclear RNA (snRNA). This interaction comprises one side of a molecular bridge between the U11 and U12 small nuclear ribonucleoprotein particles (snRNPs) and is reminiscent of the binding of the N-terminal RRMs in the major spliceosomal U1A and U2B'' proteins to hairpins in their cognate snRNAs. Here we show by mutagenesis and electrophoretic mobility shift assays that the beta-sheet surface and a neighboring loop of 65K C-terminal RRM are involved in RNA binding, as previously seen in canonical RRMs like the N-terminal RRMs of the U1A and U2B'' proteins. However, unlike U1A and U2B'', some 30 residues N-terminal of the 65K C-terminal RRM core are additionally required for stable U12 snRNA binding. The crystal structure of the expanded 65K C-terminal RRM revealed that the N-terminal tail adopts an alpha-helical conformation and wraps around the protein toward the face opposite the RNA-binding platform. Point mutations in this part of the protein had only minor effects on RNA affinity. Removal of the N-terminal extension significantly decreased the thermal stability of the 65K C-terminal RRM. These results demonstrate that the 65K C-terminal RRM is augmented by an N-terminal element that confers stability to the domain, and thereby facilitates stable RNA binding.

  10. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  11. The oligodeoxynucleotide sequences corresponding to never-expressed peptide motifs are mainly located in the non-coding strand

    Directory of Open Access Journals (Sweden)

    Bickis Mik

    2010-07-01

    Full Text Available Abstract Background We study the usage of specific peptide platforms in protein composition. Using the pentapeptide as a unit of length, we find that in the universal proteome many pentapeptides are heavily repeated (even thousands of times), whereas some are quite rare, and a small number do not appear at all. To understand the physico-chemical-biological basis underlying peptide usage at the proteomic level, in this study we analyse the energetic costs for the synthesis of rare and never-expressed versus frequent pentapeptides. In addition, we explore residue bulkiness, hydrophobicity, and codon number as factors able to modulate specific peptide frequencies. Then, the possible influence of amino acid composition is investigated in zero- and high-frequency pentapeptide sets by analysing the frequencies of the corresponding inverse-sequence pentapeptides. As a final step, we analyse the pentadecamer oligodeoxynucleotide sequences corresponding to the never-expressed pentapeptides. Results We find that only DNA context-dependent constraints (such as oligodeoxynucleotide sequence location in the minus strand, introns, pseudogenes, frameshifts, etc.) provide a coherent mechanistic platform to explain the occurrence of never-expressed versus frequent pentapeptides in the protein world. Conclusions This study is of importance in cell biology. Indeed, the rarity (or lack of expression) of specific 5-mer peptide modules implies the rarity (or lack of expression) of the corresponding n-mer peptide sequences (with n

  12. SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information.

    Science.gov (United States)

    Pugalenthi, Ganesan; Kandaswamy, Krishna Kumar; Suganthan, P N; Sowdhamini, R; Martinetz, Thomas; Kolatkar, Prasanna R

    2010-12-01

    Knowledge of three-dimensional structure is essential to understand the function of a protein. Although the overall fold is determined by the whole of its sequence, a small group of residues, often called structural motifs, play a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires a sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from a single protein structure without using sequence and structural homologs. The SMpred method was trained and tested using 132 protein domains containing 581 motifs. The SMpred method achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is a useful approach for the length-deviant superfamilies and single-member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structure in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.
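
The accuracy, sensitivity and specificity figures quoted in this record follow the standard confusion-matrix definitions, sketched below in Python. The underlying counts are not given in the abstract, so the numbers in the example call are purely hypothetical and do not reproduce the reported percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # all correct calls / all calls
        "sensitivity": tp / (tp + fn),                # true positives / actual positives
        "specificity": tn / (tn + fp),                # true negatives / actual negatives
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=3, tn=7, fn=2)
```

Sensitivity here is the share of true motif residues that are recovered, and specificity is the share of non-motif residues that are correctly rejected.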

  13. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS, (l, d-motif search (or Planted Motif Search (PMS, and Edit-distance-based Motif Search (EMS. In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm identifies the motifs always and an approximate algorithm may fail to identify some or all of the motifs. The exact version of PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speedup PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. 
These experimental results show that our speedup technique is indeed very
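
    The (l, d)-motif search problem named in this record can be illustrated with a brute-force exact search. The following is a minimal sketch, not the speedup technique of the paper: it assumes a DNA alphabet, Hamming distance as the mismatch measure, and hypothetical function names (hamming, occurs_within, planted_motif_search) chosen for illustration. Its running time grows as 4^l, which reflects why exact algorithms are exponential in the underlying parameters.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Exhaustive exact search: try every l-mer over the alphabet and keep
    those that occur, with at most d mismatches, in every input sequence."""
    motifs = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if all(occurs_within(motif, s, d) for s in seqs):
            motifs.append(motif)
    return motifs


# Made-up toy input: ACGT is planted with at most one mismatch per sequence.
toy = ["TTACGTAA", "GGACCTCC", "AATCGTTT"]
found = planted_motif_search(toy, l=4, d=1)
```

    On the toy input, ACGT appears exactly in the first sequence and with one mismatch in the other two (ACCT, TCGT), so it is among the motifs returned; other 4-mers may qualify as well.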

  14. Network motifs provide signatures that characterize metabolism

    OpenAIRE

    Shellman, Erin R.; Burant, Charles F.; Schnell, Santiago

    2013-01-01

    Motifs are repeating patterns that determine the local properties of networks. In this work, we characterized all 3-node motifs using enzyme commission numbers of the International Union of Biochemistry and Molecular Biology to show that motif abundance is related to biochemical function. Further, we present a comparative analysis of motif distributions in the metabolic networks of 21 species across six kingdoms of life. We found the distribution of motif abundances to be similar between spec...

  15. Discovery of novel interacting partners of PSMD9, a proteasomal chaperone: Role of an Atypical and versatile PDZ-domain motif interaction and identification of putative functional modules

    Science.gov (United States)

    Sangith, Nikhil; Srinivasaraghavan, Kannan; Sahu, Indrajit; Desai, Ankita; Medipally, Spandana; Somavarappu, Arun Kumar; Verma, Chandra; Venkatraman, Prasanna

    2014-01-01

    PSMD9 (Proteasome Macropain non-ATPase subunit 9), a proteasomal assembly chaperone, harbors an uncharacterized PDZ-like domain. Here we report the identification of five novel interacting partners of PSMD9 and provide the first glimpse at the structure of the PDZ-domain, including the molecular details of the interaction. We based our strategy on two propositions: (a) proteins with conserved C-termini may share common functions and (b) PDZ domains interact with C-terminal residues of proteins. Screening of C-terminal peptides followed by interactions using full-length recombinant proteins, we discovered hnRNPA1 (an RNA binding protein), S14 (a ribosomal protein), CSH1 (a growth hormone), E12 (a transcription factor) and IL6 receptor as novel PSMD9-interacting partners. Through multiple techniques and structural insights, we clearly demonstrate for the first time that human PDZ domain interacts with the predicted Short Linear Sequence Motif (SLIM) at the C-termini of the client proteins. These interactions are also recapitulated in mammalian cells. Together, these results are suggestive of the role of PSMD9 in transcriptional regulation, mRNA processing and editing, hormone and receptor activity and protein translation. Our proof-of-principle experiments endorse a novel and quick method for the identification of putative interacting partners of similar PDZ-domain proteins from the proteome and for discovering novel functions. PMID:25009770

  16. Discovery of novel interacting partners of PSMD9, a proteasomal chaperone: Role of an Atypical and versatile PDZ-domain motif interaction and identification of putative functional modules

    Directory of Open Access Journals (Sweden)

    Nikhil Sangith

    2014-01-01

    PSMD9 (Proteasome Macropain non-ATPase subunit 9), a proteasomal assembly chaperone, harbors an uncharacterized PDZ-like domain. Here we report the identification of five novel interacting partners of PSMD9 and provide the first glimpse at the structure of the PDZ-domain, including the molecular details of the interaction. We based our strategy on two propositions: (a) proteins with conserved C-termini may share common functions and (b) PDZ domains interact with C-terminal residues of proteins. Screening of C-terminal peptides followed by interactions using full-length recombinant proteins, we discovered hnRNPA1 (an RNA binding protein), S14 (a ribosomal protein), CSH1 (a growth hormone), E12 (a transcription factor) and IL6 receptor as novel PSMD9-interacting partners. Through multiple techniques and structural insights, we clearly demonstrate for the first time that human PDZ domain interacts with the predicted Short Linear Sequence Motif (SLIM) at the C-termini of the client proteins. These interactions are also recapitulated in mammalian cells. Together, these results are suggestive of the role of PSMD9 in transcriptional regulation, mRNA processing and editing, hormone and receptor activity and protein translation. Our proof-of-principle experiments endorse a novel and quick method for the identification of putative interacting partners of similar PDZ-domain proteins from the proteome and for discovering novel functions.

  17. Autocorrelation of Sequences Generated by Single Cycle T-Functions

    Institute of Scientific and Technical Information of China (English)

    Wang Yan; Hu Yupu; Li Shunbo; Yang Yang

    2011-01-01

    Cryptographic properties of the single cycle T-function's output sequences are investigated. Bounds of autocorrelation functions of the kth coordinate sequence and bounds of state output sequence are calculated respectively. The Maximum Sidelobe Ratio (MSR) of the kth coordinate sequence and the MSR of state output sequence are given respectively. The bounds of autocorrelation functions show that the values of autocorrelation functions are large when shifts are small. Comparisons of the autocorrelations between the state output sequence and coordinate output sequence are illustrated. The autocorrelation properties demonstrate that T-functions have cryptographic weaknesses and the illustration result shows coordinate output sequences have better autocorrelation than that of state output sequences.
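
    The autocorrelation of a coordinate sequence can be computed directly for a small example. The sketch below is an illustration under stated assumptions, not the analysis of the paper: it uses the well-known Klimov-Shamir map x -> x + (x^2 OR 5) mod 2^n as the single cycle T-function (the record does not say which T-functions were studied), and the helper names are hypothetical. It shows one structural weakness: the kth coordinate sequence is exactly complemented by a shift of 2^k, so its periodic autocorrelation at that shift reaches the extreme value -2^n.

```python
def t_function(x, n):
    """Klimov-Shamir map x -> x + (x*x | 5) mod 2**n (assumed example)."""
    return (x + ((x * x) | 5)) % (1 << n)


def state_sequence(n, x0=0):
    """One full period (2**n states) of the state output sequence."""
    states, x = [], x0
    for _ in range(1 << n):
        states.append(x)
        x = t_function(x, n)
    return states


def coordinate_sequence(states, k):
    """The kth coordinate sequence: bit k of each successive state."""
    return [(x >> k) & 1 for x in states]


def periodic_autocorrelation(bits, shift):
    """Sum over one period of (-1)**(b[i] XOR b[i + shift])."""
    n_bits = len(bits)
    return sum((-1) ** (bits[i] ^ bits[(i + shift) % n_bits])
               for i in range(n_bits))


n = 8
states = state_sequence(n)
# Autocorrelation of each coordinate sequence at shift 2**k.
worst = [periodic_autocorrelation(coordinate_sequence(states, k), 1 << k)
         for k in range(n)]
```

    For this map every entry of worst equals -2^n, because bit k has period 2^(k+1) and its second half-period is the complement of the first.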

  18. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    OpenAIRE

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzym...

  19. Association of Arabidopsis type-II ROPs with the plasma membrane requires a conserved C-terminal sequence motif and a proximal polybasic domain.

    Science.gov (United States)

    Lavy, Meirav; Yalovsky, Shaul

    2006-06-01

    Plant ROPs (or RACs) are soluble Ras-related small GTPases that are attached to cell membranes by virtue of the post-translational lipid modifications of prenylation and S-acylation. ROPs (RACs) are subdivided into two major subgroups called type-I and type-II. Whereas type-I ROPs terminate with a conserved CaaL box and undergo prenylation, type-II ROPs undergo S-acylation on two or three C-terminal cysteines. In the present work we determined the sequence requirement for association of Arabidopsis type-II ROPs with the plasma membrane. We identified a conserved sequence motif, designated the GC-CG box, in which the modified cysteines are flanked by glycines. The GC-CG box cysteines are separated by five to six mostly non-polar residues. Deletion of this sequence or the introduction of mutations that change its nature disrupted the association of ROPs with the membrane. Mutations that changed the GC-CG box glycines to alanines also interfered with membrane association. Deletion of a polybasic domain proximal to the GC-CG box disrupted the plasma membrane association of AtROP10. A green fluorescent protein fusion protein containing the C-terminal 25 residues of AtROP10, including its polybasic domain and GC-CG box, was primarily associated with the plasma membrane but a similar fusion protein lacking the polybasic domain was exclusively localized in the soluble fraction. These data provide evidence for the minimal sequence required for plasma membrane association of type-II ROPs in Arabidopsis and other plant species.

  20. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  1. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Background: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location.
    Conclusion: We have developed a novel and useful approach for

  2. Functional phylogenetic analysis of LGI proteins identifies an interaction motif crucial for myelination.

    Science.gov (United States)

    Kegel, Linde; Jaegle, Martine; Driegen, Siska; Aunin, Eerik; Leslie, Kris; Fukata, Yuko; Watanabe, Masahiko; Fukata, Masaki; Meijer, Dies

    2014-04-01

    The cellular interactions that drive the formation and maintenance of the insulating myelin sheath around axons are only partially understood. Leucine-rich glioma-inactivated (LGI) proteins play important roles in nervous system development and mutations in their genes have been associated with epilepsy and amyelination. Their function involves interactions with ADAM22 and ADAM23 cell surface receptors, possibly in apposing membranes, thus attenuating cellular interactions. LGI4-ADAM22 interactions are required for axonal sorting and myelination in the developing peripheral nervous system (PNS). Functional analysis revealed that, despite their high homology and affinity for ADAM22, LGI proteins are functionally distinct. To dissect the key residues in LGI proteins required for coordinating axonal sorting and myelination in the developing PNS, we adopted a phylogenetic and computational approach and demonstrate that the mechanism of action of LGI4 depends on a cluster of three amino acids on the outer surface of the LGI4 protein, thus providing a structural basis for the mechanistic differences in LGI protein function in nervous system development and evolution.

  3. Determination of 5'-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens;

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... a priori knowledge for mutational identification of virulence determinants in the 5' nontranslated part of the PRRSV genome....

  4. Molybdenum and tungsten oxygen transferases--and functional diversity within a common active site motif.

    Science.gov (United States)

    Pushie, M Jake; Cotelesage, Julien J; George, Graham N

    2014-01-01

    Molybdenum and tungsten are the only second and third-row transition elements with a known function in living organisms. The molybdenum and tungsten enzymes show common structural features, with the metal being bound by a pyranopterin-dithiolene cofactor called molybdopterin. They catalyze a variety of oxygen transferase reactions coupled with two-electron redox chemistry in which the metal cycles between the +6 and +4 oxidation states usually with water, either product or substrate, providing the oxygen. The functional roles filled by the molybdenum and tungsten enzymes are diverse; for example, they play essential roles in microbial respiration, in the uptake of nitrogen in green plants, and in human health. Together, the enzymes form a superfamily which is among the most prevalent known, being found in all kingdoms of life. This review discusses what is known of the active site structures and the mechanisms, together with some recent insights into the evolution of these important enzyme systems.

  5. Functional significance of a hepta nucleotide motif present at the junction of Cucumber mosaic virus satellite RNA multimers in helper-virus dependent replication.

    Science.gov (United States)

    Seo, Jang-Kyun; Kwon, Sun-Jung; Chaturvedi, Sonali; Choi, Soon Ho; Rao, A L N

    2013-01-20

    Satellite RNAs (satRNA) associated with Cucumber mosaic virus (CMV) have been shown to generate multimers during replication. We have discovered that multimers of a CMV satRNA generated in the absence of its helper virus (HV) are characterized by the addition of a hepta nucleotide motif (HNM) at the monomer junctions. Here, we evaluated the functional significance of HNM in HV-dependent replication by ectopically expressing wild type and mutant forms of satRNA multimers in planta either in (+) or (-)-strand polarity. Comparative replication profiles revealed that (-)-strand multimers with complementary HNM (cHNM) are the preferred initial templates for HV-dependent replication than (-)-strand monomers and multimers lacking the cHNM. Further mutational analyses of the HNM accentuate that preservation of the sequence and native length of HNM is obligatory for efficient replication of satRNA. A model implicating the significance of HNM in HV-dependent production of monomeric and multimeric forms of satRNA is presented.

  6. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif.

    Science.gov (United States)

    Chimura, Takahiko; Launey, Thomas; Ito, Masao

    2011-06-08

    The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transductions within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C-) terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins, at genome-level. Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at their C-terminal ends. Moreover, we found specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.
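
    The two consensus patterns quoted in this abstract translate directly into regular expressions anchored at the C-terminus. The sketch below is only an illustration of the unrefined type-I (x-x-S/T-x-I/L/V) and type-II (x-x-V-x-I/V) definitions; the refined consensus sequences proposed by the authors are not given in the record and are not reproduced here. The function name and the example sequences are made up.

```python
import re

# 'x' is any residue; \Z anchors the match at the very end of the sequence.
TYPE_I = re.compile(r"..[ST].[ILV]\Z")   # x-x-S/T-x-I/L/V
TYPE_II = re.compile(r"..V.[IV]\Z")      # x-x-V-x-I/V


def classify_c_terminus(protein_seq):
    """Return 'type-I', 'type-II', or None for the last five residues."""
    if TYPE_I.search(protein_seq):
        return "type-I"
    if TYPE_II.search(protein_seq):
        return "type-II"
    return None


# Hypothetical one-letter amino acid sequences, for illustration only.
examples = {
    "MKAAQTSV": classify_c_terminus("MKAAQTSV"),    # ends in A-Q-T-S-V
    "MKAAGEVKI": classify_c_terminus("MKAAGEVKI"),  # ends in G-E-V-K-I
    "MKAAQTSVA": classify_c_terminus("MKAAQTSVA"),  # ends in Q-T-S-V-A
}
```

    The two classes cannot overlap, since the third-from-last residue must be S or T for type-I and V for type-II. A genome-wide survey like the one described would apply such a test to every protein and compare C-terminal counts against internal occurrences.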

  7. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Background: The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transductions within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C-) terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins, at genome-level. Results: Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at their C-terminal ends. Moreover, we found specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions: Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.

  8. The brain's code and its canonical computational motifs. From sensory cortex to the default mode network: A multi-scale model of brain function in health and disease.

    Science.gov (United States)

    Turkheimer, Federico E; Leech, Robert; Expert, Paul; Lord, Louis-David; Vernon, Anthony C

    2015-08-01

    A variety of anatomical and physiological evidence suggests that the brain performs computations using motifs that are repeated across species, brain areas, and modalities. The computational architecture of cortex, for example, is very similar from one area to another and the types, arrangements, and connections of cortical neurons are highly stereotyped. This supports the idea that each cortical area conducts calculations using similarly structured neuronal modules: what we term canonical computational motifs. In addition, the remarkable self-similarity of the brain observables at the micro-, meso- and macro-scale further suggests that these motifs are repeated at increasing spatial and temporal scales supporting brain activity from primary motor and sensory processing to higher-level behaviour and cognition. Here, we briefly review the biological bases of canonical brain circuits and the role of inhibitory interneurons in these computational elements. We then elucidate how canonical computational motifs can be repeated across spatial and temporal scales to build a multiplexing information system able to encode and transmit information of increasing complexity. We point to the similarities between the patterns of activation observed in primary sensory cortices by use of electrophysiology and those observed in large scale networks measured with fMRI. We then employ the canonical model of brain function to unify seemingly disparate evidence on the pathophysiology of schizophrenia in a single explanatory framework. We hypothesise that such a framework may also be extended to cover multiple brain disorders which are grounded in dysfunction of GABA interneurons and/or these computational motifs.

  9. Generating Functions for the Powers of Fibonacci Sequences

    Science.gov (United States)

    Terrana, D.; Chen, H.

    2007-01-01

    In this note, based on the Binet formulas and the power-reducing techniques, closed forms of generating functions for the powers of Fibonacci sequences are presented. The corresponding results are extended to some other famous sequences as well.
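
    As a worked instance of the approach this note describes, the generating function for the squares of the Fibonacci numbers follows from the Binet formula in a few lines. This is a standard derivation given here for illustration and is not quoted from the note; the symbols \varphi and \psi for the two roots of t^2 = t + 1 are notation assumed for this sketch.

```latex
\begin{align*}
F_n &= \frac{\varphi^n - \psi^n}{\sqrt{5}}, \qquad
  \varphi = \frac{1+\sqrt{5}}{2},\quad \psi = \frac{1-\sqrt{5}}{2},\quad
  \varphi\psi = -1,\quad \varphi^2 + \psi^2 = 3, \\
F_n^2 &= \frac{\varphi^{2n} + \psi^{2n} - 2(-1)^n}{5}, \\
\sum_{n \ge 0} F_n^2 x^n
  &= \frac{1}{5}\left[\frac{1}{1-\varphi^2 x} + \frac{1}{1-\psi^2 x}
     - \frac{2}{1+x}\right]
   = \frac{1}{5}\left[\frac{2-3x}{1-3x+x^2} - \frac{2}{1+x}\right]
   = \frac{x(1-x)}{(1+x)(1-3x+x^2)}.
\end{align*}
```

    Expanding the closed form gives the coefficients 0, 1, 1, 4, 9, 25, ..., which are the squares of 0, 1, 1, 2, 3, 5, ....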

  10. Measurement of creatinine in human plasma using a functional porous polymer structure sensing motif.

    Science.gov (United States)

    Nanda, Sitansu Sekhar; An, Seong Soo A; Yi, Dong Kee

    2015-01-01

    In this study, a new method for detecting creatinine was developed. This novel sensor was composed of two ionic liquids, poly-lactic-co-glycolic acid (PLGA) and 1-butyl-3-methylimidazolium (BMIM) chloride, in the presence of 2',7'-dichlorofluorescein diacetate (DCFH-DA). PLGA and BMIM chloride formed a functional porous polymer structure (FPPS)-like structure. Creatinine within the FPPS rapidly hydrolyzed and released OH(-), which in turn converted DCFH-DA to DCFH, developing an intense green color or green fluorescence. The conversion of DCFH to DCF(+) resulted in swelling of FPPS and increased solubility. This DCF(+)-based sensor could detect creatinine levels with a detection limit of 5 µM and also measure the creatinine in blood. This novel method could be used in diagnostic applications for monitoring individuals with renal dysfunction.

  11. MINER: software for phylogenetic motif identification.

    Science.gov (United States)

    La, David; Livesay, Dennis R

    2005-07-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at http://www.pmap.csupomona.edu/MINER/. Source code is available to the academic community on request.

  12. Laser spectroscopic and theoretical studies of the structures and encapsulation motifs of functional molecules

    Energy Technology Data Exchange (ETDEWEB)

    Ebata, Takayuki; Kusaka, Ryoji [Department of Chemistry, Graduate School of Science, Hiroshima University, Kagamiyama 1-3-1, Higashi-Hiroshima, 739-8526 (Japan); Xantheas, Sotiris S. [Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MS K1-83, Richland, WA 99352 (United States)

    2015-01-22

    Extensive laser spectroscopic and theoretical studies have been recently carried out with the aim to reveal the structure and dynamics of encapsulation complexes in the gas phase. The characteristics of the encapsulation complexes are governed by the fact that (i) most of the host molecules are flexible and (ii) the complexes form high dimensional structures by using weak non-covalent interactions. These characteristics result in the possibility of the coexistence of many conformers in close energetic proximity. The combination of supersonic jet/laser spectroscopy and high level quantum chemical calculations is essential in tackling these challenging problems. In this report we describe our recent studies on the structures and dynamics of the encapsulation complexes formed by calix[4]arene (C4A), dibenzo-18-crown-6-ether (DB18C6), and benzo-18-crown-6-ether (B18C6) 'hosts' interacting with N2, acetylene, water, and ammonia 'guest' molecules. The gaseous host-guest complexes are generated under jet-cooled conditions. We apply various laser spectroscopic methods to obtain the conformer- and isomer-specified electronic and IR spectra. The experimental results are complemented with quantum chemical calculations ranging from density functional theory to high level first principles calculations at the MP2 and CCSD(T) levels of theory. We discuss the possible conformations of the bare host molecules, the structural changes they undergo upon complexation, and the key interactions that are responsible for stabilizing the specific complexes.

  13. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  14. Functional insight from the tetratricopeptide repeat-like motifs of the type III secretion chaperone SicA in Salmonella enterica serovar Typhimurium.

    Science.gov (United States)

    Kim, Jin Seok; Kim, Bae-Hoon; Jang, Jung Im; Eom, Jeong Seon; Kim, Hyeon Guk; Bang, Iel Soo; Park, Yong Keun

    2014-01-01

    SicA functions both as a class II chaperone for SipB and SipC of the type III secretion system (T3SS)-1 and as a transcriptional cofactor for the AraC-type transcription factor InvF in Salmonella enterica subsp. enterica serovar Typhimurium. Bioinformatic analysis has predicted that SicA possesses three tetratricopeptide repeat (TPR)-like motifs, which are important for protein-protein interactions and serve as multiprotein complex mediators. To investigate whether the TPR-like motifs in SicA are critical for its transcriptional cofactor function, the canonical residues in these motifs were mutated to glutamate (SicAA44E, SicAA78E, and SicAG112E). None of these mutants except SicAA44E were able to activate the expression of the sipB and sigD genes. SicAA44E still has a capacity to interact with InvF in vitro, and despite its instability in cell, it could activate the sigDE operon. This suggests that TPR motifs are important for the transcriptional cofactor function of the SicA chaperone. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  15. Anterograde trafficking of G protein-coupled receptors: function of the C-terminal F(X)6LL motif in export from the endoplasmic reticulum.

    Science.gov (United States)

    Duvernay, Matthew T; Dong, Chunmin; Zhang, Xiaoping; Zhou, Fuguo; Nichols, Charles D; Wu, Guangyu

    2009-04-01

    We have reported previously that the F(X)6LL motif in the C termini is essential for export of alpha2B-adrenergic (alpha2B-AR) and angiotensin II type 1 receptors (AT1Rs) from the endoplasmic reticulum (ER). Here, we further demonstrate that mutation of the F(X)6LL motif similarly abolished the cell-surface expression of alpha2B-AR, AT1R, alpha1B-AR, and beta2-AR, suggesting that the F(X)6LL motif plays a general role in ER export of G protein-coupled receptors (GPCRs). Mutation of Phe to Val, Leu, Trp, and Tyr, and mutation of LL to FF and VV, markedly inhibited alpha2B-AR transport, indicating that the F(X)6LL function cannot be fully substituted by other hydrophobic residues. The structural analysis revealed that the Phe residue in the F(X)6LL motif is buried in the transmembrane domains and possibly interacts with Ile58 in beta2-AR and Val42 in alpha2B-AR, whereas the LL motif is exposed to the cytosolic space. Indeed, mutation of Ile58 in beta2-AR and Val42 in alpha2B-AR markedly disrupted cell surface transport of the receptors. It is noteworthy that the Val and Ile residues are highly conserved among the GPCRs carrying the F(X)6LL motif. Furthermore, the Phe mutant exhibited a stronger interaction with ER chaperones and was more potently rescued by physical and chemical treatments than the LL mutant. These data suggest that the Phe residue is probably involved in folding of alpha2B-AR and beta2-AR, possibly through interaction with other hydrophobic residues in neighboring domains. These data also provide the first evidence implying crucial roles of the C termini possibly through modulating multiple events in anterograde trafficking of GPCRs.

  16. MULTI-VALUED CROSSCORRELATION FUNCTION FOR m-SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    LiChao; ZhouMin

    2004-01-01

    Based on the theory of exponential sums and quadratic forms over finite field, the crosscorrelation function values between two maximal linear recursive sequences are determined under some conditions.
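
    As a small numerical illustration of the quantity studied in this record, the sketch below generates a binary m-sequence with a linear feedback shift register and computes its periodic crosscorrelation with a decimated copy of itself. The degree (5), the recurrence taken from the primitive polynomial x^5 + x^2 + 1, and the decimation factor 3 are assumptions chosen for the example; the paper's own conditions are not given in the abstract.

```python
def m_sequence(taps, n):
    """Binary sequence of period 2**n - 1 from s[t+n] = XOR of s[t+k] for k in taps.

    The period is maximal only if the recurrence corresponds to a primitive polynomial.
    """
    state = [1] + [0] * (n - 1)  # any non-zero seed works
    out = []
    for _ in range(2 ** n - 1):
        out.append(state[0])
        new_bit = 0
        for k in taps:
            new_bit ^= state[k]
        state = state[1:] + [new_bit]
    return out


def crosscorrelation(a, b):
    """Periodic crosscorrelation C(tau) = sum_t (-1)^(a[t+tau] + b[t])."""
    n = len(a)
    return [
        sum((-1) ** (a[(t + tau) % n] ^ b[t]) for t in range(n))
        for tau in range(n)
    ]


s = m_sequence((0, 2), 5)  # s[t+5] = s[t+2] XOR s[t]
d = 3                      # decimation factor, coprime to 31
decimated = [s[(d * t) % len(s)] for t in range(len(s))]

auto = crosscorrelation(s, s)
cross = crosscorrelation(s, decimated)
print("autocorrelation values:", sorted(set(auto)))
print("crosscorrelation values:", sorted(set(cross)))
```

    The autocorrelation takes the peak value 31 at shift 0 and -1 elsewhere, while the crosscorrelation with the decimated sequence takes several distinct values; determining those values in general is the kind of problem the record addresses.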

  17. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).

  18. On some difference sequence spaces defined by a sequence of Orlicz functions

    Institute of Scientific and Technical Information of China (English)

    ASMA BEKTAŞ Çiğdem

    2006-01-01

    The idea of difference sequence spaces was introduced in (Kizmaz, 1981) and this concept was generalized in (Et and Colak, 1995). In this paper we define some difference sequence spaces by a sequence of Orlicz functions and establish some inclusion relations.

  19. Uncharacterized conserved motifs outside the HD-Zip domain in HD-Zip subfamily I transcription factors; a potential source of functional diversity

    Directory of Open Access Journals (Sweden)

    Cabello Julieta V

    2011-03-01

    Full Text Available Abstract Background Plant HD-Zip transcription factors are modular proteins in which a homeodomain is associated to a leucine zipper. Of the four subfamilies in which they are divided, the tested members from subfamily I bind in vitro the same pseudopalindromic sequence CAAT(A/T)ATTG and among them, several exhibit similar expression patterns. However, most experiments in which HD-Zip I proteins were over or ectopically expressed under the control of the constitutive promoter 35S CaMV resulted in transgenic plants with clearly different phenotypes. Aiming to elucidate the structural mechanisms underlying such observation and taking advantage of the increasing information in databases of sequences from diverse plant species, an in silico analysis was performed. In addition, some of the results were also experimentally supported. Results A phylogenetic tree of 178 HD-Zip I proteins together with the sequence conservation presented outside the HD-Zip domains allowed the distinction of six groups of proteins. A motif-discovery approach enabled the recognition of an activation domain in the carboxy-terminal regions (CTRs) and some putative regulatory mechanisms acting in the amino-terminal regions (NTRs) and CTRs involving sumoylation and phosphorylation. A yeast one-hybrid experiment demonstrated that the activation activity of ATHB1, a member of one of the groups, is located in its CTR. Chimerical constructs were performed combining the HD-Zip domain of one member with the CTR of another and transgenic plants were obtained with these constructs. The phenotype of the chimerical transgenic plants was similar to the observed in transgenic plants bearing the CTR of the donor protein, revealing the importance of this module inside the whole protein. Conclusions The bioinformatical results and the experiments conducted in yeast and transgenic plants strongly suggest that the previously poorly analyzed NTRs and CTRs of HD-Zip I proteins play an important
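
    The binding site named in this record, read as CAAT(A/T)ATTG, is called pseudopalindromic because its two variants are reverse complements of each other rather than each being its own reverse complement. The sketch below checks that property and scans a DNA string for the site; the helper names and the example sequence are hypothetical and serve only as illustration.

```python
import re

# CAAT(A/T)ATTG with the central A/T position written as a character class.
SITE = re.compile(r"(?=(CAAT[AT]ATTG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def find_sites(dna):
    """Return (0-based start, matched 9-mer) for each occurrence on the given strand."""
    return [(m.start(), m.group(1)) for m in SITE.finditer(dna.upper())]


# The two variants map onto each other under reverse complementation.
for variant in ("CAATAATTG", "CAATTATTG"):
    print(variant, "->", reverse_complement(variant))

# Invented promoter fragment, for illustration only.
print(find_sites("ggCAATTATTGcc"))
```

    Because the variants are reverse complements of one another, scanning one strand with the character class already covers sites on both strands.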

  20. A Polybasic Plasma Membrane Binding Motif in the I-II Linker Stabilizes Voltage-gated CaV1.2 Calcium Channel Function.

    Science.gov (United States)

    Kaur, Gurjot; Pinggera, Alexandra; Ortner, Nadine J; Lieb, Andreas; Sinnegger-Brauns, Martina J; Yarov-Yarovoy, Vladimir; Obermair, Gerald J; Flucher, Bernhard E; Striessnig, Jörg

    2015-08-21

    L-type voltage-gated Ca(2+) channels (LTCCs) regulate many physiological functions like muscle contraction, hormone secretion, gene expression, and neuronal excitability. Their activity is strictly controlled by various molecular mechanisms. The pore-forming α1-subunit comprises four repeated domains (I-IV), each connected via an intracellular linker. Here we identified a polybasic plasma membrane binding motif, consisting of four arginines, within the I-II linker of all LTCCs. The primary structure of this motif is similar to polybasic clusters known to interact with polyphosphoinositides identified in other ion channels. We used de novo molecular modeling to predict the conformation of this polybasic motif, immunofluorescence microscopy and live cell imaging to investigate the interaction with the plasma membrane, and electrophysiology to study its role for Cav1.2 channel function. According to our models, this polybasic motif of the I-II linker forms a straight α-helix, with the positive charges facing the lipid phosphates of the inner leaflet of the plasma membrane. Membrane binding of the I-II linker could be reversed after phospholipase C activation, causing polyphosphoinositide breakdown, and was accelerated by elevated intracellular Ca(2+) levels. This indicates the involvement of negatively charged phospholipids in the plasma membrane targeting of the linker. Neutralization of four arginine residues eliminated plasma membrane binding. Patch clamp recordings revealed facilitated opening of Cav1.2 channels containing these mutations, weaker inhibition by phospholipase C activation, and reduced expression of channels (as quantified by ON-gating charge) at the plasma membrane. Our data provide new evidence for a membrane binding motif within the I-II linker of LTCC α1-subunits essential for stabilizing normal Ca(2+) channel function. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  1. The UlaG protein family defines novel structural and functional motifs grafted on an ancient RNase fold

    Directory of Open Access Journals (Sweden)

    Coll Miquel

    2011-09-01

    Full Text Available Abstract Background Bacterial populations are highly successful at colonizing new habitats and adapting to changing environmental conditions, partly due to their capacity to evolve novel virulence and metabolic pathways in response to stress conditions and to shuffle them by horizontal gene transfer (HGT). A common theme in the evolution of new functions consists of gene duplication followed by functional divergence. UlaG, a unique manganese-dependent metallo-β-lactamase (MBL) enzyme involved in L-ascorbate metabolism by commensal and symbiotic enterobacteria, provides a model for the study of the emergence of new catalytic activities from the modification of an ancient fold. Furthermore, UlaG is the founding member of the so-called UlaG-like (UlaGL) protein family, a recently established and poorly characterized family comprising divalent (and perhaps trivalent) metal-binding MBLs that catalyze transformations on phosphorylated sugars and nucleotides. Results Here we combined protein structure-guided and sequence-only molecular phylogenetic analyses to dissect the molecular evolution of UlaG and to study its phylogenomic distribution, its relatedness with present-day UlaGL protein sequences and functional conservation. Phylogenetic analyses indicate that UlaGL sequences are present in Bacteria and Archaea, with bona fide orthologs found mainly in mammalian and plant-associated Gram-negative and Gram-positive bacteria. The incongruence between the UlaGL tree and known species trees indicates exchange by HGT and suggests that the UlaGL-encoding genes provided a growth advantage under changing conditions. Our search for more distantly related protein sequences aided by structural homology has uncovered that UlaGL sequences have a common evolutionary origin with present-day RNA processing and metabolizing MBL enzymes widespread in Bacteria, Archaea, and Eukarya. This observation suggests an ancient origin for the UlaGL family within the broader trunk

  2. Structural and functional analysis of VQ motif-containing proteins in Arabidopsis as interacting proteins of WRKY transcription factors.

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-06-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors.

  3. Functional interaction of phospholipid hydroperoxide glutathione peroxidase with sperm mitochondrion-associated cysteine-rich protein discloses the adjacent cysteine motif as a new substrate of the selenoperoxidase.

    Science.gov (United States)

    Maiorino, Matilde; Roveri, Antonella; Benazzi, Louise; Bosello, Valentina; Mauri, Pierluigi; Toppo, Stefano; Tosatto, Silvio C E; Ursini, Fulvio

    2005-11-18

    The mitochondrial capsule is a selenium- and disulfide-rich structure enchasing the outer mitochondrial membrane of mammalian spermatozoa. Among the proteins solubilized from the sperm mitochondrial capsule, we confirmed, by using a proteomic approach, the presence of phospholipid hydroperoxide glutathione peroxidase (PHGPx) as a major component, and we also identified the sperm mitochondrion-associated cysteine-rich protein (SMCP) and fragments/aggregates of specific keratins that previously escaped detection (Ursini, F., Heim, S., Kiess, M., Maiorino, M., Roveri, A., Wissing, J., and Flohé, L. (1999) Science 285, 1393-1396). The evidence for a functional association between PHGPx, SMCP, and keratins is further supported by the identification of a sequence motif of regularly spaced Cys-Cys doublets common to SMCP and high sulfur keratin-associated proteins, involved in bundling hair shaft keratin by disulfide cross-linking. Following the oxidative polymerization of mitochondrial capsule proteins, catalyzed by PHGPx, two-dimensional redox electrophoresis analysis showed homo- and heteropolymers of SMCP and PHGPx, together with other minor components. Adjacent cysteine residues in SMCP peptides are oxidized to cystine by PHGPx. This unusual disulfide is known to drive, by reshuffling, oxidative protein folding. On this basis we propose that oxidative polymerization of the mitochondrial capsule is primed by the formation of cystine on SMCP, followed by reshuffling. Occurrence of reshuffling is further supported by the calculated thermodynamic gain of the process. This study suggests a new mechanism where selenium catalysis drives the cross-linking of structural elements of the cytoskeleton via the oxidation of a keratin-associated protein.

  4. Use of sequence motifs as barcodes and secondary structures of Internal Transcribed spacer 2 (ITS2, rDNA) for identification of the Indian liver fluke, Fasciola (Trematoda: Fasciolidae)

    Science.gov (United States)

    Prasad, PK; Tandon, V; Biswal, DK; Goswami, LM; Chatterjee, A

    2009-01-01

    Most phylogenetic studies using current methods have focused on primary DNA sequence information. However, RNA secondary structures are particularly useful in systematics because they include characteristics that give “morphological” information which is not found in the primary sequence. Also, DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat are useful for identification of trematodes. The species of liver flukes of the genus Fasciola (Platyhelminthes: Digenea: Fasciolidae) are obligate parasitic trematodes residing in the large biliary ducts of herbivorous mammals. While Fasciola hepatica has a cosmopolitan distribution, the other major species, i.e., F. gigantica, is reportedly prevalent in the tropical and subtropical regions of Africa and Asia. To determine the Fasciola sp. of Assam (India) origin based on rDNA molecular data, the ribosomal ITS2 region was sequenced (EF027103) and analysed. NCBI databases were used for sequence homology analysis and the phylogenetic trees were constructed based upon the ITS2 using MEGA and a Bayesian analysis of the combined data. The latter approach allowed us to include both primary sequence and RNA molecular morphometrics and revealed a close relationship with isolates of F. gigantica from China, Indonesia and Japan, the isolate from China with significant bootstrap values being the closest. ITS2 sequence motifs allowed an accurate in silico distinction of liver flukes. The data indicate that ITS2 motifs (≤ 50 bp in size) can be considered a promising tool for trematode species identification. Using the novel approach of molecular morphometrics that is based on ITS2 secondary structure homologies, phylogenetic relationships of the various isolates of fasciolid species have been discussed. PMID:19294000

  5. A Lysin motif (LysM)-containing protein functions in antibacterial responses of red swamp crayfish, Procambarus clarkii.

    Science.gov (United States)

    Shi, Xiu-Zhen; Zhou, Jing; Lan, Jiang-Feng; Jia, Yu-Ping; Zhao, Xiao-Fan; Wang, Jin-Xing

    2013-01-01

    Lysin domain (LysM) is a widely spread domain in nature and could bind different peptidoglycans and chitin-like compounds in bacteria and eukaryotes. In plants, Lysin motif containing proteins are one of the major classes of pattern recognition proteins which can recognize GlcNAc-containing glycans and have important functions in plant immunity. However, their functions in animal immunity are still unclear. In this study, a cDNA encoding a LysM containing protein was identified from red swamp crayfish, Procambarus clarkii. The cDNA of PcLysM contained 1200 base pair nucleotides with an open reading frame of 702 bp encoding a protein of 233 amino acid residues. The deduced protein had a calculated molecular mass of 25.950 kDa and a pI of 6.84. Tissue distribution analysis in mRNA level showed that it was highly expressed in gills, hemocytes, and intestine, and lowly expressed in hearts, hepatopancreas, and stomach. Time course expression pattern analysis showed that PcLysM was upregulated in hemocytes and gills after challenge with Vibrio anguillarum, and it was upregulated at 12 h after challenge with Staphylococcus aureus in gills. The recombinant PcLysM could bind to different bacteria, and yeast. Further study revealed that PcLysM could bind to peptidoglycans from different bacteria, and chitin. After PcLysM was knocked down, the upregulation of antimicrobial peptide (AMP) genes (crustins and antilipopolysaccharide factors) was suppressed in response to bacterial infection in gills. These results suggest that PcLysM recognizes different microorganisms through binding to polysaccharides, such as peptidoglycans and chitin, and regulates the expression of some antimicrobial peptide genes through unknown pathways. This study might provide a clue to elucidate the roles of PcLysM in the innate immune reaction of crayfish P. clarkii. Copyright © 2013 Elsevier Ltd. All rights reserved.
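
    The figures quoted in this record are mutually consistent if the 702 bp open reading frame is counted with its stop codon: 233 codons plus one stop codon give 234 x 3 = 702 nucleotides. The short check below makes that arithmetic explicit; the function name is hypothetical, and counting the stop codon is an assumption, since the abstract does not say how the length was measured.

```python
def orf_length_bp(n_residues, include_stop=True):
    """Length in base pairs of an ORF encoding n_residues amino acids."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


print(orf_length_bp(233))                       # 702, stop codon counted
print(orf_length_bp(233, include_stop=False))   # 699, coding codons only

# Mean residue mass implied by the reported 25.950 kDa and 233 residues.
print(round(25950 / 233, 1), "Da per residue")
```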

  6. Conserved structural motifs at the C-terminus of baculovirus protein IE0 are important for its functions in transactivation and supporting hr5-mediated DNA replication.

    Science.gov (United States)

    Luria, Neta; Lu, Liqun; Chejanovsky, Nor

    2012-05-01

    IE0 and IE1 are transactivator proteins of the most studied baculovirus, the Autographa californica multiple nucleopolyhedrovirus (AcMNPV). IE0 is a 72.6 kDa protein identical to IE1 with the exception of its 54 N-terminal amino acid residues. To gain some insight about important structural motifs of IE0, we expressed the protein and C‑terminal mutants of it under the control of the Drosophila heat shock promoter and studied the transactivation and replication functions of the transiently expressed proteins. IE0 was able to promote replication of a plasmid bearing the hr5 origin of replication of AcMNPV in transient transfections with a battery of eight plasmids expressing the AcMNPV genes dnapol, helicase, lef-1, lef-2, lef-3, p35, ie-2 and lef-7. IE0 transactivated expression of the baculovirus 39K promoter. Both functions of replication and transactivation were lost after introduction of selected mutations at the basic domain II and helix-loop-helix conserved structural motifs in the C-terminus of the protein. These IE0 mutants were unable to translocate to the cell nucleus. Our results point out the important role of some structural conserved motifs to the proper functioning of IE0.

  7. Conserved Structural Motifs at the C-Terminus of Baculovirus Protein IE0 are Important for its Functions in Transactivation and Supporting hr5-mediated DNA Replication

    Directory of Open Access Journals (Sweden)

    Neta Luria

    2012-05-01

    Full Text Available IE0 and IE1 are transactivator proteins of the most studied baculovirus, the Autographa californica multiple nucleopolyhedrovirus (AcMNPV). IE0 is a 72.6 kDa protein identical to IE1 with the exception of its 54 N-terminal amino acid residues. To gain some insight about important structural motifs of IE0, we expressed the protein and C‑terminal mutants of it under the control of the Drosophila heat shock promoter and studied the transactivation and replication functions of the transiently expressed proteins. IE0 was able to promote replication of a plasmid bearing the hr5 origin of replication of AcMNPV in transient transfections with a battery of eight plasmids expressing the AcMNPV genes dnapol, helicase, lef-1, lef-2, lef-3, p35, ie-2 and lef-7. IE0 transactivated expression of the baculovirus 39K promoter. Both functions of replication and transactivation were lost after introduction of selected mutations at the basic domain II and helix-loop-helix conserved structural motifs in the C-terminus of the protein. These IE0 mutants were unable to translocate to the cell nucleus. Our results point out the important role of some structural conserved motifs to the proper functioning of IE0.

  8. Discovering structural motifs using a structural alphabet: Application to magnesium-binding sites

    Directory of Open Access Journals (Sweden)

    Lim Carmay

    2007-03-01

    Full Text Available Abstract Background For many metalloproteins, sequence motifs characteristic of metal-binding sites have not been found or are so short that they would not be expected to be metal-specific. Striking examples of such metalloproteins are those containing Mg2+, one of the most versatile metal cofactors in cellular biochemistry. Even when Mg2+-proteins share insufficient sequence homology to identify Mg2+-specific sequence motifs, they may still share similarity in the Mg2+-binding site structure. However, no structural motifs characteristic of Mg2+-binding sites have been reported. Thus, our aims are (i) to develop a general method for discovering structural patterns/motifs characteristic of ligand-binding sites, given the 3D protein structures, and (ii) to apply it to Mg2+-proteins sharing 2+-structural motifs are identified as recurring structural patterns. Results The structural alphabet-based motif discovery method has revealed the structural preference of Mg2+-binding sites for certain local/secondary structures: compared to all residues in the Mg2+-proteins, both first and second-shell Mg2+-ligands prefer loops to helices. Even when the Mg2+-proteins share no significant sequence homology, some of them share a similar Mg2+-binding site structure: 4 Mg2+-structural motifs, comprising 21% of the binding sites, were found. In particular, one of the Mg2+-structural motifs found maps to a specific functional group, namely, hydrolases. Furthermore, 2 of the motifs were not found in non-metalloproteins or in Ca2+-binding proteins. The structural motifs discovered thus capture some essential biochemical and/or evolutionary properties, and hence may be useful for discovering proteins where Mg2+ plays an important biological role. Conclusion The structural motif discovery method presented herein is general and can be applied to any set of proteins with known 3D structures. This new method is timely considering the increasing number of structures for

  9. Mutation of the aspartic acid residues of the GDD sequence motif of poliovirus RNA-dependent RNA polymerase results in enzymes with altered metal ion requirements for activity.

    Science.gov (United States)

    Jablonski, S A; Morrow, C D

    1995-01-01

    The poliovirus RNA-dependent RNA polymerase, 3Dpol, is known to share a region of sequence homology with all RNA polymerases centered at the GDD amino acid motif. The two aspartic acids have been postulated to be involved in the catalytic activity and metal ion coordination of the enzyme. To test this hypothesis, we have utilized oligonucleotide site-directed mutagenesis to generate defined mutations in the aspartic acids of the GDD motif of the 3Dpol gene. The codon for the first aspartate (3D-D-328 [D refers to the single amino acid change, and the number refers to its position in the polymerase]) was changed to that for glutamic acid, histidine, asparagine, or glutamine; the codons for both aspartic acids were simultaneously changed to those for glutamic acids; and the codon for the second aspartic acid (3D-D-329) was changed to that for glutamic acid or asparagine. The mutant enzymes were expressed in Escherichia coli, and the in vitro poly(U) polymerase activity was characterized. All of the mutant 3Dpol enzymes were enzymatically inactive in vitro when tested over a range of Mg2+ concentrations. However, when Mn2+ was substituted for Mg2+ in the in vitro assays, the mutant that substituted the second aspartic acid for asparagine (3D-N-329) was active. To further substantiate this finding, a series of different transition metal ions were substituted for Mg2+ in the poly(U) polymerase assay. The wild-type enzyme was active with all metals except Ca2+, while the 3D-N-329 mutant was active only when FeC6H7O5 was used in the reaction. To determine the effects of the mutations on poliovirus replication, the mutant 3Dpol genes were subcloned into an infectious cDNA of poliovirus. The cDNAs containing the mutant 3Dpol genes did not produce infectious virus when transfected into tissue culture cells under standard conditions. Because of the activity of the 3D-N-329 mutant in the presence of Fe2+ and Mn2+, transfections were also performed in the presence of the

  10. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...... of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based public sequenced genomes, CMGfunc. Functionally related groups...

  11. Gene cloning and function analysis of ABP9 protein which specifically binds to ABRE2 motif of maize Cat1 gene

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    A cDNA library was constructed using mRNA extracted from 17 days post-pollination (dpp) maize embryos and was screened by employing a yeast one-hybrid system for proteins specifically interacting with ABRE2 motif of maize Cat1 gene. Three truncated overlapping positive clones designated ABP9 were obtained and the full-length cDNA was isolated by 5′ RACE. Searching the database revealed that ABP9 protein belongs to a bZIP-type transcription factor family. ABP9 protein specifically binds to ABRE2 motif and activates the expression of downstream reporter gene in yeast cells. Our results strongly suggest that the ABP9 protein functions as a transcription activator.

  12. Filling the gap between sequence and function: a bioinformatics approach

    NARCIS (Netherlands)

    Bargsten, J.W.

    2014-01-01

    The research presented in this thesis focuses on deriving function from sequence information, with the emphasis on plant sequence data. Unravelling the impact of genomic elements, in most cases genes, on the phenotype of an organism is a major challenge in biological research and modern plant breedi

  13. Sequences, Bent Functions and Jacobsthal sums

    CERN Document Server

    Helleseth, Tor

    2010-01-01

    The $p$-ary function $f(x)$ mapping $\mathrm{GF}(p^{4k})$ to $\mathrm{GF}(p)$ and given by $f(x)={\rm Tr}_{4k}\big(ax^d+bx^2\big)$ with $a,b\in\mathrm{GF}(p^{4k})$ and $d=p^{3k}+p^{2k}-p^k+1$ is studied with respect to its exponential sum. In the case when either $a^{p^k(p^k+1)}\

  14. Reference: TCA1MOTIF [PLACE]

    Lifescience Database Archive (English)

    Full Text Available TCA1MOTIF Goldsbrough AP, Albrecht H, Stratford R Salicylic acid-inducible binding ...of a tobacco nuclear protein to a 10 bp sequence which is highly conserved amongst stress-inducible genes. Plant J 3:563-571 (1993) PubMed: 8220463; ...

  15. Function of the PEX19-binding site of human adrenoleukodystrophy protein as targeting motif in man and yeast. PMP targeting is evolutionarily conserved.

    Science.gov (United States)

    Halbach, André; Lorenzen, Stephan; Landgraf, Christiane; Volkmer-Engert, Rudolf; Erdmann, Ralf; Rottensteiner, Hanspeter

    2005-06-01

    We predicted in human peroxisomal membrane proteins (PMPs) the binding sites for PEX19, a key player in the topogenesis of PMPs, by virtue of an algorithm developed for yeast PMPs. The best scoring PEX19-binding site was found in the adrenoleukodystrophy protein (ALDP). The identified site was indeed bound by human PEX19 and was also recognized by the orthologous yeast PEX19 protein. Likewise, both human and yeast PEX19 bound with comparable affinities to the PEX19-binding site of the yeast PMP Pex13p. Interestingly, the identified PEX19-binding site of ALDP coincided with its previously determined targeting motif. We corroborated the requirement of the ALDP PEX19-binding site for peroxisomal targeting in human fibroblasts and showed that the minimal ALDP fragment targets correctly also in yeast, again in a PEX19-binding site-dependent manner. Furthermore, the human PEX19-binding site of ALDP proved interchangeable with that of yeast Pex13p in an in vivo targeting assay. Finally, we showed in vitro that most of the predicted binding sequences of human PMPs represent true binding sites for human PEX19, indicating that human PMPs harbor common PEX19-binding sites that do resemble those of yeast. Our data clearly revealed a role for PEX19-binding sites as PMP-targeting motifs across species, thereby demonstrating the evolutionary conservation of PMP signal sequences from yeast to man.

  16. Wiggle-predicting functionally flexible regions from primary sequence.

    Directory of Open Access Journals (Sweden)

    Jenny Gu

    2006-07-01

    Full Text Available The Wiggle series are support vector machine-based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity.

  17. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    Science.gov (United States)

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  18. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Science.gov (United States)

    Roy, Sourav; Kagda, Meenakshi; Judelson, Howard S

    2013-03-01

    Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  19. Structural and functional studies of a phosphatidic acid-binding antifungal plant defensin MtDef4: Identification of an RGFRRR motif governing fungal cell entry

    Energy Technology Data Exchange (ETDEWEB)

    Sagaram, Uma S.; El-Mounadi, Kaoutar; Buchko, Garry W.; Berg, Howard R.; Kaur, Jagdeep; Pandurangi, Raghoottama; Smith, Thomas J.; Shah, Dilip

    2013-12-04

    A highly conserved plant defensin MtDef4 potently inhibits the growth of a filamentous fungus Fusarium graminearum. MtDef4 is internalized by cells of F. graminearum. To determine its mechanism of fungal cell entry and antifungal action, NMR solution structure of MtDef4 has been determined. The analysis of its structure has revealed a positively charged patch on the surface of the protein consisting of arginine residues in its γ-core signature, a major determinant of the antifungal activity of MtDef4. Here, we report functional analysis of the RGFRRR motif of the γ-core signature of MtDef4. The replacement of RGFRRR to AAAARR or to RGFRAA not only abolishes fungal cell entry but also results in loss of the antifungal activity of MtDef4. MtDef4 binds strongly to phosphatidic acid (PA), a precursor for the biosynthesis of membrane phospholipids and a signaling lipid known to recruit cytosolic proteins to membranes. Mutations of RGFRRR which abolish fungal cell entry of MtDef4 also impair its binding to PA. Our results suggest that RGFRRR motif is a translocation signal for entry of MtDef4 into fungal cells and that this positively charged motif likely mediates interaction of this defensin with PA as part of its antifungal action.

  20. Automated discovery of tissue-targeting enhancers and transcription factors from binding motif and gene function data.

    Directory of Open Access Journals (Sweden)

    Geetu Tuteja

    2014-01-01

    Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.

  1. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    Science.gov (United States)

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ.

  2. Prediction Error During Functional and Non-Functional Action Sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2013-01-01

    By means of the computational approach the present study investigates the difference between observation of functional behavior (i.e. actions involving necessary integration of subparts) and non-functional behavior (i.e. actions lacking necessary integration of subparts) in terms of prediction error. Non-functionality in this proximal sense is a feature of many socio-cultural practices, such as those found in religious rituals private and social, as well as pathological practices, such as ritualized behavior found among people suffering from Obsessive-Compulsive Disorder (OCD). A recent behavioral study has shown that human subjects segment non-functional behavior in a more fine-grained way than functional behavior. This increase in segmentation rate implies that non-functionality elicits a stronger error signal. To further explore the implications, two computer simulations using simple...

  3. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  4. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  5. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs

  6. Helix-packing motifs in membrane proteins.

    Science.gov (United States)

    Walters, R F S; DeGrado, W F

    2006-09-12

    The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd [...] universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.

  7. Inaudible functional MRI using a truly mute gradient echo sequence

    Energy Technology Data Exchange (ETDEWEB)

    Marcar, V.L. [University of Zurich, Department of Psychology, Neuropsychology, Treichlerstrasse 10, 8032 Zurich (Switzerland)]; Girard, F. [GE Medical Systems SA, 283, rue de la Miniere B.P. 34, 78533 Buc Cedex (France)]; Rinkel, Y.; Schneider, J.F.; Martin, E. [University Children's Hospital, Neuroradiology and Magnetic Resonance, Department of Diagnostic Imaging, Steinwiesstrasse 75, 8032 Zurich (Switzerland)]

    2002-11-01

    We performed functional MRI experiments using a mute version of a gradient echo sequence on adult volunteers using either a simple visual stimulus (flicker goggles: 4 subjects) or an auditory stimulus (music: 4 subjects). Because the mute sequence delivers fewer images per unit time than a fast echo planar imaging (EPI) sequence, we explored our data using a parametric ANOVA test and a non-parametric Wilcoxon-Mann-Whitney test in addition to performing a cross-correlation analysis. All three methods were in close agreement regarding the location of the BOLD contrast signal change. We demonstrated that, using appropriate statistical analysis, functional MRI using an MR sequence that is acoustically inaudible to the subject is feasible. Furthermore, our mGE protocol compares favourably with the "silent" event-related procedures involving an EPI protocol with respect to experiment time and the BOLD signal. (orig.)

  8. On the concept of hemilability: insights into a donor-functionalized iridium(I) NHC motif and its impact on reactivity.

    Science.gov (United States)

    Riener, Korbinian; Bitzer, Mario J; Pöthig, Alexander; Raba, Andreas; Cokoja, Mirza; Herrmann, Wolfgang A; Kühn, Fritz E

    2014-12-15

    Novel iridium(I) complexes bearing N-donor-functionalized N-heterocyclic carbene ligands were synthesized. Although hemilabile coordination of the attached donor is considered beneficial in catalysis, no detailed study of this phenomenon in these systems is available to date. The present report provides insight into the hemilabile bonding properties of a N,N'-bis(pyridin-2-yl)-imidazolylidene (NCN) ligand motif on iridium(I). In most cases, the presented compounds exhibit rare fluxional hemilabile coordination of the N donor, and remarkable performance in catalytic transfer hydrogenation is observed. Further, extensive reactivity studies often led to unexpected products.

  9. A novel RNA motif mediates the strict nuclear localization of a long noncoding RNA.

    Science.gov (United States)

    Zhang, Bing; Gunawardane, Lalith; Niazi, Farshad; Jahanbani, Fereshteh; Chen, Xin; Valadkhan, Saba

    2014-06-01

    The ubiquitous presence of long noncoding RNAs (lncRNAs) in eukaryotes points to the importance of understanding how their sequences impact function. As many lncRNAs regulate nuclear events and thus must localize to nuclei, we analyzed the sequence requirements for nuclear localization in an intergenic lncRNA named BORG (BMP2-OP1-responsive gene), which is both spliced and polyadenylated but is strictly localized in nuclei. Subcellular localization of BORG was not dependent on the context or level of its expression or decay but rather depended on the sequence of the mature, spliced transcript. Mutational analyses indicated that nuclear localization of BORG was mediated through a novel RNA motif consisting of the pentamer sequence AGCCC with sequence restrictions at positions -8 (T or A) and -3 (G or C) relative to the first nucleotide of the pentamer. Mutation of the motif to a scrambled sequence resulted in complete loss of nuclear localization, while addition of even a single copy of the motif to a cytoplasmically localized RNA was sufficient to impart nuclear localization. Further, the presence of this motif in other cellular RNAs showed a direct correlation with nuclear localization, suggesting that the motif may act as a general nuclear localization signal for cellular RNAs. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
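    The motif described in this record lends itself to a simple pattern search. The following is a minimal sketch, not the authors' method. It assumes the common numbering convention with no position 0, so that position -1 is the nucleotide immediately upstream of the first A of AGCCC; under that assumption the motif spans 13 nucleotides. The function name `find_borg_motif` is hypothetical, and RNA input is converted to the DNA alphabet used in the abstract.

```python
import re

# Assumed layout (positions relative to the first nucleotide of AGCCC,
# with no position 0):
#   -8: T or A | -7..-4: any | -3: G or C | -2..-1: any | +1..+5: AGCCC
# The lookahead lets overlapping occurrences be reported as well.
BORG_MOTIF = re.compile(r"(?=([AT][ACGT]{4}[GC][ACGT]{2}AGCCC))")


def find_borg_motif(seq):
    """Return (start_index, matched_13mer) for every motif occurrence."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA input
    return [(m.start(), m.group(1)) for m in BORG_MOTIF.finditer(dna)]


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from BORG itself.
    print(find_borg_motif("GGTCCCCGAAAGCCCTT"))
```

    If the authors count positions differently (for example, treating the first nucleotide of the pentamer as position 0), the two spacer lengths in the pattern would each shift by one.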

  10. Intergenic regions of Borrelia plasmids contain phylogenetically conserved RNA secondary structure motifs

    Directory of Open Access Journals (Sweden)

    Delihas Nicholas

    2009-03-01

    Background Borrelia species are unusual in that they contain a large number of linear and circular plasmids. Many of these plasmids have long intergenic regions. These regions have many fragmented genes, repeated sequences and appear to be in a state of flux, but they may serve as reservoirs for evolutionary change and/or maintain stable motifs such as small RNA genes. Results In an in silico study, intergenic regions of Borrelia plasmids were scanned for phylogenetically conserved stem loop structures that may represent functional units at the RNA level. Five repeat sequences were found that could fold into stable RNA-type stem loop structures, three of which are closely linked to protein genes, one of which is a member of the Borrelia lipoprotein_1 super family genes and another is the complement regulator-acquiring surface protein_1 (CRASP-1) family. Modeled secondary structures of repeat sequences display numerous base-pair compensatory changes in stem regions, including C-G→A-U transversions when orthologous sequences are compared. Base-pair compensatory changes constitute strong evidence for phylogenetic conservation of secondary structure. Conclusion Intergenic regions of Borrelia species carry evolutionarily stable RNA secondary structure motifs. Of major interest is that some motifs are associated with protein genes that show large sequence variability. The cell may conserve these RNA motifs while allowing a large flux in amino acid sequence, possibly to create new virulence factors but with associated RNA motifs intact.

  11. Sequence information encoded in DNA that may influence long-range chromatin structure correlates with human chromosome functions.

    Directory of Open Access Journals (Sweden)

    Taichi E Takasuka

    Little is known about the possible function of the bulk of the human genome. We have recently shown that long-range regular oscillation in the motif non-T, A/T, G (VWG) existing at ten-nucleotide multiples influences large-scale nucleosome array formation. In this work, we have determined the locations of all 100 kb regions that are predicted to form distinctive chromatin structures throughout each human chromosome (except Y). Using these data, we found that a significantly greater fraction of 300 kb sequences lacked annotated transcripts in genomic DNA regions ≥ 300 kb that contained nearly continuous chromatin organizing signals than in control regions. We also found a relationship between the meiotic recombination frequency and the presence of strong VWG chromatin organizing signals. Large (≥ 300 kb) genomic DNA regions having low average recombination frequency are enriched in chromatin organizing signals. As additional controls, we show using chromosome 1 that the VWG motif signals are not enriched in randomly selected DNA regions having the mean size of the recombination coldspots, and that non-VWG motif sets do not generate signals that are enriched in recombination coldspots. We also show that tandemly repeated alpha satellite DNA contains strong VWG signals for the formation of distinctive nucleosome arrays, consistent with the low recombination activity of centromeres. Our correlations cannot be explained simply by variations in the GC content. Our findings suggest that a specific set of periodic DNA motifs encoded in genomic DNA, which provide signals for chromatin organization, influence human chromosome function.
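    The idea of a VWG motif recurring at ten-nucleotide multiples can be illustrated with a toy calculation. The sketch below is only an illustration and is not the authors' analysis pipeline, which the abstract does not describe. It assumes IUPAC codes (V = A, C or G; W = A or T), locates every VWG trinucleotide, and counts how many pairs of occurrences are separated by each lag; a periodic signal would show up as an excess at lags of 10, 20, 30, and so on. The function names are hypothetical.

```python
import re

# V = A/C/G (non-T), W = A/T, followed by G. Lookahead allows overlaps.
VWG = re.compile(r"(?=[ACG][AT]G)")


def vwg_positions(seq):
    """Start indices of every VWG trinucleotide in the sequence."""
    return [m.start() for m in VWG.finditer(seq.upper())]


def lag_counts(positions, max_lag=30):
    """For each lag, count motif pairs separated by exactly that distance."""
    pos_set = set(positions)
    return {
        lag: sum(1 for p in positions if p + lag in pos_set)
        for lag in range(1, max_lag + 1)
    }


if __name__ == "__main__":
    # Hypothetical sequence with a VWG (here CAG) every ten nucleotides.
    toy = ("CAG" + "T" * 7) * 5
    counts = lag_counts(vwg_positions(toy))
    print({lag: n for lag, n in counts.items() if n})
```

    On real genomic windows a spectral method would be a more usual choice than raw lag counts; the pair-counting form is used here only because it is easy to check by hand.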

  12. A Transmembrane Domain GGxxG Motif in CD4 Contributes to Its Lck-Independent Function but Does Not Mediate CD4 Dimerization.

    Directory of Open Access Journals (Sweden)

    Heather L Parrish

    CD4 interactions with class II major histocompatibility complex (MHC) molecules are essential for CD4+ T cell development, activation, and effector functions. While its association with p56lck (Lck), a Src kinase, is important for these functions, CD4 also has an Lck-independent role in TCR signaling that is incompletely understood. Here, we identify a conserved GGxxG motif in the CD4 transmembrane domain that is related to the previously described GxxxG motifs of other proteins and predicted to form a flat glycine patch in a transmembrane helix. In other proteins, these patches have been reported to mediate dimerization of transmembrane domains. Here we show that introducing bulky side-chains into this patch (GGxxG to GVxxL) impairs the Lck-independent role of CD4 in T cell activation upon TCR engagement of agonist and weak agonist stimulation. However, using Förster Resonance Energy Transfer (FRET), we saw no evidence that these mutations decreased CD4 dimerization either in the unliganded state or upon engagement of pMHC concomitantly with the TCR. This suggests that the CD4 transmembrane domain is either mediating interactions with an unidentified partner, or mediating some other function such as membrane domain localization that is important for its role in T cell activation.
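    Motif notations such as GGxxG and GxxxG, where x stands for any residue, translate directly into regular expressions. The following minimal sketch shows one way to scan a protein sequence for such patterns and to check that a GGxxG to GVxxL substitution removes both the GGxxG and the GxxxG match. The helper names and the two example sequences are hypothetical and are not taken from CD4.

```python
import re


def motif_to_regex(pattern):
    """Convert 'x'-style notation (x = any residue) to a compiled regex."""
    body = "".join("." if c == "x" else re.escape(c) for c in pattern)
    return re.compile("(?=(" + body + "))")  # lookahead reports overlaps


def find_motif(pattern, protein):
    """Return 0-based start indices of every occurrence of the motif."""
    return [m.start() for m in motif_to_regex(pattern).finditer(protein.upper())]


if __name__ == "__main__":
    wild_type = "MALIVGGVAGLLF"  # hypothetical helix segment with GGxxG
    mutant = "MALIVGVVALLLF"     # same segment after GGxxG -> GVxxL
    for name, seq in (("wild type", wild_type), ("mutant", mutant)):
        print(name, find_motif("GGxxG", seq), find_motif("GxxxG", seq))
```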

  13. Dissection of thousands of cell type-specific enhancers identifies dinucleotide repeat motifs as general enhancer features.

    Science.gov (United States)

    Yáñez-Cuna, J Omar; Arnold, Cosmas D; Stampfel, Gerald; Boryń, Lukasz M; Gerlach, Daniel; Rath, Martina; Stark, Alexander

    2014-07-01

    Gene expression is determined by genomic elements called enhancers, which contain short motifs bound by different transcription factors (TFs). However, how enhancer sequences and TF motifs relate to enhancer activity is unknown, and general sequence requirements for enhancers or comprehensive sets of important enhancer sequence elements have remained elusive. Here, we computationally dissect thousands of functional enhancer sequences from three different Drosophila cell lines. We find that the enhancers display distinct cis-regulatory sequence signatures, which are predictive of the enhancers' cell type-specific or broad activities. These signatures contain transcription factor motifs and a novel class of enhancer sequence elements, dinucleotide repeat motifs (DRMs). DRMs are highly enriched in enhancers, particularly in enhancers that are broadly active across different cell types. We experimentally validate the importance of the identified TF motifs and DRMs for enhancer function and show that they can be sufficient to create an active enhancer de novo from a nonfunctional sequence. The function of DRMs as a novel class of general enhancer features that are also enriched in human regulatory regions might explain their implication in several diseases and provides important insights into gene regulation.

  14. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  15. Massively parallel interrogation of aptamer sequence, structure and function.

    Directory of Open Access Journals (Sweden)

    Nicholas O Fischer

    BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  16. Spontaneous processing of functional and non-functional action sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2011-01-01

    Characterizing ritual and ritualized behaviors has been a core issue in anthropology and the study of religion for more than a century. Although varying in emphasis, most theories point toward several specific behavioral features that distinguish ritual from instrumental behavior. Specifically, we have chosen to focus on the derivedness from instrumental behavior, intentional underspecification and goal-demotion. In contrast to instrumental or functional behavior (i.e., actions that cohere causally and have a necessary integration of subparts), we propose to view ritual and ritualized action as sub-categories of non-functional behavior (i.e., actions lacking causal coherence and a necessary integration between subparts). New insights in human action processing can help us explain how cognition might vary depending on the type of behavior processed. Using an event segmentation paradigm, we...

  17. Cis-motifs upstream of the transcription and translation initiation sites are effectively revealed by their positional disequilibrium in eukaryote genomes using frequency distribution curves

    Directory of Open Access Journals (Sweden)

    Harter Klaus

    2006-11-01

    Full Text Available Abstract. Background: The discovery of cis-regulatory motifs remains a challenging task even though the number of sequenced genomes is constantly growing. Computational analyses using pattern search algorithms have been valuable in phylogenetic footprinting approaches, as have expression profile experiments to predict co-occurring motifs. Surprisingly little is known about the nature of cis-regulatory element (CRE) distribution in promoters. Results: In this paper we used the Motif Mapper open-source collection of Visual Basic scripts for the analysis of motifs in any aligned set of DNA sequences. We focused on promoter motif distribution curves to identify positional over-representation of DNA motifs. Using differentially aligned datasets from the model species Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, we convincingly demonstrated the importance of the position and orientation for motif discovery. Analysis with known CREs and all possible hexanucleotides showed that some functional elements gather close to the transcription and translation initiation sites and that elements other than the TATA-box motif are conserved between eukaryote promoters. While a high background frequency usually decreases the effectiveness of such an enumerative investigation, we improved our analysis by conducting motif distribution maps using large datasets. Conclusion: This is the first study to reveal positional over-representation of CREs and promoter motifs in a cross-species approach. CREs and motifs shared between eukaryotic promoters support the observation that a eukaryotic promoter structure has been conserved throughout evolutionary time. Furthermore, with the information on positional enrichment of a motif or a known functional CRE, it is possible to get a more detailed insight into where an element appears to function. This in turn might accelerate the in-depth examination of known and yet unknown
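    The idea of a motif distribution curve can be illustrated with a small sketch. This is not the Motif Mapper code (which the abstract describes as Visual Basic scripts); the function name, the toy promoters and the bin size are assumptions made for illustration, and only the forward strand is searched.

```python
from collections import Counter


def motif_position_counts(promoters, motif, bin_size=10):
    """Count occurrences of `motif` per positional bin across aligned promoters.

    Assumes all promoters are aligned at the same reference point
    (for example the transcription start site), so that index 0 means
    the same position in every sequence. Forward strand only.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            counts[start // bin_size] += 1
            # Step by one so overlapping occurrences are also counted.
            start = seq.find(motif, start + 1)
    return counts


# Hypothetical toy promoters, aligned at their left end.
promoters = [
    "GGGTATAAAGGGCCCGGG",
    "CCCTATAAACCCGGGCCC",
    "AAATATAAATTTTATAAA",
]
curve = motif_position_counts(promoters, "TATAAA", bin_size=6)
```

    A bin whose count stands well above the others would be a candidate for positional over-representation; a real analysis would also need a background model and the reverse strand.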

  18. Spontaneous processing of functional and non-functional action sequences

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Sørensen, Jesper

    2011-01-01

    Characterizing ritual and ritualized behaviors has been a core issue in anthropology and the study of religion for more than a century. Although varying in emphasis, most theories point toward several specific behavioral features that distinguish ritual from instrumental behavior. Specifically, we...... have chosen to focus on the derivedness from instrumental behavior, intentional underspecification and goal-demotion. In contrast to instrumental or functional behavior (i.e., actions that cohere causally and have a necessary integration of subparts), we propose to view ritual and ritualized action...

  19. The valine and lysine residues in the conserved FxVTxK motif are important for the function of phylogenetically distant plant cellulose synthases.

    Science.gov (United States)

    Slabaugh, Erin; Scavuzzo-Duggan, Tess; Chaves, Arielle; Wilson, Liza; Wilson, Carmen; Davis, Jonathan K; Cosgrove, Daniel J; Anderson, Charles T; Roberts, Alison W; Haigler, Candace H

    2016-05-01

    Cellulose synthases (CESAs) synthesize the β-1,4-glucan chains that coalesce to form cellulose microfibrils in plant cell walls. In addition to a large cytosolic (catalytic) domain, CESAs have eight predicted transmembrane helices (TMHs). However, analogous to the structure of BcsA, a bacterial CESA, predicted TMH5 in CESA may instead be an interfacial helix. This would place the conserved FxVTxK motif in the plant cell cytosol where it could function as a substrate-gating loop as occurs in BcsA. To define the functional importance of the CESA region containing FxVTxK, we tested five parallel mutations in Arabidopsis thaliana CESA1 and Physcomitrella patens CESA5 in complementation assays of the relevant cesa mutants. In both organisms, the substitution of the valine or lysine residues in FxVTxK severely affected CESA function. In Arabidopsis roots, both changes were correlated with lower cellulose anisotropy, as revealed by Pontamine Fast Scarlet. Analysis of hypocotyl inner cell wall layers by atomic force microscopy showed that two altered versions of Atcesa1 could rescue cell wall phenotypes observed in the mutant background line. Overall, the data show that the FxVTxK motif is functionally important in two phylogenetically distant plant CESAs. The results show that Physcomitrella provides an efficient model for assessing the effects of engineered CESA mutations affecting primary cell wall synthesis and that diverse testing systems can lead to nuanced insights into CESA structure-function relationships. Although CESA membrane topology needs to be experimentally determined, the results support the possibility that the FxVTxK region functions similarly in CESA and BcsA.

  20. LRRCE: a leucine-rich repeat cysteine capping motif unique to the chordate lineage

    Directory of Open Access Journals (Sweden)

    Bishop Paul N

    2008-12-01

    Full Text Available Abstract. Background: The small leucine-rich repeat proteins and proteoglycans (SLRPs) form an important family of regulatory molecules that participate in many essential functions. They typically control the correct assembly of collagen fibrils, regulate mineral deposition in bone, and modulate the activity of potent cellular growth factors through many signalling cascades. SLRPs belong to the group of extracellular leucine-rich repeat proteins that are flanked at both ends by disulphide-bonded caps that protect the hydrophobic core of the terminal repeats. A capping motif specific to SLRPs has been recently described in the crystal structures of the core proteins of decorin and biglycan. This motif, designated as LRRCE, differs in both sequence and structure from other, more widespread leucine-rich capping motifs. To investigate if the LRRCE motif is a common structural feature found in other leucine-rich repeat proteins, we have defined characteristic sequence patterns and used them in genome-wide searches. Results: The LRRCE motif is a structural element exclusive to the main group of SLRPs. It appears to have evolved during early chordate evolution and is not found in protein sequences from non-chordate genomes. Our search has expanded the family of SLRPs to include new predicted protein sequences, mainly in fishes but with intriguing putative orthologs in mammals. The chromosomal locations of the newly predicted SLRP genes would support the large-scale genome or gene duplications that are thought to have occurred during vertebrate evolution. From this expanded list we describe a new class of SLRP sequences that could be representative of an ancestral SLRP gene. Conclusion: Given its exclusivity, the LRRCE motif is a useful annotation tool for the identification and classification of new SLRP sequences in genome databases. The expanded list of members of the SLRP family offers interesting insights into early vertebrate evolution and suggests an

  1. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Directory of Open Access Journals (Sweden)

    Sourav Roy

    2013-03-01

    Full Text Available Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  2. High-resolution mapping of protein sequence-function relationships.

    Science.gov (United States)

    Fowler, Douglas M; Araya, Carlos L; Fleishman, Sarel J; Kellogg, Elizabeth H; Stephany, Jason J; Baker, David; Fields, Stanley

    2010-09-01

    We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
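    The abstract does not state how the performance of each variant was scored. A common way to quantify it from sequencing counts is a log ratio of variant frequencies after and before selection; the sketch below assumes that scheme, and the function name, pseudocount and toy counts are all hypothetical.

```python
import math


def enrichment_scores(input_counts, selected_counts, pseudocount=0.5):
    """Log2 ratio of each variant's frequency after vs. before selection.

    Assumption: this scoring scheme is illustrative and is not taken
    from the abstract. The pseudocount avoids log(0) for variants that
    drop out of the selected library.
    """
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    scores = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        f_in = (n_in + pseudocount) / total_in
        f_sel = (n_sel + pseudocount) / total_sel
        scores[variant] = math.log2(f_sel / f_in)
    return scores


# Hypothetical read counts for two variants.
before = {"WT": 100, "A1G": 100}
after = {"WT": 150, "A1G": 50}
scores = enrichment_scores(before, after)
```

    Positive scores mark variants enriched by selection and negative scores mark depleted ones; comparing scores across all substitutions at a position gives one row of a mutational preference map.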

  3. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    Science.gov (United States)

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  4. Transfinite Sequences of Continuous and Baire Class 1 Functions

    CERN Document Server

    Elekes, Márton

    2011-01-01

    The set of continuous or Baire class 1 functions defined on a metric space $X$ is endowed with the natural pointwise partial order. We investigate how the possible lengths of well-ordered monotone sequences (with respect to this order) depend on the space $X$.

  5. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  6. Main: TCA1MOTIF [PLACE

    Lifescience Database Archive (English)

    Full Text Available TCA1MOTIF S000159 17-May-1998 (last modified) kehi TCA-1 (tobacco nuclear protein 1...) binding site; Related to salicylic acid-inducible expression of many genes; Found in barley beta-1,3-gluca...nase and over 30 different plant genes which are known to be induced by one or more forms of stress; A similar sequence (TCA... et al., 1997); SA; salicylic acid; stress; TCA-1; barley (Hordeum vulgare); tobacco (Nicotiana tabacum); TCATCTTCTT ...

  7. A Multi-Functional Gene Family From Arthritis to Cancer: A Disintegrin-Like Metalloproteinase with Thrombospondin Type-1 Motif (ADAMTS)

    Directory of Open Access Journals (Sweden)

    Kadir Demircan

    2012-09-01

    Full Text Available A Disintegrin-like and Metalloproteinase with Thrombospondin type-1 motif (ADAMTS) genes were first discovered in 1997. Currently, 19 mammalian ADAMTS proteinases have been identified. As members of the matrix metalloproteinases, ADAMTSs play a critical role in the degradation and repair of the extracellular matrix. Recent studies demonstrated that ADAMTSs are likely to be useful in understanding the pathogenesis of many diseases, such as arthritis, liver fibrosis and cancer. Therefore, it is important to understand the molecular organization and function of ADAMTSs. The objective of this review is to support a better understanding of the structure, function and contributions of ADAMTSs to the pathogenesis of related diseases. In particular, understanding the roles of ADAMTSs in disease pathogenesis may lead to new diagnostic approaches and the development of specific therapeutic agents.

  8. MAR characteristic motifs mediate episomal vector in CHO cells.

    Science.gov (United States)

    Lin, Yan; Li, Zhaoxi; Wang, Tianyun; Wang, Xiaoyin; Wang, Li; Dong, Weihua; Jing, Changqin; Yang, Xianjun

    2015-04-01

    An ideal gene therapy vector should enable persistent transgene expression without limitations in safety and reproducibility. Recent insight into the ability of chromosomal matrix attachment regions (MARs) to mediate episomal maintenance of genetic elements has allowed the development of a circular episomal vector. Although a MAR-mediated engineered vector has been developed, little is known about which motifs of the MAR confer this function during interaction with the host genome. Here, we report an artificially synthesized DNA fragment containing only characteristic motif sequences that served as an alternative to the human beta-interferon matrix attachment region sequence. The potential of the vector to mediate gene transfer in CHO cells was investigated. The short synthetic MAR motifs were found to maintain the episomal vector at a low copy number for many generations without integration into the host genome. Higher transgene expression was maintained for at least 4 months. In addition, the MAR was maintained episomally and conferred sustained EGFP expression even in nonselective CHO cells. These results demonstrate that a vector based on MAR characteristic sequences can function as a stable episome in CHO cells, supporting long-term and effective transgene expression.

  9. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, by decompiling the software and tracking stack changes, execution traces composed of a series of function addresses were acquired. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), followed by the extraction of patterns from SFS through a pattern extraction (PE) algorithm. After that, two evaluation indicators, inner-importance and inter-importance, were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, these functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.

  10. Mining dynamic noteworthy functions in software execution sequences

    Science.gov (United States)

    Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, by decompiling the software and tracking stack changes, execution traces composed of a series of function addresses were acquired. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), followed by the extraction of patterns from SFS through a pattern extraction (PE) algorithm. After that, two evaluation indicators, inner-importance and inter-importance, were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, these functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely. PMID:28278276

  11. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  12. Two functional motifs define the interaction, internalization and toxicity of the cell-penetrating antifungal peptide PAF26 on fungal cells.

    Directory of Open Access Journals (Sweden)

    Alberto Muñoz

    Full Text Available The synthetic, cell-penetrating hexapeptide PAF26 (RKKWFW) is antifungal at low micromolar concentrations and has been proposed as a model for cationic, cell-penetrating antifungal peptides. Its short amino acid sequence facilitates the analysis of its structure-activity relationships using the fungal models Neurospora crassa and Saccharomyces cerevisiae, and human and plant pathogens Aspergillus fumigatus and Penicillium digitatum, respectively. Previously, PAF26 at low fungicidal concentrations was shown to be endocytically internalized, accumulated in vacuoles and then actively transported into the cytoplasm where it exerts its antifungal activity. In the present study, two PAF26 derivatives, PAF95 (AAAWFW) and PAF96 (RKKAAA), were designed to characterize the roles of the N-terminal cationic and the C-terminal hydrophobic motifs in PAF26's mode-of-action. PAF95 and PAF96 exhibited substantially reduced antifungal activity against all the fungi analyzed. PAF96 localized to fungal cell envelopes and was not internalized by the fungi. In contrast, PAF95 was taken up into vacuoles of N. crassa, wherein it accumulated and was trapped without toxic effects. Also, the PAF26 resistant Δarg1 strain of S. cerevisiae exhibited increased PAF26 accumulation in vacuoles. Live-cell imaging of GFP-labelled nuclei in A. fumigatus showed that transport of PAF26 from the vacuole to the cytoplasm was followed by nuclear breakdown and dissolution. This work demonstrates that the amphipathic PAF26 possesses two distinct motifs that allow three stages in its antifungal action to be defined: (i) its interaction with the cell envelope; (ii) its internalization and transport to vacuoles mediated by the aromatic hydrophobic domain; and (iii) its transport from vacuoles to the cytoplasm. Significantly, cationic residues in PAF26 are important not only for the electrostatic attraction and interaction with the fungal cell but also for transport from the vacuole to the

  13. Minimal motif peptide structure of metzincin clan zinc peptidases in micelles.

    Science.gov (United States)

    Onoda, Akira; Suzuki, Takako; Ishizuka, Hiroaki; Sugiyama, Rumiko; Ariyasu, Shinya; Yamamura, Takeshi

    2009-12-01

    It is well known that the functions of metalloproteins generally originate from their metal-binding motifs. However, the intrinsic nature of individual motifs remains unknown, particularly the details about metal-binding effects on the folding of motifs; the converse is also unknown, although there is no doubt that the motif is the core of the reactivity for each metalloprotein. In this study, we focused our attention on the zinc-binding motif of the metzincin clan family, HEXXHXXGXXH; this family contains the general zinc-binding sequence His-Glu-Xaa-Xaa-His (HEXXH) and the extended GXXH region. We adopted the motif sequence of stromelysin-1 and investigated the folding properties of the Trp-labeled peptides WAHEIAHSLGLFHA (STR-W1), AWHEIAHSLGLFHA (STR-W2), AHEIAHSLGWFHA (STR-W11), and AHEIAHSLGLFHWA (STR-W14) in the presence and absence of zinc ions in hydrophobic micellar environments by circular dichroism (CD) measurements. We assessed successful incorporation of these zinc peptides into micelles using quenching of Trp fluorescence. Results of CD studies indicated that two of the Trp-incorporated peptides, STR-W1 and STR-W14, exhibited helical folding in the hydrophobic region of cetyltrimethylammonium chloride micelle. The NMR structural analysis of the apo STR-W14 revealed that the conformation in the C-terminus GXXH region significantly differed between the apo state in the micelle and the reported Zn-bound state of stromelysin-1 in crystal structures. The structural analyses of the qualitative Zn-binding properties of this motif peptide provide an interesting Zn-binding mechanism: the minimum consensus motif in the metzincin clan, a basic zinc-binding motif with an extended GXXH region, has the potential to serve as a preorganized Zn binding scaffold in a hydrophobic environment.

  14. Fitting a mixture model by expectation maximization to discover motifs in biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, T.L.; Elkan, C. [Univ. of California, La Jolla, CA (United States)

    1994-12-31

    The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
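    The mixture-model idea can be sketched in a few lines. The code below is a simplified illustration and not the authors' implementation: it fits a single motif of fixed width, holds the background frequencies fixed, starts from a user-supplied seed word, and omits the probabilistic erasing step used to find successive motifs. The function name, the seed-based initialization and the pseudocount are assumptions.

```python
ALPHABET = "ACGT"


def fit_motif_em(seqs, width, seed, n_iter=30, pseudo=0.1):
    """Fit a two-component (motif vs. background) mixture over all windows."""
    # Every width-long window of every sequence is one data point.
    xs = [s[i:i + width] for s in seqs for i in range(len(s) - width + 1)]
    text = "".join(seqs)
    # Background letter frequencies, held fixed (a simplification).
    bg = {c: (text.count(c) + 1) / (len(text) + 4) for c in ALPHABET}
    bg_cols = [bg] * width
    # Start the motif model from the seed: 0.7 on the seed letter, 0.1 elsewhere.
    pwm = [{c: 0.7 if c == seed[j] else 0.1 for c in ALPHABET}
           for j in range(width)]
    lam = len(seqs) / len(xs)  # initial guess: about one site per sequence

    def prob(x, cols):
        p = 1.0
        for j, c in enumerate(x):
            p *= cols[j][c]
        return p

    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif occurrence.
        z = []
        for x in xs:
            pm = lam * prob(x, pwm)
            pb = (1.0 - lam) * prob(x, bg_cols)
            z.append(pm / (pm + pb))
        # M-step: re-estimate the motif columns and the mixing weight.
        total = sum(z)
        pwm = [
            {c: (pseudo + sum(zi for zi, x in zip(z, xs) if x[j] == c))
                / (4 * pseudo + total)
             for c in ALPHABET}
            for j in range(width)
        ]
        lam = total / len(xs)
    return pwm, lam, list(zip(xs, z))


# Hypothetical toy input: three short sequences and a seed word.
seqs = ["CCATGACGTCAG", "TTGACGTAACCG", "GGCATTGACGTC"]
pwm, lam, posteriors = fit_motif_em(seqs, width=6, seed="TGACGT")
```

    The returned posteriors play the role of the per-window occurrence estimates mentioned in the abstract; thresholding them gives a simple site classifier.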

  15. Bioinformatics Study of Cancer-Related Mutations within p53 Phosphorylation Site Motifs

    Directory of Open Access Journals (Sweden)

    Xiaona Ji

    2014-07-01

    Full Text Available p53 protein has about thirty phosphorylation sites located at the N- and C-termini and in the core domain. The phosphorylation sites are relatively less mutated than other residues in p53. To understand why and how p53 phosphorylation sites are rarely mutated in human cancer, we used a bioinformatics approach to examine the phosphorylation site and its nearby flanking residues, focusing on the consensus phosphorylation motif pattern, amino-acid correlations within the phosphorylation motifs, the propensity of structural disorder of the phosphorylation motifs, and cancer mutations observed within the phosphorylation motifs. Many p53 phosphorylation sites are targets for several kinases. The phosphorylation sites match 17 consensus sequence motifs out of the 29 classified. In addition to proline, which is common in kinase specificity-determining sites, we found a high propensity of acidic residues to be adjacent to phosphorylation sites. Analysis of human cancer mutations in the phosphorylation motifs revealed that motifs with adjacent acidic residues generally have fewer mutations, in contrast to phosphorylation sites near proline residues. p53 phosphorylation motifs are mostly disordered. However, human cancer mutations within phosphorylation motifs tend to decrease the disorder propensity. Our results suggest that the combination of the acidic residues Asp and Glu with phosphorylation sites provides charge redundancy, which may safeguard against loss-of-function mutations, and that the natively disordered nature of p53 phosphorylation motifs may help reduce mutational damage. Our results further suggest that engineering acidic amino acids adjacent to potential phosphorylation sites could be a p53 gene therapy strategy.

  16. Assessment of composite motif discovery methods

    Directory of Open Access Journals (Sweden)

    Johansen Jostein

    2008-02-01

    Full Text Available Abstract. Background: Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery – discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. Results: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Conclusion: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual

  17. VARUN: discovering extensible motifs under saturation constraints.

    Science.gov (United States)

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2010-01-01

    The discovery of motifs in biosequences is frequently torn between the rigidity of the model on one hand and the abundance of candidates on the other hand. In particular, motifs that include wild cards or "don't cares" escalate exponentially with their number, and this gets only worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency afford us significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Of equal importance seems the fact that the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.

  18. qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2015-01-01

    Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
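    The (l, d) problem statement can be made concrete with a brute-force reference implementation. This is not qPMS9, which is a far more efficient parallel algorithm; the sketch simply enumerates every candidate of length l, so it is only practical for very small l. The function names are assumptions, and the quorum q is expressed here as a fraction rather than a percentage.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some window of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d
               for i in range(len(s) - l + 1))


def planted_motifs(strings, l, d, q=1.0, alphabet="ACGT"):
    """Return all length-l motifs found (within d mismatches) in at least
    a fraction q of the input strings. q=1.0 is plain PMS."""
    need = q * len(strings)
    found = []
    for cand in product(alphabet, repeat=l):  # all |alphabet|**l candidates
        m = "".join(cand)
        hits = sum(occurs_within(m, s, d) for s in strings)
        if hits >= need:
            found.append(m)
    return found


# Hypothetical toy instance.
strings = ["ACGTAC", "TTACGA", "GACGTT"]
exact = planted_motifs(strings, l=4, d=0)
one_mismatch = planted_motifs(strings, l=4, d=1)
quorum = planted_motifs(strings, l=4, d=0, q=0.5)
```

    The running time grows as |alphabet|^l times the total input length, which is why instances such as (28, 12) call for the pruning strategies that exact PMS algorithms are built around.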

  19. The DNA-binding domain of BenM reveals the structural basis for the recognition of a T-N11-A sequence motif by LysR-type transcriptional regulators.

    Science.gov (United States)

    Alanazi, Amer M; Neidle, Ellen L; Momany, Cory

    2013-10-01

    LysR-type transcriptional regulators (LTTRs) play critical roles in metabolism and constitute the largest family of bacterial regulators. To understand protein-DNA interactions, atomic structures of the DNA-binding domain and linker-helix regions of a prototypical LTTR, BenM, were determined by X-ray crystallography. BenM structures with and without bound DNA reveal a set of highly conserved amino acids that interact directly with DNA bases. At the N-terminal end of the recognition helix (α3) of a winged-helix-turn-helix DNA-binding motif, several residues create hydrophobic pockets (Pro30, Pro31 and Ser33). These pockets interact with the methyl groups of two thymines in the DNA-recognition motif and its complementary strand, T-N11-A. This motif usually includes some dyad symmetry, as exemplified by a sequence that binds two subunits of a BenM tetramer (ATAC-N7-GTAT). Gln29 forms hydrogen bonds to adenine in the first position of the recognition half-site (ATAC). Another hydrophobic pocket defined by Ala28, Pro30 and Pro31 interacts with the methyl group of thymine, complementary to the base at the third position of the half-site. Arg34 interacts with the complementary base of the 3' position. Arg53, in the wing, provides AT-tract recognition in the minor groove. For DNA recognition, LTTRs use highly conserved interactions between amino acids and nucleotide bases as well as numerous less-conserved secondary interactions.
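    Spaced motifs such as ATAC-N7-GTAT translate directly into regular expressions. The sketch below scans a DNA string for that pattern on the forward strand; the function name and the test sequence are hypothetical, and a lookahead is used so that overlapping sites are also reported.

```python
import re

# ATAC, any seven bases, GTAT. Within this 15-mer the T at the second
# position and the A at the second-to-last position are separated by
# eleven bases, matching the T-N11-A spacing described above.
BENM_LIKE_SITE = r"(?=(ATAC[ACGT]{7}GTAT))"


def find_sites(seq, pattern=BENM_LIKE_SITE):
    """Return (start, matched_text) for every site, overlaps included."""
    return [(m.start(), m.group(1))
            for m in re.finditer(pattern, seq.upper())]


# Hypothetical test sequence with one planted site at index 2.
site = "ATAC" + "CCCCCCC" + "GTAT"
sites = find_sites("gg" + site + "gg")
```

    A pattern like this only captures the idealized consensus; real binding sites tolerate mismatches, so a scoring matrix would be the usual next step.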

  20. DNA regulatory motif selection based on support vector machine ...

    African Journals Online (AJOL)

    DNA regulatory motif selection based on support vector machine (SVM) and its application in microarray ... African Journal of Biotechnology ... experiments to explore the underlying relationships between motif types and gene functions.

  1. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  2. deFUME: Dynamic exploration of functional metagenomic sequencing data

    DEFF Research Database (Denmark)

    van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper

    2015-01-01

    Functional metagenomic selections represent a powerful technique that is widely applied for identification of novel genes from complex metagenomic sources. However, whereas hundreds to thousands of clones can be easily generated and sequenced over a few days of experiments, analyzing the data...... to a comprehensive visual data overview that facilitates effortless inspection of gene function, clustering and distribution. The webserver is available at cbs.dtu.dk/services/deFUME/ and the source code is distributed at github.com/EvdH0/deFUME.

  3. Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

    Science.gov (United States)

    2010-01-01

    Background: Effector secretion is a common strategy of pathogens in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of the EPIYA motif in broad biological species, to predict potential effectors with the EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results: A hidden Markov model (HMM) of five amino acids was built for the EPIYA motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. Most of the proteins containing at least four copies of the EPIYA motif are from intracellular bacteria, extracellular bacteria with T3SS or T4SS, or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and used them to predict many potential effectors in bacteria and protista. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions: Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the
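The tallies reported in this abstract (sequences with at least one EPIYA occurrence, with multiple repeats, and with four or more copies) can be mimicked on toy data. The sketch below uses exact string matching as a deliberate simplification; the study itself used a hidden Markov model, which also tolerates variant residues. The function names and sequences are invented for illustration.

```python
import re

EPIYA = re.compile("EPIYA")


def epiya_positions(protein):
    """Return 0-based start positions of exact EPIYA occurrences."""
    return [m.start() for m in EPIYA.finditer(protein.upper())]


def summarize(proteins):
    """Tally sequences by number of exact EPIYA copies.

    proteins maps a sequence name to its amino acid string.
    """
    counts = {name: len(epiya_positions(seq)) for name, seq in proteins.items()}
    return {
        "at_least_one": sum(c >= 1 for c in counts.values()),
        "multiple": sum(c >= 2 for c in counts.values()),
        "four_or_more": sum(c >= 4 for c in counts.values()),
    }


if __name__ == "__main__":
    toy = {
        "single": "MKEPIYATT",
        "repeats": "EPIYAGGEPIYAKKEPIYALLEPIYA",
        "none": "MKKLL",
    }
    print(summarize(toy))
```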

  4. Learning "graph-mer" motifs that predict gene expression trajectories in development.

    Directory of Open Access Journals (Sweden)

    Xuejing Li

    2010-04-01

    A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns--represented by graphs of k-mers, or "graph-mers"--that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.

  5. Functionalized nanopore-embedded electrodes for rapid DNA sequencing

    CERN Document Server

    He, Haiying; Pandey, Ravindra; Rocha, Alexandre Reily; Sanvito, Stefano; Grigoriev, Anton; Ahuja, Rajeev; Karna, Shashi P

    2007-01-01

    The determination of a patient's DNA sequence can, in principle, reveal an increased risk to fall ill with particular diseases [1,2] and help to design "personalized medicine" [3]. Moreover, statistical studies and comparison of genomes [4] of a large number of individuals are crucial for the analysis of mutations [5] and hereditary diseases, paving the way to preventive medicine [6]. DNA sequencing is, however, currently still a vastly time-consuming and very expensive task [4], consisting of pre-processing steps, the actual sequencing using the Sanger method, and post-processing in the form of data analysis [7]. Here we propose a new approach that relies on functionalized nanopore-embedded electrodes to achieve an unambiguous distinction of the four nucleic acid bases in the DNA sequencing process. This represents a significant improvement over previously studied designs [8,9] which cannot reliably distinguish all four bases of DNA. The transport properties of the setup investigated by us, employing state-o...

  6. Molecular and functional characterization of a CS1 (CRACC) splice variant expressed in human NK cells that does not contain immunoreceptor tyrosine-based switch motifs.

    Science.gov (United States)

    Lee, Jae Kyung; Boles, Kent S; Mathew, Porunelloor A

    2004-10-01

    CS1 (CRACC, novel Ly9) is a novel member of the CD2 family expressed on natural killer (NK), T and stimulated B cells. Although the cytoplasmic domain of CS1 contains immunoreceptor tyrosine-based switch motifs (ITSM), which enable it to recruit signaling lymphocyte activation molecule (SLAM)-associated protein (SAP/SH2D1A), it activates NK cells in the absence of a functional SAP. CS1 is a self-ligand, and homophilic interaction of CS1 regulates NK cell cytolytic activity. Here we have identified a novel splice variant of CS1 (CS1-S), which lacks ITSM. Human NK cells express mRNA for both wild-type CS1 (CS1-L) and CS1-S, and their expression levels remained steady upon various stimulations. To determine the function of each isoform, cDNAs for CS1-L and CS1-S were transfected into the rat NK cell line RNK-16 and functionally tested using redirected cytotoxicity assays and calcium flux experiments. CS1-L was able to mediate redirected cytotoxicity of P815 target cells in the presence of monoclonal antibody against CS1 and a rise in intracellular calcium within RNK-16 cells, suggesting that CS1-L is an activating receptor, whereas CS1-S showed no effects. Interestingly, SAP associated with unstimulated CS1-L and dissociated upon pervanadate stimulation. These results indicate that CS1-L and CS1-S may differentially regulate human NK cell functions.

  7. Sequence-specific cleavage of BM2 gene transcript of influenza B virus by 10-23 catalytic motif containing DNA enzymes significantly inhibits viral RNA translation and replication.

    Science.gov (United States)

    Kumar, Binod; Kumar, Prashant; Rajput, Roopali; Saxena, Latika; Daga, Mradul K; Khanna, Madhu

    2013-10-01

    One of the hallmarks of progression of influenza virus replication is the step involving virus uncoating, which occurs in the host cytoplasm. The BM2 ion channel protein of influenza B virus is highly conserved and is essential during the uncoating of the virus, making it an attractive target for designing antiviral drugs. We screened several DNA enzymes (Dzs) containing the 10-23 catalytic motif against the influenza B virus BM2 RNA. Dzs directed against the predicted single-stranded bulge regions showed sequence-specific cleavage activities. Dz209 not only showed significant intracellular reduction of BM2 gene expression in a transient-expression system but also provided considerable protection against influenza B virus challenge in MDCK cells. Our findings suggest that the Dz molecule can be used as a selective and effective inhibitor of viral RNA replication, and can be explored further for development of a potent therapeutic agent against influenza B virus infection.

  8. Using SCOPE to identify potential regulatory motifs in coregulated genes.

    Science.gov (United States)

    Martyanov, Viktor; Gross, Robert H

    2011-05-31

    SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from
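The three motif styles mentioned in this abstract (non-degenerate ACCGGT, degenerate ASCGWT, bipartite ACCnnnnnnnnGGT) all use standard IUPAC nucleotide codes, where S means C or G, W means A or T, and N means any base. As a rough sketch of how such a consensus string can be matched against sequence, it can be translated into a regular expression. This is not SCOPE's own scoring method; the helper name `motif_to_regex` is an assumption for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC consensus string (case-insensitive) into a regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


if __name__ == "__main__":
    degenerate = motif_to_regex("ASCGWT")
    print(degenerate.pattern)
    print(bool(degenerate.search("TTAGCGATTT")))
```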

  9. No tradeoff between versatility and robustness in gene circuit motifs

    Science.gov (United States)

    Payne, Joshua L.

    2016-05-01

    Circuit motifs are small directed subgraphs that appear in real-world networks significantly more often than in randomized networks. In the Boolean model of gene circuits, most motifs are realized by multiple circuit genotypes. Each of a motif's constituent circuit genotypes may have one or more functions, which are embodied in the expression patterns the circuit forms in response to specific initial conditions. Recent enumeration of a space of nearly 17 million three-gene circuit genotypes revealed that all circuit motifs have more than one function, with the number of functions per motif ranging from 12 to nearly 30,000. This indicates that some motifs are more functionally versatile than others. However, the individual circuit genotypes that constitute each motif are less robust to mutation if they have many functions, hinting that functionally versatile motifs may be less robust to mutation than motifs with few functions. Here, I explore the relationship between versatility and robustness in circuit motifs, demonstrating that functionally versatile motifs are robust to mutation despite the inherent tradeoff between versatility and robustness at the level of an individual circuit genotype.

  10. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

    Directory of Open Access Journals (Sweden)

    Santosh K. Tiwari

    2011-01-01

    The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  11. Model peptide studies of sequence regions in the elastomeric biomineralization protein, Lustrin A. I. The C-domain consensus-PG-, -NVNCT-motif.

    Science.gov (United States)

    Zhang, Bo; Wustman, Brandon A; Morse, Daniel; Evans, John Spencer

    2002-05-01

    The lustrin superfamily represents a unique group of biomineralization proteins localized between layered aragonite mineral plates (i.e., nacre layer) in mollusk shell. Recent atomic force microscopy (AFM) pulling studies have demonstrated that the lustrin-containing organic nacre layer in the abalone, Haliotis rufescens, exhibits a typical sawtooth force-extension curve with hysteretic recovery. This force extension behavior is reminiscent of reversible unfolding and refolding in elastomeric proteins such as titin and tenascin. Since secondary structure plays an important role in force-induced protein unfolding and refolding, the question is, What secondary structure(s) exist within the major domains of Lustrin A? Using a model peptide (FPGKNVNCTSGE) representing the 12-residue consensus sequence found near the N-termini of the first eight cysteine-rich domains (C-domains) within the Lustrin A protein, we employed CD, NMR spectroscopy, and simulated annealing/minimization to determine the secondary structure preferences for this sequence. At pH 7.4, we find that the 12-mer sequence adopts a loop conformation, consisting of a "bend" or "turn" involving residues G3-K4 and N7-C8-T9, with extended conformations arising at F1-G3; K4-V6; T9-S10-G11 in the sequence. Minor pH-dependent conformational effects were noted for this peptide; however, there is no evidence for a salt-bridge interaction between the K4 and E12 side chains. The presence of a loop conformation within the highly conserved -PG-, -NVNCT- sequence of C1-C8 domains may have important structural and mechanistic implications for the Lustrin A protein with regard to elastic behavior.

  12. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    Science.gov (United States)

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles, containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with 5'-CACC overhang (substituting T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  13. A highly conserved sequence in the 3'-untranslated region of the drosophila Adh gene plays a functional role in Adh expression.

    Science.gov (United States)

    Parsch, J; Stephan, W; Tanda, S

    1999-01-01

    Phylogenetic analysis identified a highly conserved eight-base sequence (AAGGCTGA) within the 3'-untranslated region (UTR) of the Drosophila alcohol dehydrogenase gene, Adh. To examine the functional significance of this conserved motif, we performed in vitro deletion mutagenesis on the D. melanogaster Adh gene followed by P-element-mediated germline transformation. Deletion of all or part of the eight-base sequence leads to a twofold increase in in vivo ADH enzymatic activity. The increase in activity is temporally and spatially general and is the result of an underlying increase in Adh transcript. These results indicate that the conserved 3'-UTR motif plays a functional role in the negative regulation of Adh gene expression. The evolutionary significance of our results may be understood in the context of the amino acid change that produces the ADH-F allele and also leads to a twofold increase in ADH activity. While there is compelling evidence that the amino acid replacement has been a target of positive selection, the conservation of the 3'-UTR sequence suggests that it is under strong purifying selection. The selective difference between these two sequence changes, which have similar effects on ADH activity, may be explained by different metabolic costs associated with the increase in activity. PMID:9927459

  14. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Background: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agricultural losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results: Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion: This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  15. Sequence-based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    Directory of Open Access Journals (Sweden)

    Janine Maimanakos

    2016-08-01

    Arylmalonate-Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta- and Gammaproteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the TTT family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, which originated from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes.

  16. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    Science.gov (United States)

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, which originated from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  17. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites.

    Directory of Open Access Journals (Sweden)

    Tzong-Yi Lee

    Ubiquitin (Ub) is a small protein that consists of 76 amino acids (about 8.5 kDa). In ubiquitin conjugation, ubiquitin is mainly conjugated to a lysine residue of the protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and of regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (-20∼+20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence

  18. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites.

    Science.gov (United States)

    Lee, Tzong-Yi; Chen, Shu-An; Hung, Hsin-Yi; Ou, Yu-Yen

    2011-03-09

    Ubiquitin (Ub) is a small protein that consists of 76 amino acids (about 8.5 kDa). In ubiquitin conjugation, ubiquitin is mainly conjugated to a lysine residue of the protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and of regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (-20∼+20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub
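The three figures quoted in this abstract follow the usual confusion-matrix definitions: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and accuracy is (TP + TN) divided by all predictions. The sketch below shows the arithmetic with hypothetical counts chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true sites recovered
    specificity = tn / (tn + fp)                 # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all calls correct
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    sens, spec, acc = classification_metrics(tp=131, fn=69, tn=748, fp=252)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.4f}")
```

Because accuracy is a weighted average of sensitivity and specificity, it sits closer to specificity whenever non-sites outnumber true sites in the evaluation set.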

  19. Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer's Activity

    Science.gov (United States)

    Erceg, Jelena; Saunders, Timothy E.; Girardot, Charles; Devos, Damien P.; Hufnagel, Lars; Furlong, Eileen E. M.

    2014-01-01

    Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood. PMID:24391522

  20. Sequence and domain conservation of the coelacanth Hsp40 and Hsp90 chaperones suggests conservation of function.

    Science.gov (United States)

    Bishop, Özlem Tastan; Edkins, Adrienne Lesley; Blatch, Gregory Lloyd

    2014-09-01

    Molecular chaperones and their associated co-chaperones play an important role in preserving and regulating the active conformational state of cellular proteins. The chaperone complement of the Indonesian Coelacanth, Latimeria menadoensis, was elucidated using transcriptomic sequences. Heat shock protein 90 (Hsp90) and heat shock protein 40 (Hsp40) chaperones, and associated co-chaperones were focused on, and homologous human sequences were used to search the sequence databases. Coelacanth homologs of the cytosolic, mitochondrial and endoplasmic reticulum (ER) homologs of human Hsp90 were identified, as well as all of the major co-chaperones of the cytosolic isoform. Most of the human Hsp40s were found to have coelacanth homologs, and the data suggested that all of the chaperone machinery for protein folding at the ribosome, protein translocation to cellular compartments such as the ER and protein degradation were conserved. Some interesting similarities and differences were identified when interrogating human, mouse, and zebrafish homologs. For example, DnaJB13 is predicted to be a non-functional Hsp40 in humans, mouse, and zebrafish due to a corrupted histidine-proline-aspartic acid (HPD) motif, while the coelacanth homolog has an intact HPD. These and other comparisons enabled important functional and evolutionary questions to be posed for future experimental studies.

  1. Molecular diversity of LysM carbohydrate-binding motifs in fungi.

    Science.gov (United States)

    Akcapinar, Gunseli Bayram; Kappel, Lisa; Sezerman, Osman Ugur; Seidl-Seiboth, Verena

    2015-05-01

    LysM motifs are carbohydrate-binding modules found in prokaryotes and eukaryotes. They bind to N-acetylglucosamine-containing carbohydrates, such as chitin, chito-oligosaccharides and peptidoglycan. In this review, we summarize the features of the protein architecture of LysM-containing proteins in fungi and discuss their currently known biochemical properties, transcriptional profiles and biological functions. Further, based on data from evolutionary analyses and consensus pattern profiling of fungal LysM motifs, we show that they can be classified into a fungal-specific group and a fungal/bacterial group. This facilitates the classification and selection of further LysM proteins for detailed analyses and will contribute to widening our understanding of the functional spectrum of this protein family in fungi. Fungal LysM motifs are predominantly found in subgroup C chitinases and in LysM effector proteins, which are secreted proteins with LysM motifs but no catalytic domains. In enzymes, LysM motifs mediate the attachment to insoluble carbon sources. In plants, receptors containing LysM motifs are responsible for the perception of chitin-oligosaccharides and are involved in beneficial symbiotic interactions between plants and bacteria or fungi, as well as plant defence responses. In plant pathogenic fungi, LysM effector proteins have already been shown to have important functions in the dampening of host defence responses as well as protective functions of fungal hyphae against chitinases. However, the large number and diversity of proteins with LysM motifs that are being unravelled in fungal genome sequencing projects suggest that the functional repertoire of LysM effector proteins in fungi is only partially discovered so far.

  2. An Affinity Propagation-Based DNA Motif Discovery Algorithm

    Directory of Open Access Journals (Sweden)

    Chunxiao Sun

    2015-01-01

    The planted (l,d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.

  3. An Affinity Propagation-Based DNA Motif Discovery Algorithm.

    Science.gov (United States)

    Sun, Chunxiao; Huo, Hongwei; Yu, Qiang; Guo, Haitao; Sun, Zhigang

    2015-01-01

    The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
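
    For orientation, the planted (l, d) motif problem asks for a string of length l that occurs in every input sequence with at most d mismatches. The sketch below only checks that definition by brute force; it is not APMotif (no Affinity Propagation or EM step), and the function names are illustrative assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_occurrence(seq, motif, d):
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(seq[i:i + l], motif) <= d
               for i in range(len(seq) - l + 1))


def is_ld_motif(seqs, motif, d):
    """True if motif is an (l, d) motif for seqs, with l = len(motif)."""
    return all(has_occurrence(s, motif, d) for s in seqs)


# Toy sequences, made up for illustration.
toy = ["ACGTACGT", "TTACCTAA", "GGACGAGG"]
print(is_ld_motif(toy, "ACGT", d=1))
print(is_ld_motif(toy, "ACGT", d=0))
```

    Enumerating all 4^l candidate strings with this check quickly becomes infeasible as l grows, which is one reason heuristic candidate generation followed by refinement, as the abstract describes, is attractive.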

  4. The Motif Tracking Algorithm

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper, we introduce the motif tracking algorithm (MTA), a novel immune inspired (IS) pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases, the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilization of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other applications such as forecasting and algorithm seeding.

  5. The Motif Tracking Algorithm

    CERN Document Server

    Wilson, William; Aickelin, Uwe; 10.1007/s11633.008.0032.0

    2010-01-01

    The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper we introduce the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilisation of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other ap...

  6. Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements

    Science.gov (United States)

    Fernandez-Chamorro, Javier; Lozano, Gloria; Garcia-Martin, Juan Antonio; Ramajo, Jorge; Dotu, Ivan; Clote, Peter; Martinez-Salas, Encarnacion

    2016-04-01

    The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data supports the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the stronger the stability of the hairpin the higher the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.

  7. A functional EXXEK motif is essential for proton coupling and active glucosinolate transport by NPF2.11

    DEFF Research Database (Denmark)

    Jørgensen, Morten Egevang; Olsen, Carl Erik; Geiger, Dietmar

    2015-01-01

    The proton-dependent oligopeptide transporter (POT/PTR) family shares a highly conserved E1X1X2E2RFXYY (E1X1X2E2R) motif across all kingdoms of life. This motif is suggested to have a role in proton coupling and active transport in bacterial homologs. For the plant POT/PTR family, also known......K motif variant in a plant NPF transporter. Using liquid chromatography-mass spectrometry (LC-MS)-based uptake assays and two-electrode voltage clamp (TEVC) electrophysiology, we demonstrate an essential role for the E1X1X2E2K motif for accumulation of substrate by NPF2.11. Our data suggest...

  8. Bases of motifs for generating repeated patterns with wild cards.

    Science.gov (United States)

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to get at a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and, thus, smaller) than previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope to efficiently compute such bases unless the quorum is fixed.
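
    To make the terms concrete: a pattern with wild cards matches a position when every non-wild-card symbol agrees with the sequence, and the quorum is the minimum number of occurrences required for the pattern to count as a repeated motif. The sketch below illustrates only these two definitions; it does not compute a basis, and the wild-card symbol "." and the function names are assumptions made for the example.

```python
def occurrences(seq, pattern, wildcard="."):
    """Start positions (possibly overlapping) where pattern matches seq.

    A wildcard position in the pattern matches any single symbol.
    """
    m = len(pattern)
    return [i for i in range(len(seq) - m + 1)
            if all(p == wildcard or p == c
                   for p, c in zip(pattern, seq[i:i + m]))]


def satisfies_quorum(seq, pattern, quorum):
    """True if pattern occurs at least `quorum` times in seq."""
    return len(occurrences(seq, pattern)) >= quorum


# Toy sequence, made up for illustration.
print(occurrences("AAGACGATG", "A.G"))
print(satisfies_quorum("AAGACGATG", "A.G", quorum=3))
```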

  9. EAR motif-mediated transcriptional repression in plants: an underlying mechanism for epigenetic regulation of gene expression.

    Science.gov (United States)

    Kagale, Sateesh; Rozwadowski, Kevin

    2011-02-01

    Ethylene-responsive element binding factor-associated Amphiphilic Repression (EAR) motif-mediated transcriptional repression is emerging as one of the principal mechanisms of plant gene regulation. The EAR motif, defined by the consensus sequence patterns of either LxLxL or DLNxxP, is the most predominant form of transcriptional repression motif so far identified in plants. Additionally, this active repression motif is highly conserved in transcriptional regulators known to function as negative regulators in a broad range of developmental and physiological processes across evolutionarily diverse plant species. Recent discoveries of co-repressors interacting with EAR motifs, such as TOPLESS (TPL) and AtSAP18, have begun to unravel the mechanisms of EAR motif-mediated repression. The demonstration of genetic interaction between mutants of TPL and AtHDA19, co-complex formation between TPL-related 1 (TPR1) and AtHDA19, as well as direct physical interaction between AtSAP18 and AtHDA19 support a model where EAR repressors, via recruitment of chromatin remodeling factors, facilitate epigenetic regulation of gene expression. Here, we discuss the biological significance of EAR-mediated gene regulation in the broader context of plant biology and present literature evidence in support of a model for EAR motif-mediated repression via the recruitment and action of chromatin modifiers. Additionally, we discuss the possible influences of phosphorylation and ubiquitination on the function and turnover of EAR repressors.
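
    The two consensus patterns named in the abstract, LxLxL and DLNxxP (x being any residue), translate directly into regular expressions. The following is a minimal sketch of such a scan; the function name and the toy protein sequence are hypothetical, and a sequence match alone says nothing about whether a site is a functional repression motif.

```python
import re

# Lookahead groups so that overlapping matches are all reported.
EAR_PATTERNS = {
    "LxLxL": re.compile(r"(?=(L.L.L))"),
    "DLNxxP": re.compile(r"(?=(DLN..P))"),
}


def find_ear_motifs(protein):
    """Return (pattern name, 0-based start, matched residues), sorted by start."""
    hits = []
    for name, pattern in EAR_PATTERNS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence, made up for illustration; not a real protein.
for hit in find_ear_motifs("MSLDLNLAPPDLNFPPQ"):
    print(hit)
```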

  10. IL-4 function can be transferred to the IL-2 receptor by tyrosine containing sequences found in the IL-4 receptor alpha chain.

    Science.gov (United States)

    Wang, H Y; Paul, W E; Keegan, A D

    1996-02-01

    IL-4 binds to a cell surface receptor complex that consists of the IL-4 binding protein (IL-4R alpha) and the gamma chain of the IL-2 receptor complex (gamma c). The receptors for IL-4 and IL-2 have several features in common; both use the gamma c as a receptor component, and both activate the Janus kinases JAK-1 and JAK-3. In spite of these similarities, IL-4 evokes specific responses, including the tyrosine phosphorylation of 4PS/IRS-2 and the induction of CD23. To determine whether sequences within the cytoplasmic domain of the IL-4R alpha specify these IL-4-specific responses, we transplanted the insulin IL-4 receptor motif (I4R motif) of the huIL-4R alpha to the cytoplasmic domain of a truncated IL-2R beta. In addition, we transplanted a region that contains peptide sequences shown to block Stat6 binding to DNA. We analyzed the ability of cells expressing these IL-2R-IL-4R chimeric constructs to respond to IL-2. We found that IL-4 function could be transplanted to the IL-2 receptor by these regions and that proliferative and differentiative functions can be induced by different receptor sequences.

  11. Detection of a functional insertion sequence responsible for deletion of the thermostable direct hemolysin gene (tdh) in Vibrio parahaemolyticus.

    Science.gov (United States)

    Kamruzzaman, Muhammad; Bhoopong, Phuangthip; Vuddhakul, Varaporn; Nishibuchi, Mitsuaki

    2008-09-15

    The thermostable direct hemolysin coded by the tdh gene is a marker of virulent strains of Vibrio parahaemolyticus. The tdh genes are flanked by insertion sequences, collectively named ISVs, or their remnants. However, the ISVs examined so far have accumulated mutations in their transposase genes and undergone structural rearrangements, so transposition activity would not be expected; the tdh gene was thus considered to have been acquired by V. parahaemolyticus through horizontal transfer in the past during evolution. We recently isolated from the same patient tdh+ strains and a tdh(-) strain (PCR examination) that were otherwise indistinguishable. The purpose of this study was to examine the hypothesis that the tdh(-) strain was derived from the tdh+ strain by a deletion of the tdh gene mediated by a functional ISV. Southern blot hybridization showed tdh+ sequences in the tdh(-) strain (PSU-1466). Nucleotide sequence analysis of the tdh and its flanking sequences revealed that the tdh gene was split into two parts located 3182 bp apart in PSU-1466. The two tdh sequences were flanked by one of the ISVs, named ISVpa3, in PSU-1466. This genetic structure could be explained by an ISVpa3-mediated partial tdh deletion from a tdh+ strain followed by transposition of the duplicated ISVpa3 and the deleted tdh sequence into a neighboring location. The ISVpa3 of PSU-1466 coded for a full-length transposase and a DDE motif. We were able to demonstrate transposition activity of the ISVpa3 cloned from PSU-1466 using the replicon fusion assay with the conjugal transfer of a cointegrate from Escherichia coli to V. parahaemolyticus. Our data support the hypothesis that ISVpa3-mediated partial tdh deletion resulted in the emergence of the tdh(-) strain.

  12. Encoded expansion: an efficient algorithm to discover identical string motifs.

    Directory of Open Access Journals (Sweden)

    Aqil M Azmi

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently, Karci (2009; Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text]. Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms.
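
    The expansion step described in this abstract (start from motifs of size 2, extend each surviving candidate by one symbol, discard candidates below the occurrence threshold) can be sketched as follows. This is a dictionary-based illustration of the idea only: it does not reproduce the authors' array layout or substring encoding and makes no claim about their complexity bounds. The names `frequent_motifs`, `k` and `start_len` are assumptions.

```python
from collections import defaultdict


def frequent_motifs(seq, k, start_len=2):
    """Map each exact substring (length >= start_len) occurring >= k times
    to the list of its start positions in seq."""
    # Seed: positions of every substring of length start_len.
    seeds = defaultdict(list)
    for i in range(len(seq) - start_len + 1):
        seeds[seq[i:i + start_len]].append(i)
    current = {m: pos for m, pos in seeds.items() if len(pos) >= k}

    result = {}
    length = start_len
    while current:
        result.update(current)
        # Extend every surviving occurrence by the next symbol, if one exists.
        extended = defaultdict(list)
        for motif, positions in current.items():
            for i in positions:
                if i + length < len(seq):
                    extended[motif + seq[i + length]].append(i)
        # Discard extended candidates that fall below the threshold.
        current = {m: pos for m, pos in extended.items() if len(pos) >= k}
        length += 1
    return result


# Toy input, made up for illustration.
for motif, positions in sorted(frequent_motifs("ABCABCAB", k=2).items()):
    print(motif, positions)
```

    The pruning is safe because any substring that occurs at least k times has a prefix that also occurs at least k times, so no frequent motif is lost by extending only the survivors.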

  13. Rapid fixation of a distinctive sequence motif in the 3' noncoding region of the clade of West Nile virus invading North America.

    Science.gov (United States)

    Hughes, Austin L; Piontkivska, Helen; Foppa, Ivo

    2007-09-15

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I-->T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3' noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the noncoding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3'NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV.

  14. A combinatorial code for splicing silencing: UAGG and GGGG motifs.

    Directory of Open Access Journals (Sweden)

    Kyoungha Han

    2005-05-01

    Alternative pre-mRNA splicing is widely used to regulate gene expression by tuning the levels of tissue-specific mRNA isoforms. Few regulatory mechanisms are understood at the level of combinatorial control despite numerous sequences, distinct from splice sites, that have been shown to play roles in splicing enhancement or silencing. Here we use molecular approaches to identify a ternary combination of exonic UAGG and 5'-splice-site-proximal GGGG motifs that functions cooperatively to silence the brain-region-specific CI cassette exon (exon 19) of the glutamate NMDA R1 receptor (GRIN1) transcript. Disruption of three components of the motif pattern converted the CI cassette into a constitutive exon, while predominant skipping was conferred when the same components were introduced, de novo, into a heterologous constitutive exon. Predominant exon silencing was directed by the motif pattern in the presence of six competing exonic splicing enhancers, and this effect was retained after systematically repositioning the two exonic UAGGs within the CI cassette. In this system, hnRNP A1 was shown to mediate silencing while hnRNP H antagonized silencing. Genome-wide computational analysis combined with RT-PCR testing showed that a class of skipped human and mouse exons can be identified by searches that preserve the sequence and spatial configuration of the UAGG and GGGG motifs. This analysis suggests that the multi-component silencing code may play an important role in the tissue-specific regulation of the CI cassette exon, and that it may serve more generally as a molecular language to allow for intricate adjustments and the coordination of splicing patterns from different genes.

  15. A combinatorial code for splicing silencing: UAGG and GGGG motifs.

    Science.gov (United States)

    Han, Kyoungha; Yeo, Gene; An, Ping; Burge, Christopher B; Grabowski, Paula J

    2005-05-01

    Alternative pre-mRNA splicing is widely used to regulate gene expression by tuning the levels of tissue-specific mRNA isoforms. Few regulatory mechanisms are understood at the level of combinatorial control despite numerous sequences, distinct from splice sites, that have been shown to play roles in splicing enhancement or silencing. Here we use molecular approaches to identify a ternary combination of exonic UAGG and 5'-splice-site-proximal GGGG motifs that functions cooperatively to silence the brain-region-specific CI cassette exon (exon 19) of the glutamate NMDA R1 receptor (GRIN1) transcript. Disruption of three components of the motif pattern converted the CI cassette into a constitutive exon, while predominant skipping was conferred when the same components were introduced, de novo, into a heterologous constitutive exon. Predominant exon silencing was directed by the motif pattern in the presence of six competing exonic splicing enhancers, and this effect was retained after systematically repositioning the two exonic UAGGs within the CI cassette. In this system, hnRNP A1 was shown to mediate silencing while hnRNP H antagonized silencing. Genome-wide computational analysis combined with RT-PCR testing showed that a class of skipped human and mouse exons can be identified by searches that preserve the sequence and spatial configuration of the UAGG and GGGG motifs. This analysis suggests that the multi-component silencing code may play an important role in the tissue-specific regulation of the CI cassette exon, and that it may serve more generally as a molecular language to allow for intricate adjustments and the coordination of splicing patterns from different genes.
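
    A search of the kind mentioned in the abstract, for exons carrying UAGG together with a GGGG near the 5' splice site, could look roughly like the sketch below. The abstract does not give the positional constraints used in the study, so the caller supplies the 5'-splice-site-proximal region as a separate string; that split, the `min_uagg` parameter and the function names are assumptions for illustration.

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    return sum(seq.startswith(motif, i)
               for i in range(len(seq) - len(motif) + 1))


def matches_silencing_pattern(exon, ss_proximal, min_uagg=1):
    """True if the exon has at least min_uagg UAGG motifs and the region
    near the 5' splice site (chosen by the caller) contains GGGG."""
    return (count_overlapping(exon, "UAGG") >= min_uagg
            and "GGGG" in ss_proximal)


# Toy RNA strings, made up for illustration.
print(matches_silencing_pattern("AUAGGCCUAGGA", "GUAAGGGGC", min_uagg=2))
print(matches_silencing_pattern("AUAGGCCUAGGA", "GUAAGUAC"))
```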

  16. Epitope-based vaccines with the Anaplasma marginale MSP1a functional motif induce a balanced humoral and cellular immune response in mice.

    Directory of Open Access Journals (Sweden)

    Paula S Santos

    Bovine anaplasmosis is a hemoparasitic disease that causes considerable economic loss to the dairy and beef industries. Cattle immunized with the Anaplasma marginale MSP1 outer membrane protein complex present a protective humoral immune response; however, its efficacy is variable. Immunodominant epitopes seem to be a key limiting factor for the adaptive immunity. We have successfully demonstrated that critical motifs of the MSP1a functional epitope are essential for antibody recognition of infected animal sera, but its protective immunity is yet to be tested. We have evaluated two synthetic vaccine formulations against A. marginale, using an epitope-based approach in mice. Infection of mice with bovine anaplasmosis was demonstrated by qPCR analysis of erythrocytes after 15-day exposure. A proof-of-concept was obtained in this murine model, in which peptides conjugated to bovine serum albumin were used for immunization in three 15-day intervals by intraperitoneal injections before challenging with live bacteria. Blood samples were analyzed for the presence of specific IgG2a and IgG1 antibodies, as well as for the rickettsemia analysis. A panel containing the cytokines' transcriptional profile for innate and adaptive immune responses was carried out through qPCR. Immunized BALB/c mice challenged with A. marginale presented stable body weight, reduced number of infected erythrocytes, and no mortality, whereas among control groups mortality rates ranged from 15% to 29%. Additionally, the vaccines induced a significantly higher IgG2a than IgG1 response, followed by increased expression of pro-inflammatory cytokines. This is a successful demonstration of epitope-based vaccines, and protection against anaplasmosis may be associated with elicitation of effector functions of humoral and cellular immune responses in a murine model.

  17. Universal sequence replication, reversible polymerization and early functional biopolymers: a model for the initiation of prebiotic sequence evolution.

    Directory of Open Access Journals (Sweden)

    Sara Imari Walker

    Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles) that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for

  18. Universal Sequence Replication, Reversible Polymerization and Early Functional Biopolymers: A Model for the Initiation of Prebiotic Sequence Evolution

    Science.gov (United States)

    Walker, Sara Imari; Grover, Martha A.; Hud, Nicholas V.

    2012-01-01

    Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles) that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for regularly varying monomer

  19. Rapid Fixation of a Distinctive Sequence Motif in the 3′Noncoding Region of the Clade of West Nile Virus Invading North America

    Science.gov (United States)

    Hughes, Austin L.; Piontkivska, Helen; Foppa, Ivo

    2007-01-01

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I→T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3′ noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the non-coding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3′NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV. PMID:17587514

  20. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  1. Visibility graph motifs

    CERN Document Server

    Iacovacci, Jacopo

    2015-01-01

    Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of visibility graph motifs, smaller substructures that appear with characteristic frequencies. We develop a theory to compute exactly the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is a highly informative and computationally efficient feature, capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and thereby disentangling meditative from other relaxation states. Applications of this general theory include the automatic classification a...
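
    As a rough illustration of the idea described above, the sketch below builds a horizontal visibility graph (two points are linked when every value strictly between them is lower than both) and tabulates how often each edge pattern occurs in windows of consecutive nodes. The function names, the choice of the horizontal variant, and the window-based reading of "motif" are assumptions made for this example; the record does not specify them.

```python
from collections import Counter


def horizontal_visibility_edges(series):
    """Return the edge set {(i, j), i < j} of the horizontal visibility graph."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        highest_between = float("-inf")  # max of the values strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing further to the right can see point i
    return edges


def window_motif_profile(series, size=4):
    """Relative frequency of each edge pattern over windows of `size` consecutive nodes."""
    edges = horizontal_visibility_edges(series)
    counts = Counter()
    for start in range(len(series) - size + 1):
        # Edges with both endpoints inside the window, shifted to start at 0.
        pattern = frozenset(
            (a - start, b - start)
            for (a, b) in edges
            if start <= a and b < start + size
        )
        counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}
```

    Calling window_motif_profile on two series and comparing the resulting frequency dictionaries gives a simple, if naive, way to contrast their dynamics.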

  2. 1-t-motifs

    CERN Document Server

    Taelman, Lenny

    2009-01-01

    We show that the module of rational points on an abelian t-module E is canonically isomorphic with the module Ext^1(M_E, K[t]) of extensions of the trivial t-motif K[t] by the t-motif M_E associated with E. This generalizes prior results of Anderson and Thakur and of Papanikolas and Ramachandran. In case E is uniformizable then we show that this extension module is canonically isomorphic with the corresponding extension module of Pink-Hodge structures. This situation is formally very similar to Deligne's theory of 1-motifs and we have tried to build up the theory in a way that makes this analogy as clear as possible.

  3. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary.
An implementation of

  4. Chaotic motifs in gene regulatory networks.

    Science.gov (United States)

    Zhang, Zhaoyang; Ye, Weiming; Qian, Yu; Zheng, Zhigang; Huang, Xuhui; Hu, Gang

    2012-01-01

    Chaos should occur often in gene regulatory networks (GRNs) which have been widely described by nonlinear coupled ordinary differential equations, if their dimensions are no less than 3. It is therefore puzzling that chaos has never been reported in GRNs in nature and is also extremely rare in models of GRNs. On the other hand, the topic of motifs has attracted great attention in studying biological networks, and network motifs are suggested to be elementary building blocks that carry out some key functions in the network. In this paper, chaotic motifs (subnetworks with chaos) in GRNs are systematically investigated. The conclusion is that: (i) chaos can only appear through competitions between different oscillatory modes with rivaling intensities. Conditions required for chaotic GRNs are found to be very strict, which make chaotic GRNs extremely rare. (ii) Chaotic motifs are explored as the simplest few-node structures capable of producing chaos, and serve as the intrinsic source of chaos of random few-node GRNs. Several optimal motifs causing chaos with atypically high probability are figured out. (iii) Moreover, we discovered that a number of special oscillators can never produce chaos. These structures bring some advantages on rhythmic functions and may help us understand the robustness of diverse biological rhythms. (iv) The methods of dominant phase-advanced driving (DPAD) and DPAD time fraction are proposed to quantitatively identify chaotic motifs and to explain the origin of chaotic behaviors in GRNs.

  5. An almost Sure Central Limit Theorem for the Weight Function Sequences of NA Random Variables

    Indian Academy of Sciences (India)

    Qunying Wu

    2011-08-01

    Consider the weight function sequences of NA random variables. This paper proves that the almost sure central limit theorem holds for the weight function sequences of NA random variables. Our results generalize and improve those on the almost sure central limit theorem previously obtained from the i.i.d. case to NA sequences.

  6. RNA structural motif recognition based on least-squares distance.

    Science.gov (United States)

    Shen, Ying; Wong, Hau-San; Zhang, Shaohong; Zhang, Lin

    2013-09-01

    RNA structural motifs are recurrent structural elements occurring in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. In view of this, we propose a new method known as RNA Structural Motif Recognition based on Least-Squares distance (LS-RSMR) to effectively recognize RNA structural motifs. We compiled a test set consisting of five types of RNA structural motifs occurring in Escherichia coli ribosomal RNA, and conducted experiments on recognizing these five types of motifs. The experimental results show the superiority of the proposed LS-RSMR compared with four other state-of-the-art methods.

  7. AISMOTIF-An Artificial Immune System for DNA Motif Discovery

    CERN Document Server

    Seeja, K R

    2011-01-01

    Discovery of transcription factor binding sites is a much explored and still active area of research in functional genomics. Many computational tools have been developed for finding motifs, and each of them has its own advantages as well as disadvantages. Most of these algorithms need prior knowledge about the data to construct background models. However, there is not a single technique that can be considered the best for finding regulatory motifs. This paper proposes an artificial immune system based algorithm for finding transcription factor binding sites or motifs, and two new weighted scores for motif evaluation. The algorithm is enumerative, but sufficient pruning of the pattern search space has been incorporated using immune system concepts. The performance of AISMOTIF has been evaluated by comparing it with eight state-of-the-art composite motif discovery algorithms, and it was found that AISMOTIF predicts known motifs as well as new motifs from the benchmark dataset without any prior knowledge about the data...

  8. Functional noncoding sequences derived from SINEs in the mammalian genome.

    Science.gov (United States)

    Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

    2006-07-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

  9. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily, at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  10. Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence.

    Science.gov (United States)

    Warnecke, Tobias; Weber, Claudia C; Hurst, Laurence D

    2009-08-01

    There is considerable variation in the rate at which different proteins evolve. Why is this? Classically, it has been considered that the density of functionally important sites must predict rates of protein evolution. Likewise, amino acid choice is usually assumed to reflect optimal protein function. In the present article, we briefly review evidence suggesting that this protein function-centred view is too simplistic. In particular, we concentrate on how selection acting during the protein's production history can also affect protein evolutionary rates and amino acid choice. Exploring the role of selection at the DNA and RNA level, we specifically address how the need (i) to specify exonic splice enhancer motifs in pre-mRNA, and (ii) to ensure nucleosome positioning on DNA have an impact on amino acid choice and rates of evolution. For both, we review evidence that sequence affected by more than one coding demand is particularly constrained. Strikingly, in mammals, splicing-related constraints are quantitatively as important as expression parameters in predicting rates of protein evolution. These results indicate that there is substantially more to protein evolution than protein functional constraints.

  11. Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases.

    Science.gov (United States)

    Koonin, E V

    1996-06-15

    Using a combination of several methods for protein sequence comparison and motif analysis, it is shown that the four recently described pseudouridine synthases with different specificities belong to four distinct families. Three of these families share two conserved motifs that are likely to be directly involved in catalysis. One of these motifs is detected also in two other families of enzymes that specifically bind uridine, namely deoxycytidine triphosphate deaminases and deoxyuridine triphosphatases. It is proposed that this motif is an essential part of the uridine-binding site. Two of the pseudouridine synthases, one of which modifies the anticodon arm of tRNAs and the other is predicted to modify a portion of the large ribosomal subunit RNA belonging to the peptidyltransferase center, are encoded in all extensively sequenced genomes, including the 'minimal' genome of Mycoplasma genitalium. These particular RNA modifications and the respective enzymes are likely to be essential for the functioning of any cell.

  12. A novel pro-Arg motif recognized by WW domains.

    Science.gov (United States)

    Bedford, M T; Sarbassova, D; Xu, J; Leder, P; Yaffe, M B

    2000-04-07

    WW domains mediate protein-protein interactions through binding to short proline-rich sequences. Two distinct sequence motifs, PPXY and PPLP, are recognized by different classes of WW domains, and another class binds to phospho-Ser-Pro sequences. We now describe a novel Pro-Arg sequence motif recognized by a different class of WW domains using data from oriented peptide library screening, expression cloning, and in vitro binding experiments. The prototype member of this group is the WW domain of formin-binding protein 30 (FBP30), a p53-regulated molecule whose WW domains bind to Pro-Arg-rich cellular proteins. This new Pro-Arg sequence motif re-classifies the organization of WW domains based on ligand specificity, and the Pro-Arg class now includes the WW domains of FBP21 and FE65. A structural model is presented which rationalizes the distinct motifs selected by the WW domains of YAP, Pin1, and FBP30. The Pro-Arg motif identified for WW domains often overlaps with SH3 domain motifs within protein sequences, suggesting that the same extended proline-rich sequence could form discrete SH3 or WW domain complexes to transduce distinct cellular signals.

  13. ET-Motif: Solving the Exact (l, d)-Planted Motif Problem Using Error Tree Structure.

    Science.gov (United States)

    Al-Okaily, Anas; Huang, Chun-Hsi

    2016-07-01

    Motif finding is an important and challenging problem in many biological applications such as discovering promoters, enhancers, locus control regions, transcription factors, and more. The (l, d)-planted motif search, PMS, is one of several variations of the problem. In this problem, there are n given sequences over alphabets of size [Formula: see text], each of length m, and two given integers l and d. The problem is to find a motif m of length l, where in each sequence there is at least one l-mer at a Hamming distance of [Formula: see text] of m. In this article, we propose ET-Motif, an algorithm that can solve the PMS problem in [Formula: see text] time and [Formula: see text] space. The time bound can be further reduced by a factor of m with [Formula: see text] space. In case the suffix tree that is built for the input sequences is balanced, the problem can be solved in [Formula: see text] time and [Formula: see text] space. Similarly, the time bound can be reduced by a factor of m using [Formula: see text] space. Moreover, the variations of the problem, namely the edit distance PMS and edited PMS (Quorum), can be solved using ET-Motif with simple modifications but higher upper bounds on space and time. For edit distance PMS, the time and space bounds will be increased by [Formula: see text], while for edited PMS the increase will be of [Formula: see text] in the time bound.
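
    To make the problem statement above concrete, here is a brute-force sketch of the (l, d)-planted motif search. It is not ET-Motif and uses no error tree: it simply tests every possible l-mer, which is exponential in l. Because the distance bound is lost in this record ("[Formula: see text]"), the code assumes the usual reading of "Hamming distance at most d"; the function names and the DNA alphabet are likewise assumptions for the example.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some l-mer of `sequence` is within Hamming distance d of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """Enumerate all l-mers that occur, with at most d mismatches, in every sequence."""
    found = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            found.append(candidate)
    return found
```

    With sequences ["TTACGTTT", "GGACCTGG", "CCAGGTCC"], l = 4 and d = 1, the returned list includes "ACGT", since each sequence contains a 4-mer that differs from it in at most one position.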

  14. GHOSTX: A Fast Sequence Homology Search Tool for Functional Annotation of Metagenomic Data.

    Science.gov (United States)

    Suzuki, Shuji; Ishida, Takashi; Ohue, Masahito; Kakuta, Masanori; Akiyama, Yutaka

    2017-01-01

    Metagenomic analysis based on whole genome shotgun sequencing data requires fast protein sequence homology searches for predicting the function of proteins coded on metagenome short reads. However, huge amounts of sequence data cause even general homology search analyses using BLASTX to become difficult in terms of computational cost. GHOSTX is a sequence homology search tool specifically developed for functional annotation of metagenome sequences. The tool is more than 160 times faster than BLASTX and has sufficient search sensitivity for metagenomic analysis. Using this tool, users can perform functional annotation of metagenomic data within a short time and infer metabolic pathways within an environment.

  15. The mechanical design of spider silks: from fibroin sequence to mechanical function.

    Science.gov (United States)

    Gosline, J M; Guerette, P A; Ortlepp, C S; Savage, K N

    1999-12-01

    Spiders produce a variety of silks, and the cloning of genes for silk fibroins reveals a clear link between protein sequence and structure-property relationships. The fibroins produced in the spider's major ampullate (MA) gland, which forms the dragline and web frame, contain multiple repeats of motifs that include an 8-10 residue long poly-alanine block and a 24-35 residue long glycine-rich block. When fibroins are spun into fibres, the poly-alanine blocks form β-sheet crystals that crosslink the fibroins into a polymer network with great stiffness, strength and toughness. As illustrated by a comparison of MA silks from Araneus diadematus and Nephila clavipes, variation in fibroin sequence and properties between spider species provides the opportunity to investigate the design of these remarkable biomaterials.

  16. Assessing the Exceptionality of Coloured Motifs in Networks

    Directory of Open Access Journals (Sweden)

    Lacroix Vincent

    2009-01-01

    Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put in the last thirty years to derive p-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results make it possible to derive a p-value for a coloured motif without spending time on simulations.

  17. Spodoptera frugiperda FKBP-46 is a consensus p53 motif binding protein.

    Science.gov (United States)

    Mohareer, Krishnaveni; Sahdev, Sudhir; Hasnain, Seyed E

    2013-04-01

    p53 protein, the central molecule of the apoptosis pathway, is mutated in 50% of the human cancers. Of late, p53 homologues have been identified from different invertebrates including Drosophila melanogaster, Caenorhabditis elegans, Squid, and Clams. We report the identification of a p53-like protein in Spodoptera frugiperda (Sf9) insect cells, which is activated during oxidative stress, caused by exposure to UV-B or H2O2, and binds to p53 consensus DNA binding motifs as well as other p53 cognate motifs. Sf9 p53 motif-binding protein is similar to murine and Drosophila p53 in terms of molecular size, which is around 50-60 kDa, as evident from UV cross-linking, and displays DNA binding characteristics similar to both insect and vertebrate p53 as seen from electrophoretic mobility shift assays. The N-terminal sequencing of the purified Sf9 p53 motif-binding protein reveals extensive homology to the pro-apoptotic FK-506 binding protein (FKBP-46), earlier identified in Sf9 cells as a factor which interacts with murine casein kinase. FKBP, an evolutionarily conserved protein of mammalian origin, functions as a pro-apoptotic factor. Identification of FKBP-46 as a novel p53 motif-binding protein in insect cells adds a new facet to our understanding of the mechanisms of apoptosis under oxidative stress in the absence of a typical p53 homologue.

  18. Distinct cagA EPIYA motifs are associated with ethnic diversity in Malaysia and Singapore.

    Science.gov (United States)

    Schmidt, Heather-Marie A; Goh, Khean-Lee; Fock, Kwong Ming; Hilmi, Ida; Dhamodaran, Subbiah; Forman, David; Mitchell, Hazel

    2009-08-01

    In vitro studies have shown that the biologic activity of CagA is influenced by the number and class of EPIYA motifs present in its variable region as these motifs correspond to the CagA phosphorylation sites. It has been hypothesized that strains possessing specific combinations of these motifs may be responsible for gastric cancer development. This study investigated the prevalence of cagA and the EPIYA motifs with regard to number, class, and patterns in strains from the three major ethnic groups within the Malaysian and Singaporean populations in relation to disease development. Helicobacter pylori isolates from 49 Chinese, 43 Indian, and 14 Malay patients with functional dyspepsia (FD) and 21 gastric cancer (GC) cases were analyzed using polymerase chain reaction for the presence of cagA and the number, type, and pattern of EPIYA motifs. Additionally, the EPIYA motifs of 47 isolates were sequenced. All 126 isolates possessed cagA, with the majority encoding EPIYA-A (97.6%) and all encoding EPIYA-B. However, while the cagA of 93.0% of Indian FD isolates encoded EPIYA-C as the third motif, 91.8% of Chinese FD isolates and 81.7% of Chinese GC isolates encoded EPIYA-D (p Malaysia and Singapore, these genotypes appear unassociated with the development of GC in the ethnic Chinese population. The phenomenon of distinct strains circulating within different ethnic groups, in combination with host and certain environmental factors, may help to explain the rates of GC development in Malaysia.

  19. Bioactive motifs of agouti signal protein.

    Science.gov (United States)

    Virador, V M; Santis, C; Furumura, M; Kalbacher, H; Hearing, V J

    2000-08-25

    The switch between the synthesis of eu- and pheomelanins is modulated by the interaction of two paracrine signaling molecules, alpha-melanocyte stimulating hormone (MSH) and agouti signal protein (ASP), which interact with melanocytes via the MSH receptor (MC1R). Comparison of the primary sequence of ASP with the known MSH pharmacophore provides no suggestion about the putative bioactive domain(s) of ASP. To identify such bioactive motif(s), we synthesized 15-mer peptides that spanned the primary sequence of ASP and determined their effects on the melanogenic activities of murine melanocytes. Northern and Western blotting were used, together with chemical analysis of melanins and enzymatic assays, to identify three distinct bioactive regions of ASP that down-regulate eumelanogenesis. The decrease in eumelanin production was mediated by down-regulation of mRNA levels for tyrosinase and other melanogenic enzymes, as occurs in vivo, and these effects were comparable to those elicited by intact recombinant ASP. Shorter peptides in those motifs were synthesized and their effects on melanogenesis were further investigated. The amino acid arginine, which is present in the MSH peptide pharmacophore (HFRW), is also in the most active domain of ASP (KVARP). Our data suggest that lysines and an arginine (in motifs such as KxxxxKxxR or KxxRxxxxK) are important for the bioactivity of ASP. Identification of the specific ASP epitope that interacts with the MC1R has potential pharmacological applications in treating dysfunctions of skin pigmentation.
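
    The two spacing patterns named above, KxxxxKxxR and KxxRxxxxK, translate directly into regular expressions in which x matches any residue. The sketch below scans a one-letter protein string for both patterns, including overlapping hits. The function name and the toy input sequences are hypothetical and are not taken from the ASP sequence.

```python
import re

# Lookahead groups let overlapping occurrences be reported.
PATTERNS = {
    "KxxxxKxxR": re.compile(r"(?=(K.{4}K.{2}R))"),
    "KxxRxxxxK": re.compile(r"(?=(K.{2}R.{4}K))"),
}


def find_lys_arg_motifs(protein):
    """Return sorted (start, pattern_name, matched_text) tuples for both patterns."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(protein):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)
```

    For the toy string "MKVARPAAAK", the scan reports one KxxRxxxxK hit starting at index 1 (0-based).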

  20. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.

  1. The effect of orthology and coregulation on detecting regulatory motifs.

    Directory of Open Access Journals (Sweden)

    Valerie Storms

    BACKGROUND: Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. METHODOLOGY: We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. RESULTS AND CONCLUSIONS: Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.

  2. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

    Science.gov (United States)

    Romero, José R.; Carballido, Jessica A.; Garbus, Ingrid; Echenique, Viviana C.; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka. PMID:27812277

  3. STEME: a robust, accurate motif finder for large data sets.

    Directory of Open Access Journals (Sweden)

    John E Reid

    Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.

  4. The MARVEL transmembrane motif of occludin mediates oligomerization and targeting to the basolateral surface in epithelia.

    Science.gov (United States)

    Yaffe, Yakey; Shepshelovitch, Jeanne; Nevo-Yassaf, Inbar; Yeheskel, Adva; Shmerling, Hedva; Kwiatek, Joanna M; Gaus, Katharina; Pasmanik-Chor, Metsada; Hirschberg, Koret

    2012-08-01

    Occludin (Ocln), a MARVEL-motif-containing protein, is found in all tight junctions. MARVEL motifs are comprised of four transmembrane helices associated with the localization to or formation of diverse membrane subdomains by interacting with the proximal lipid environment. The functions of the Ocln MARVEL motif are unknown. Bioinformatics sequence- and structure-based analyses demonstrated that the MARVEL domain of Ocln family proteins has distinct evolutionarily conserved sequence features that are consistent with its basolateral membrane localization. Live-cell microscopy, fluorescence resonance energy transfer (FRET) and bimolecular fluorescence complementation (BiFC) were used to analyze the intracellular distribution and self-association of fluorescent-protein-tagged full-length human Ocln or the Ocln MARVEL motif excluding the cytosolic C- and N-termini (amino acids 60-269, FP-MARVEL-Ocln). FP-MARVEL-Ocln efficiently arrived at the plasma membrane (PM) and was sorted to the basolateral PM in filter-grown polarized MDCK cells. A series of conserved aromatic amino acids within the MARVEL domain were found to be associated with Ocln dimerization using BiFC. FP-MARVEL-Ocln inhibited membrane pore growth during Triton-X-100-induced solubilization and was shown to increase the membrane-ordered state using Laurdan, a lipid dye. These data demonstrate that the Ocln MARVEL domain mediates self-association and correct sorting to the basolateral membrane.

  5. Sequencing Computer-Assisted Learning of Transformations of Trigonometric Functions

    Science.gov (United States)

    Ross, John A.; Bruce, Catherine D.; Sibbald, Timothy M.

    2011-01-01

    Studies incorporating technology into the teaching of trigonometry, although sparse, have demonstrated positive effects on student achievement. The optimal sequence for integrating technology with teacher-led mathematics instruction has not been determined. Our research investigated whether technology has a greater impact on student achievement…

  6. Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

    KAUST Repository

    Sayadi, Ahmed

    2011-07-20

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length of the motifs and their variable degree of conservation makes their identification hard since it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server and we believe it constitutes by itself a useful resource for the life sciences (http://www.biocomputing.it/modipath).

  7. Prevalent RNA recognition motif duplication in the human genome.

    Science.gov (United States)

    Tsai, Yihsuan S; Gomez, Shawn M; Wang, Zefeng

    2014-05-01

    The sequence-specific recognition of RNA by proteins is mediated through various RNA binding domains, with the RNA recognition motif (RRM) being the most frequent and present in >50% of RNA-binding proteins (RBPs). Many RBPs contain multiple RRMs, and it is unclear how each RRM contributes to the binding specificity of the entire protein. We found that RRMs within the same RBP (i.e., sibling RRMs) tend to have significantly higher similarity than expected by chance. Sibling RRM pairs from RBPs shared by multiple species tend to have lower similarity than those found only in a single species, suggesting that multiple RRMs within the same protein might arise from domain duplication followed by divergence through random mutations. This finding is exemplified by a recent RRM domain duplication in DAZ proteins and an ancient duplication in PABP proteins. Additionally, we found that different similarities between sibling RRMs are associated with distinct functions of an RBP and that the RBPs tend to contain repetitive sequences with low complexity. Taken together, this study suggests that the number of RBPs with multiple RRMs has expanded in mammals and that the multiple sibling RRMs may recognize similar target motifs in a cooperative manner.

  8. Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements

    Directory of Open Access Journals (Sweden)

    Landsman David

    2006-09-01

    Background: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set. Results: We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence. Conclusion: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances.
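
The scan-and-update loop described in this abstract can be sketched roughly as follows. This is not A-GLAM's implementation: the E-value computation and the Bayesian update are not specified in the abstract, so the sketch substitutes a plain log-odds score threshold and a pseudocount-based rebuild of the matrix, and it assumes sequences contain only A, C, G and T. All function names, parameters and the example sequences are hypothetical.

```python
import math

ALPHABET = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PSSM (one dict per column) from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm

def scan(sequences, pssm, threshold):
    """Score every window; return (seq_index, offset, window, score) at or above threshold."""
    width = len(pssm)
    hits = []
    for idx, seq in enumerate(sequences):
        for off in range(len(seq) - width + 1):
            window = seq[off:off + width]
            score = sum(col[base] for col, base in zip(pssm, window))
            if score >= threshold:  # stand-in for "E-value below a threshold"
                hits.append((idx, off, window, score))
    return hits

def iterate_scan(sequences, seed_sites, threshold, max_rounds=20):
    """Rescan and rebuild the PSSM until no new windows contribute."""
    sites = list(seed_sites)
    previous = None
    for _ in range(max_rounds):
        hits = scan(sequences, build_pssm(sites), threshold)
        current = {(idx, off) for idx, off, _, _ in hits}
        if current == previous:  # converged: the contributing set is unchanged
            break
        previous = current
        sites = list(seed_sites) + [window for _, _, window, _ in hits]
    # Highest score first, analogous to reporting in order of increasing E-value.
    return sorted(hits, key=lambda h: -h[3])

example_hits = iterate_scan(
    ["GGCTATAATGCCTATAATGG", "CCGGTATGATCCGG"],
    seed_sites=["TATAAT", "TATAAT", "TATGAT"],
    threshold=6.0,
)
```

Note that, unlike a one-site-per-sequence sampler, the scan can return several hits from the same sequence, which is the point the abstract makes about locating further instances.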

  9. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    of peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms...... is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  10. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
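
As a worked reading of the efficiency figure, assuming the usual strong-scaling definition of relative parallel efficiency (the abstract does not state the definition used) and writing $T_p$ for the run time on $p$ cores, a symbol introduced here for illustration:

```latex
E_{256 \to 65536}
  = \frac{256 \, T_{256}}{65536 \, T_{65536}} = 0.94
\quad\Longrightarrow\quad
\frac{T_{256}}{T_{65536}} = 0.94 \times \frac{65536}{256} = 0.94 \times 256 \approx 240.6
```

Under that assumption, moving from 256 to 65,536 cores shortens the run by a factor of about 241 out of an ideal 256. The reported overall speedup of more than 250,000 is measured against the original serial code and additionally includes the serial optimizations.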

  11. Specificity of the chromodomain Y chromosome family of chromodomains for lysine-methylated ARK(S/T) motifs.

    Science.gov (United States)

    Fischle, Wolfgang; Franz, Henriette; Jacobs, Steven A; Allis, C David; Khorasanizadeh, Sepideh

    2008-07-11

    Previous studies have shown two homologous chromodomain modules in the HP1 and Polycomb proteins exhibit discriminatory binding to related methyllysine residues (embedded in ARKS motifs) of the histone H3 tail. Methylated ARK(S/T) motifs have recently been identified in other chromatin factors (e.g. linker histone H1.4 and lysine methyltransferase G9a). These are thought to function as peripheral docking sites for the HP1 chromodomain. In vertebrates, HP1-like chromodomains are also present in the chromodomain Y chromosome (CDY) family of proteins adjacent to a putative catalytic motif. The human genome encodes three CDY family proteins, CDY, CDYL, and CDYL2. These have putative functions ranging from establishment of histone H4 acetylation during spermiogenesis to regulation of transcription co-repressor complexes. To delineate the biochemical functions of the CDY family chromodomains, we analyzed their specificity of methyllysine recognition. We detected substantial differences among these factors. The CDY chromodomain exhibits discriminatory binding to lysine-methylated ARK(S/T) motifs, whereas the CDYL2 chromodomain binds with comparable strength to multiple ARK(S/T) motifs. Interestingly, subtle amino acid changes in the CDYL chromodomain prohibit such binding interactions in vitro and in vivo. However, point mutations can rescue binding. In support of the in vitro binding properties of the chromodomains, the full-length CDY family proteins exhibit substantial variability in chromatin localization. Our studies underscore the significance of subtle sequence differences in a conserved signaling module for diverse epigenetic regulatory pathways.

  12. Why is the GMN motif conserved in the CorA/Mrs2/Alr1 superfamily of magnesium transport proteins?

    Science.gov (United States)

    Palombo, Isolde; Daley, Daniel O; Rapp, Mikaela

    2013-07-16

    Members of the CorA/Mrs2/Alr1 superfamily of transport proteins mediate magnesium uptake in all kingdoms of life. Family members have a low degree of sequence conservation but are characterized by a conserved extracellular loop. While the degree of sequence conservation in the loop deviates to some extent between family members, the GMN family signature motif is always present. Structural and functional data imply that the loop plays a central role in magnesium selectivity, and recent biochemical data suggest it is crucial for stabilizing the pentamer in the magnesium-free (putative open) conformation. In this study, we present a detailed structure-function analysis of the extracellular loop of CorA from Thermotoga maritima, which provides molecular insight into how the loop mediates these two functions. The data show that loop residues outside of the GMN motif can be substituted if they support the pentameric state, but the residues of the GMN motif are intolerant to substitution. We conclude that G(312) is absolutely required for magnesium uptake, M(313) is absolutely required for pentamer integrity in the putative open conformation, and N(314) plays a role in both of these functions. These observations suggest a molecular reason why the GMN motif is conserved throughout the CorA/Mrs2/Alr1 superfamily.

  13. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints.

    Directory of Open Access Journals (Sweden)

    Yuchun Guo

    An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the

  14. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints.

    Science.gov (United States)

    Guo, Yuchun; Mahony, Shaun; Gifford, David K

    2012-01-01

    An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial

  15. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase.

    Directory of Open Access Journals (Sweden)

    Hao Ding

    Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of nine amino acids (GFXXPXPIQ) with an invariant glutamine (Q) residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11) gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site-directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase.

  16. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins.

    Directory of Open Access Journals (Sweden)

    Defne Surujon

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms.

  17. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins.

    Science.gov (United States)

    Surujon, Defne; Ratner, David I

    2016-01-01

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms.

  18. Motif Participation by Genes in E. coli Transcriptional Networks

    Directory of Open Access Journals (Sweden)

    Michael eMayo

    2012-09-01

    Motifs are patterns of recurring connections among the genes of genetic networks that occur more frequently than would be expected from randomized networks with the same degree sequence. Although the abundance of certain three-node motifs, such as the feed-forward loop, is positively correlated with a network's ability to tolerate moderate disruptions to gene expression, little is known regarding the connectivity of individual genes participating in multiple motifs. Using the transcriptional network of the bacterium Escherichia coli, we investigate this feature by reconstructing the distribution of genes participating in feed-forward loop motifs from its largest connected network component. We contrast these motif participation distributions with those obtained from model networks built using the preferential attachment mechanism employed by many biological and man-made networks. We report that, although some of these model networks support a motif participation distribution that appears qualitatively similar to that obtained from the bacterium Escherichia coli, the probability for a node to support a feed-forward loop motif may instead be strongly influenced by only a few master transcriptional regulators within the network. From these analyses we conclude that such master regulators may be a crucial ingredient to describe coupling among feed-forward loop motifs in transcriptional regulatory networks.
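
Motif participation, as used in this abstract, can be made concrete with a short sketch that counts how many feed-forward loops each node belongs to. It is an illustration only, not the authors' procedure: it assumes the network is given as a list of directed (regulator, target) pairs, ignores self-regulation, and does not restrict the graph to its largest connected component. The function name and the node labels are hypothetical.

```python
from collections import Counter

def ffl_participation(edges):
    """Count, per node, the feed-forward loops (a->b, b->c, a->c) it takes part in."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops (autoregulation)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    counts = Counter()
    for a, targets in successors.items():
        for b in targets:                      # a -> b
            for c in successors.get(b, ()):    # b -> c
                if c != a and c in targets:    # a -> c closes the loop
                    counts.update((a, b, c))
    return counts

# Toy network: X regulates Y, W and Z; Y and W each regulate Z.
participation = ffl_participation(
    [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("X", "W"), ("W", "Z")]
)
```

In the toy network the node X sits at the top of both loops, a small-scale analogue of the master-regulator effect the abstract reports.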

  19. BetaSearch: a new method for querying β-residue motifs

    Directory of Open Access Journals (Sweden)

    Ho Hui

    2012-07-01

    Background: Searching for structural motifs across known protein structures can be useful for identifying unrelated proteins with similar function and characterising secondary structures such as β-sheets. This is infeasible using conventional sequence alignment because linear protein sequences do not contain spatial information. β-residue motifs are β-sheet substructures that can be represented as graphs and queried using existing graph indexing methods, however, these approaches are designed for general graphs that do not incorporate the inherent structural constraints of β-sheets and require computationally-expensive filtering and verification procedures. 3D substructure search methods, on the other hand, allow β-residue motifs to be queried in a three-dimensional context but at significant computational costs. Findings: We developed a new method for querying β-residue motifs, called BetaSearch, which leverages the natural planar constraints of β-sheets by indexing them as 2D matrices, thus avoiding much of the computational complexities involved with structural and graph querying. BetaSearch exhibits faster filtering, verification, and overall query time than existing graph indexing approaches whilst producing comparable index sizes. Compared to 3D substructure search methods, BetaSearch achieves 33 and 240 times speedups over index-based and pairwise alignment-based approaches, respectively. Furthermore, we have presented case-studies to demonstrate its capability of motif matching in sequentially dissimilar proteins and described a method for using BetaSearch to predict β-strand pairing. Conclusions: We have demonstrated that BetaSearch is a fast method for querying substructure motifs. The improvements in speed over existing approaches make it useful for efficiently performing high-volume exploratory querying of possible protein substructural motifs or conformations. BetaSearch was used to identify a nearly identical

  20. The seven amino acids (547-553) of rat glucocorticoid receptor required for steroid and hsp90 binding contain a functionally independent LXXLL motif that is critical for steroid binding.

    Science.gov (United States)

    Giannoukos, G; Silverstein, A M; Pratt, W B; Simons, S S

    1999-12-17

    Hsp90 association with glucocorticoid receptors (GRs) is required for steroid binding. We recently reported that seven amino acids (547-553) overlapping the amino-terminal end of the rat GR ligand-binding domain are necessary for hsp90 binding, and consequently steroid binding. The role of a LXXLL motif at the COOH terminus of this sequence has now been analyzed by determining the properties of Leu to Ser mutations in full-length GR and glutathione S-transferase chimeras. Surprisingly, these mutations decreased steroid binding capacity without altering receptor levels, steroid binding affinity, or hsp90 binding. Single mutations in the context of the full-length receptor did not affect the transcriptional activity but the double mutant (L550S/L553S) was virtually inactive. This biological inactivity was found to be due to an increased rate of steroid dissociation from the activated mutant complex. These results, coupled with those from trypsin digestion studies, suggest a model in which the GR ligand-binding domain is viewed as having a "hinged pocket," with the hinge being in the region of the trypsin digestion site at Arg(651). The pocket would normally be kept shut via the intramolecular interactions of the LXXLL motif at amino acids 550-554 acting as a hydrophobic clasp.

  1. Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function.

    Science.gov (United States)

    Mehrotra, Shweta; Goyal, Vinod

    2014-08-01

    Repetitive DNA sequences are a major component of eukaryotic genomes and may account for up to 90% of the genome size. They can be divided into minisatellite, microsatellite and satellite sequences. Satellite DNA sequences are considered to be a fast-evolving component of eukaryotic genomes, comprising tandemly-arrayed, highly-repetitive and highly-conserved monomer sequences. The monomer unit of satellite DNA is 150-400 base pairs (bp) in length. Repetitive sequences may be species- or genus-specific, and may be centromeric or subtelomeric in nature. They exhibit cohesive and concerted evolution caused by molecular drive, leading to high sequence homogeneity. Repetitive sequences accumulate variations in sequence and copy number during evolution, hence they are important tools for taxonomic and phylogenetic studies, and are known as "tuning knobs" in evolution. Therefore, knowledge of repetitive sequences assists our understanding of the organization, evolution and behavior of eukaryotic genomes. Repetitive sequences have cytoplasmic, cellular and developmental effects and play a role in chromosomal recombination. In the post-genomics era, with the introduction of next-generation sequencing technology, it is possible to evaluate complex genomes for analyzing repetitive sequences and deciphering the yet unknown functional potential of repetitive sequences. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  2. MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs

    Directory of Open Access Journals (Sweden)

    Tsunoda Tatsuhiko

    2007-03-01

    Full Text Available Abstract Background A combination of multiple types of transcription factors and cis-regulatory elements is often required for gene expression in eukaryotes, and the combinatorial regulation confers specific gene expression to tissues or environments. To reveal the combinatorial regulation, computational methods are developed that efficiently infer combinations of cis-regulatory motifs that are important for gene expression as measured by DNA microarrays. One promising type of computational method is to utilize regression analysis between expression levels and scores of motifs in input sequences. This type takes full advantage of information on expression levels because it does not require that the expression level of each gene be dichotomized according to whether or not it reaches a certain threshold level. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that practically handles combinations of more than two or three motifs. Results We here introduced MotifCombinator, an online tool with a user-friendly interface, to systematically search for combinations composed of any number of motifs based on regression methods. The tool utilizes well-known regression methods (the multivariate linear regression, the multivariate adaptive regression spline or MARS, and the multivariate logistic regression method) for this purpose, and uses the genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in this tool help users to intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require large computational time. This tool also provides preparatory steps needed for systematic combination search – i.e., selecting single motifs to constitute combinations and cutting out redundant similar motifs based on clustering analysis. Conclusion

  3. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    Science.gov (United States)

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods.
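    The quartet prediction described above can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the abstract, that each variant is a set of mutation labels relative to a common reference and that a quartet consists of a variant, its two single-mutation neighbours and the corresponding double mutant. The function name and the labels "mutA" and "mutB" are hypothetical.

```python
from itertools import combinations

def predict_from_triplets(known):
    """Return variants that would complete a quartet with three known corners.

    Each variant is a frozenset of mutation labels (assumed representation).
    Two variants that differ by exactly two mutations are opposite corners
    of a quartet; the remaining two corners each differ from them by one.
    """
    known = set(known)
    predicted = set()
    for x, y in combinations(known, 2):
        diff = x ^ y                      # mutations in which x and y differ
        if len(diff) != 2:
            continue
        a, b = diff
        corner1, corner2 = x ^ {a}, x ^ {b}
        # Exactly one of the two remaining corners is known: a triplet.
        if (corner1 in known) != (corner2 in known):
            predicted.add(corner2 if corner1 in known else corner1)
    return predicted

# Toy data: a reference variant and two single mutants (hypothetical labels).
known_variants = {
    frozenset(),
    frozenset({"mutA"}),
    frozenset({"mutB"}),
}
print(predict_from_triplets(known_variants))
```

    In this toy case the only predicted variant is the double mutant carrying both labels. The sketch ignores complications such as alternative substitutions at the same position.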

  4. A three-dimensional RNA motif in Potato spindle tuber viroid mediates trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana.

    Science.gov (United States)

    Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao

    2011-01-01

    Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.

  5. Characterisation of an unusual telomere motif (TTTTTTAGGG)n in the plant Cestrum elegans (Solanaceae), a species with a large genome.

    Science.gov (United States)

    Peška, Vratislav; Fajkus, Petr; Fojtová, Miloslava; Dvořáčková, Martina; Hapala, Jan; Dvořáček, Vojtěch; Polanská, Pavla; Leitch, Andrew R; Sýkorová, Eva; Fajkus, Jiří

    2015-05-01

    The characterization of unusual telomere sequence sheds light on patterns of telomere evolution, maintenance and function. Plant species from the closely related genera Cestrum, Vestia and Sessea (family Solanaceae) lack known plant telomeric sequences. Here we characterize the telomere of Cestrum elegans, work that was a challenge because of its large genome size and few chromosomes (1C 9.76 pg; n = 8). We developed an approach that combines BAL31 digestion, which digests DNA from the ends and chromosome breaks, with next-generation sequencing (NGS), to generate data analysed in RepeatExplorer, designed for de novo repeat identification and quantification. We identify a unique repeat motif (TTTTTTAGGG)n in C. elegans, occurring in ca. 30 400 copies per haploid genome, averaging ca. 1900 copies per telomere. We demonstrate that the motif is synthesized by telomerase. The occurrence of an unusual eukaryote (TTTTTTAGGG)n telomeric motif in C. elegans represents a switch in motif from the 'typical' angiosperm telomere (TTTAGGG)n. That switch may have happened with the divergence of Cestrum, Sessea and Vestia. The shift in motif when it arose would have had profound effects on telomere activity. Thus our finding provides a unique handle to study how telomerase and telomeres responded to genetic change, studies that will shed more light on telomere function.
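    The copy numbers quoted in this abstract can be checked against each other with a short calculation. The sketch assumes, which the abstract does not state explicitly, that each of the n = 8 chromosomes carries two telomeres; the variable names are illustrative.

```python
# Consistency check of the copy numbers quoted in the abstract.
# Assumption (not stated in the abstract): two telomeres per chromosome.
HAPLOID_CHROMOSOMES = 8              # n = 8, from the abstract
COPIES_PER_HAPLOID_GENOME = 30_400   # "ca. 30 400 copies", from the abstract
MOTIF = "TTTTTTAGGG"

telomeres = 2 * HAPLOID_CHROMOSOMES
copies_per_telomere = COPIES_PER_HAPLOID_GENOME / telomeres
bp_per_telomere = copies_per_telomere * len(MOTIF)

print(telomeres, copies_per_telomere, bp_per_telomere)
```

    Under that assumption, 30 400 copies spread over 16 telomeres gives 1900 copies per telomere, matching the quoted average, or roughly 19 kb of repeat per telomere for a 10-nucleotide motif.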

  6. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

    Science.gov (United States)

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.

  7. Motif-guided sparse decomposition of gene expression data for regulatory module identification

    Directory of Open Access Journals (Sweden)

    Hoffman Eric P

    2011-03-01

    Full Text Available Abstract Background Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated. Results We propose a novel approach, motif-guided sparse decomposition (mSD), to identify gene regulatory modules by integrating gene expression data and DNA sequence motif information. The mSD approach is implemented as a two-step algorithm comprising estimates of (1) transcription factor activity and (2) the strength of the predicted gene regulation event(s). Specifically, a motif-guided clustering method is first developed to estimate the transcription factor activity of a gene module; sparse component analysis is then applied to estimate the regulation strength, and so predict the target genes of the transcription factors. The mSD approach was first tested for its improved performance in finding regulatory modules using simulated and real yeast data, revealing functionally distinct gene modules enriched with biologically validated transcription factors. We then demonstrated the efficacy of the mSD approach on breast cancer cell line data and uncovered several important gene regulatory modules related to endocrine therapy of breast cancer. Conclusion We have developed a new integrated strategy, namely motif-guided sparse decomposition (mSD) of gene expression data, for regulatory module identification. The mSD method features a novel motif-guided clustering method for transcription factor activity estimation by finding a balance between co-regulation and co-expression. The mSD method further utilizes a sparse decomposition method for regulation strength estimation.

  8. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    Science.gov (United States)

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  9. A minimum of three motifs is essential for optimal binding of pseudomurein cell wall-binding domain of Methanothermobacter thermautotrophicus.

    Directory of Open Access Journals (Sweden)

    Ganesh Ram R Visweswaran

    Full Text Available We have biochemically and functionally characterized the pseudomurein cell wall-binding (PMB) domain that is present at the C-terminus of the Surface (S)-layer protein MTH719 from Methanothermobacter thermautotrophicus. Chemical denaturation of the protein with guanidinium hydrochloride occurred at 3.8 M. A PMB-GFP fusion protein not only binds to intact pseudomurein of methanogenic archaea, but also to spheroplasts of lysozyme-treated bacterial cells. This binding is pH dependent. At least two of the three motifs that are present in the domain are necessary for binding. Limited proteolysis revealed a possible cleavage site in the spacing sequence between motifs 1 and 2 of the PMB domain, indicating that the motif region itself is protected from proteases.

  10. Social Network Analysis Based on Network Motifs

    OpenAIRE

    2014-01-01

    Based on the community structure characteristics, theory, and methods of frequent subgraph mining, network motif finding is first introduced into social network analysis; the tendentiousness evaluation function and the importance evaluation function are proposed for effectiveness assessment. Compared with the traditional approach based on node centrality degree, the new approach can be used to analyze the properties of a social network more fully and judge the roles of the nodes effectively. I...

  11. Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes.

    Science.gov (United States)

    Pujato, Mario; Kieken, Fabien; Skiles, Amanda A; Tapinos, Nikos; Fiser, Andras

    2014-12-16

    Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge on where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) is developed to identify binding specificities of TFs. The method constructs homology models of TFs bound to DNA and assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential, after optimization in a molecular mechanics force field. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and primarily depend on the sequence identity of aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    Science.gov (United States)

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
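    The phylogenetic profile idea can be sketched in a few lines. The example below builds a presence/absence vector per protein and, as one simple assumed criterion that the abstract does not specify, groups proteins whose profiles are identical. All protein names, genome names and function names are hypothetical.

```python
from collections import defaultdict

def phylogenetic_profiles(presence):
    """Map each protein to a tuple of 0/1 flags, one per genome (sorted by name)."""
    genomes = sorted({g for found_in in presence.values() for g in found_in})
    profiles = {
        protein: tuple(int(g in found_in) for g in genomes)
        for protein, found_in in presence.items()
    }
    return genomes, profiles

def group_by_profile(profiles):
    """Group proteins that share an identical profile (assumed linking criterion)."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]

# Toy input: the genomes in which a homolog of each protein was found.
presence = {
    "protA": {"genome1", "genome3"},
    "protB": {"genome1", "genome3"},
    "protC": {"genome2"},
    "protD": {"genome1", "genome2", "genome3"},
}
genomes, profiles = phylogenetic_profiles(presence)
linked = group_by_profile(profiles)
print(profiles["protA"], linked)
```

    Here protA and protB share the profile (1, 0, 1) and are grouped as candidates for a functional link. A practical implementation would likely tolerate small differences between profiles rather than require exact identity.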

  13. HeliCis: a DNA motif discovery tool for colocalized motif pairs with periodic spacing

    Directory of Open Access Journals (Sweden)

    Mostad Petter

    2007-10-01

    Full Text Available Abstract Background Correct temporal and spatial gene expression during metazoan development relies on combinatorial interactions between different transcription factors. As a consequence, cis-regulatory elements often colocalize in clusters termed cis-regulatory modules. These may have requirements on organizational features such as spacing, order and helical phasing (periodic spacing) between binding sites. Due to the turning of the DNA helix, a small modification of the distance between a pair of sites may sometimes drastically disrupt function, while insertion of a full helical turn of DNA (10–11 bp) between cis elements may cause functionality to be restored. Recently, de novo motif discovery methods which incorporate organizational properties such as colocalization and order preferences have been developed, but there are no tools which incorporate periodic spacing into the model. Results We have developed a web based motif discovery tool, HeliCis, which features a flexible model which allows de novo detection of motifs with periodic spacing. Depending on the parameter settings it may also be used for discovering colocalized motifs without periodicity or motifs separated by a fixed gap of known or unknown length. We show on simulated data that it can efficiently capture the synergistic effects of colocalization and periodic spacing to improve detection of weak DNA motifs. It provides a simple to use web interface which interactively visualizes the current settings and thereby makes it easy to understand the parameters and the model structure. Conclusion HeliCis provides simple and efficient de novo discovery of colocalized DNA motif pairs, with or without periodic spacing. Our evaluations show that it can detect weak periodic patterns which are not easily discovered using a sequential approach, i.e. first finding the binding sites and second analyzing the properties of their pairwise distances.
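    Helical phasing can be illustrated with a small calculation: two sites lie on roughly the same face of the helix when their spacing is close to a whole number of helical turns. This is not the HeliCis model, only a sketch of the geometric idea; the period of 10.5 bp (within the 10–11 bp range quoted above), the tolerance and the function names are assumptions.

```python
HELICAL_PERIOD_BP = 10.5  # assumed value within the 10-11 bp range quoted above

def phase_offset(distance_bp, period=HELICAL_PERIOD_BP):
    """Distance in bp from the nearest whole number of helical turns."""
    remainder = distance_bp % period
    return min(remainder, period - remainder)

def same_face(distance_bp, tolerance_bp=1.5, period=HELICAL_PERIOD_BP):
    """True if two sites this far apart sit on roughly the same helix face."""
    return phase_offset(distance_bp, period) <= tolerance_bp

# A spacing of two turns, then +5 bp (about half a turn), then about one full turn.
for spacing in (21, 26, 31):
    print(spacing, phase_offset(spacing), same_face(spacing))
```

    With these assumed values, a spacing of 21 bp is in phase, inserting 5 bp moves the sites to opposite faces, and a spacing of 31 bp (about one full turn beyond the original) is in phase again, mirroring the disruption and restoration described in the abstract.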

  14. MEME-ChIP: motif analysis of large DNA datasets.

    Science.gov (United States)

    Machanick, Philip; Bailey, Timothy L

    2011-06-15

    Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets. The MEME-ChIP web service is designed to analyze ChIP-seq 'peak regions'--short genomic regions surrounding declared ChIP-seq 'peaks'. Given a set of genomic regions, it performs (i) ab initio motif discovery, (ii) motif enrichment analysis, (iii) motif visualization, (iv) binding affinity analysis and (v) motif identification. It runs two complementary motif discovery algorithms on the input data--MEME and DREME--and uses the motifs they discover in subsequent visualization, binding affinity and identification steps. MEME-ChIP also performs motif enrichment analysis using the AME algorithm, which can detect very low levels of enrichment of binding sites for TFs with known DNA-binding motifs. Importantly, unlike with the MEME web service, there is no restriction on the size or number of uploaded sequences, allowing very large ChIP-seq datasets to be analyzed. The analyses performed by MEME-ChIP provide the user with a varied view of the binding and regulatory activity of the ChIP-ed TF, as well as the possible involvement of other DNA-binding TFs. MEME-ChIP is available as part of the MEME Suite at http://meme.nbcr.net.

  15. Scoring protein relationships in functional interaction networks predicted from sequence data.

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    Full Text Available UNLABELLED: The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. AVAILABILITY: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.

  16. Discovering large network motifs from a complex biological network

    Energy Technology Data Exchange (ETDEWEB)

    Terada, Aika; Sese, Jun, E-mail: terada@sel.is.ocha.ac.j, E-mail: sesejun@is.ocha.ac.j [Department of Computer Science, Ochanomizu University, 2-1-1 Ohtsuka, Bunkyo-ku, Tokyo 112-8610 (Japan)

    2009-12-01

    Graph structures representing relationships between entries have been studied in statistical analysis, and the results of these studies have been applied to biological networks, whose nodes and edges represent proteins and the relationships between them, respectively. Most of the studies have focused on only graph structures such as scale-free properties and cliques, but the relationships between nodes are also important features since most of the proteins perform their functions by connecting to other proteins. In order to determine such relationships, the problem of network motif discovery has been addressed; network motifs are frequently appearing graph structures in a given graph. However, the methods for network motif discovery are highly restrictive for the application to biological network because they can only be used to find small network motifs or they do not consider noise and uncertainty in observations. In this study, we introduce a new index to measure network motifs called AR index and develop a novel algorithm called ARIANA for finding large motifs even when the network has noise. Experiments using a synthetic network verify that our method can find better network motifs than an existing algorithm. By applying ARIANA to a real complex biological network, we find network motifs associated with regulations of start time of cell functions and generation of cell energies and discover that the cell cycle proteins can be categorized into two different groups.

  17. Coregulator control of androgen receptor action by a novel nuclear receptor-binding motif.

    Science.gov (United States)

    Jehle, Katja; Cato, Laura; Neeb, Antje; Muhle-Goll, Claudia; Jung, Nicole; Smith, Emmanuel W; Buzon, Victor; Carbó, Laia R; Estébanez-Perpiñá, Eva; Schmitz, Katja; Fruk, Ljiljana; Luy, Burkhard; Chen, Yu; Cox, Marc B; Bräse, Stefan; Brown, Myles; Cato, Andrew C B

    2014-03-28

    The androgen receptor (AR) is a ligand-activated transcription factor that is essential for prostate cancer development. It is activated by androgens through its ligand-binding domain (LBD), which consists predominantly of 11 α-helices. Upon ligand binding, the last helix is reorganized to an agonist conformation termed activator function-2 (AF-2) for coactivator binding. Several coactivators bind to the AF-2 pocket through conserved LXXLL or FXXLF sequences to enhance the activity of the receptor. Recently, a small compound-binding surface adjacent to AF-2 has been identified as an allosteric modulator of the AF-2 activity and is termed binding function-3 (BF-3). However, the role of BF-3 in vivo is currently unknown, and little is understood about what proteins can bind to it. Here we demonstrate that a duplicated GARRPR motif at the N terminus of the cochaperone Bag-1L functions through the BF-3 pocket. These findings are supported by the fact that a selective BF-3 inhibitor or mutations within the BF-3 pocket abolish the interaction between the GARRPR motif(s) and the BF-3. Conversely, amino acid exchanges in the two GARRPR motifs of Bag-1L can impair the interaction between Bag-1L and AR without altering the ability of Bag-1L to bind to chromatin. Furthermore, the mutant Bag-1L increases androgen-dependent activation of a subset of AR targets in a genome-wide transcriptome analysis, demonstrating a repressive function of the GARRPR/BF-3 interaction. We have therefore identified GARRPR as a novel BF-3 regulatory sequence important for fine-tuning the activity of the AR.

  18. Distinct functional constraints partition sequence conservation in a cis-regulatory element.

    Directory of Open Access Journals (Sweden)

    Antoine Barrière

    2011-06-01

    Full Text Available Different functional constraints contribute to different evolutionary rates across genomes. To understand why some sequences evolve faster than others in a single cis-regulatory locus, we investigated function and evolutionary dynamics of the promoter of the Caenorhabditis elegans unc-47 gene. We found that this promoter consists of two distinct domains. The proximal promoter is conserved and is largely sufficient to direct appropriate spatial expression. The distal promoter displays little if any conservation between several closely related nematodes. Despite this divergence, sequences from all species confer robustness of expression, arguing that this function does not require substantial sequence conservation. We showed that even unrelated sequences have the ability to promote robust expression. A prominent feature shared by all of these robustness-promoting sequences is an AT-enriched nucleotide composition consistent with nucleosome depletion. Because general sequence composition can be maintained despite sequence turnover, our results explain how different functional constraints can lead to vastly disparate rates of sequence divergence within a promoter.

  19. Therapeutic modulation of endogenous gene function by agents with designed DNA-sequence specificities

    NARCIS (Netherlands)

    Uil, T.G.; Haisma, H.J.; Rots, Marianne

    2003-01-01

    Designer molecules that can specifically target pre-determined DNA sequences provide a means to modulate endogenous gene function. Different classes of sequence-specific DNA-binding agents have been developed, including triplex-forming molecules, synthetic polyamides and designer zinc finger protein

  20. The heptanucleotide motif GAGACGC is a key component of a cis-acting promoter element that is critical for SnSAG1 expression in Sarcocystis neurona.

    Science.gov (United States)

    Gaji, Rajshekhar Y; Howe, Daniel K

    2009-07-01

    The apicomplexan parasite Sarcocystis neurona undergoes a complex process of intracellular development, during which many genes are temporally regulated. The described study was undertaken to begin identifying the basic promoter elements that control gene expression in S. neurona. Sequence analysis of the 5'-flanking region of five S. neurona genes revealed a conserved heptanucleotide motif GAGACGC that is similar to the WGAGACG motif described upstream of multiple genes in Toxoplasma gondii. The promoter region for the major surface antigen gene SnSAG1, which contains three heptanucleotide motifs within 135 bases of the transcription start site, was dissected by functional analysis using a dual luciferase reporter assay. These analyses revealed that a minimal promoter fragment containing all three motifs was sufficient to drive reporter molecule expression, with the presence and orientation of the 5'-most heptanucleotide motif being absolutely critical for promoter function. Further studies should help to identify additional sequence elements important for promoter function and for controlling gene expression during intracellular development by this apicomplexan pathogen.
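    Because the abstract reports that both the presence and the orientation of the heptanucleotide matter, a motif scan should report the strand of each hit. The sketch below searches a sequence for GAGACGC and its reverse complement; the toy sequence and function names are hypothetical and are not taken from the SnSAG1 promoter.

```python
import re

MOTIF = "GAGACGC"  # heptanucleotide motif named in the abstract
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_motif(seq, motif=MOTIF):
    """Return sorted (0-based start, strand) hits on both strands."""
    seq = seq.upper()
    # A lookahead keeps overlapping occurrences from being skipped.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)

# Toy sequence (hypothetical): one forward and one reverse-orientation copy.
toy_promoter = "TTGAGACGCAAGCGTCTCTT"
print(find_motif(toy_promoter))
```

    On the toy sequence this reports a forward hit at position 2 and a reverse-strand hit at position 11. Matching the degenerate T. gondii motif WGAGACG would require a character class such as [AT] in place of W.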

  1. DNA nanotechnology based on i-motif structures.

    Science.gov (United States)

    Dong, Yuanchen; Yang, Zhongqiang; Liu, Dongsheng

    2014-06-17

    CONSPECTUS: Most biological processes happen at the nanometer scale, and understanding the energy transformations and material transportation mechanisms within living organisms has proved challenging. To better understand the secrets of life, researchers have investigated artificial molecular motors and devices over the past decade because such systems can mimic certain biological processes. DNA nanotechnology based on i-motif structures is one system that has played an important role in these investigations. In this Account, we summarize recent advances in functional DNA nanotechnology based on i-motif structures. The i-motif is a DNA quadruplex that occurs as four stretches of cytosine repeat sequences form C·CH(+) base pairs, and their stabilization requires slightly acidic conditions. This unique property has produced the first DNA molecular motor driven by pH changes. The motor is reliable, and studies show that it is capable of millisecond running speeds, comparable to the speed of natural protein motors. With careful design, the output of these types of motors was combined to drive the bending of micrometer-sized cantilevers. Using established DNA nanostructure assembly and functionalization methods, researchers can easily integrate the motor within other DNA assembled structures and functional units, producing DNA molecular devices with new functions such as suprahydrophobic/suprahydrophilic smart surfaces that switch, intelligent nanopores triggered by pH changes, molecular logic gates, and DNA nanosprings. Recently, researchers have produced motors driven by light and electricity, which have allowed DNA motors to be integrated within silicon-based nanodevices. Moreover, some devices based on i-motif structures have proven useful for investigating processes within living cells. The pH-responsiveness of the i-motif structure also provides a way to control the stepwise assembly of DNA nanostructures. In addition, because of the stability of the i-motif, this

  2. The extended AT-hook is a novel RNA binding motif.

    Science.gov (United States)

    Filarsky, Michael; Zillner, Karina; Araya, Ingrid; Villar-Garea, Ana; Merkl, Rainer; Längst, Gernot; Németh, Attila

    2015-01-01

    The AT-hook has been defined as a DNA binding peptide motif that contains a glycine-arginine-proline (G-R-P) tripeptide core flanked by basic amino acids. Recent reports documented variations in the sequence of AT-hooks and revealed RNA binding activity of some canonical AT-hooks, suggesting a higher structural and functional variability of this protein domain than previously anticipated. Here we describe the discovery and characterization of the extended AT-hook peptide motif (eAT-hook), in which basic amino acids appear symmetrical mainly at a distance of 12-15 amino acids from the G-R-P core. We identified 80 human and 60 mouse eAT-hook proteins and biochemically characterized the eAT-hooks of Tip5/BAZ2A, PTOV1 and GPBP1. Microscale thermophoresis and electrophoretic mobility shift assays reveal the nucleic acid binding features of this peptide motif, and show that eAT-hooks bind RNA with one order of magnitude higher affinity than DNA. In addition, cellular localization studies suggest a role for the N-terminal eAT-hook of PTOV1 in nucleocytoplasmic shuttling. In summary, our findings classify the eAT-hook as a novel nucleic acid binding motif, which potentially mediates various RNA-dependent cellular processes.

  3. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation

    Science.gov (United States)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-01

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy--combining sequential and modular concepts--enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  4. Semi-automatic time-series transfer functions via temporal clustering and sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Woodring, Jonathan L [Los Alamos National Laboratory; Shen, H W [OHIO STATE UNIV.

    2009-01-01

    When creating transfer functions for time-varying data, it is not clear what range of values to use for classification, as data value ranges and distributions change over time. In order to generate time-varying transfer functions, they search the data for classes that have similar behavior over time, assuming that data points that behave similarly belong to the same feature. They utilize a method they call temporal clustering and sequencing to find dynamic features in value space and create a corresponding transfer function. First, clustering finds groups of data points that have the same value space activity over time. Then, sequencing derives a progression of clusters over time, creating chains that follow value distribution changes. Finally, the cluster sequences are used to create transfer functions, as sequences describe the value range distributions over time in a data set.

  5. Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family

    Directory of Open Access Journals (Sweden)

    Eskandari Sepehr

    2007-10-01

    Background: Efforts to predict functional sites from globular proteins are increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. Consequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Due to the presence of a recently solved structure and a vast amount of experimental mutagenesis data, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions. Results: The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions, most of which are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further

  6. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  7. A simple motif for protein recognition in DNA secondary structures.

    Science.gov (United States)

    Landt, Stephen G; Ramirez, Alejandro; Daugherty, Matthew D; Frankel, Alan D

    2005-09-02

    DNA in a single-stranded form (ssDNA) exists transiently within the cell and comprises the telomeres of linear chromosomes and the genomes of some DNA viruses. As with RNA, in the single-stranded state, some DNA sequences are able to fold into complex secondary and tertiary structures that may be recognized by proteins and participate in gene regulation. To better understand how such DNA elements might fold and interact with proteins, and to compare recognition features to those of a structured RNA, we used in vitro selection to identify ssDNAs that bind an RNA-binding peptide from the HIV Rev protein with high affinity and specificity. The large majority of selected binders contain a non-Watson-Crick G.T base-pair and an adjacent C:G base-pair and both are essential for binding. This GT motif can be presented in different DNA contexts, including a nearly perfect duplex and a branched three-helix structure, and appears to be recognized in large part by arginine residues separated by one turn of an alpha-helix. Interestingly, a very similar GT motif is necessary also for protein binding and function of a well-characterized model ssDNA regulatory element from the proenkephalin promoter.

  8. A combinatorial code for splicing silencing: UAGG and GGGG motifs

    National Research Council Canada - National Science Library

    Han, Kyoungha; Yeo, Gene; An, Ping; Burge, Christopher B; Grabowski, Paula J

    2005-01-01

    .... Here we use molecular approaches to identify a ternary combination of exonic UAGG and 5'-splice-site-proximal GGGG motifs that functions cooperatively to silence the brain-region-specific CI cassette exon (exon 19...

  9. [Specific motifs in the genomes of the family Chlamydiaceae].

    Science.gov (United States)

    Demkin, V V; Kirillova, N V

    2012-01-01

    Specific motifs in the genomes of the family Chlamydiaceae are discussed. The search for genetic markers for bacterial identification and typing is an urgent problem. Progress in sequencing technology has resulted in the compilation of a database of genomic nucleotide sequences of bacteria. This raised the problem of searching for and selecting genetic targets for identification and typing in bacterial genes based on comparative analysis of complete genomic sequences. The goal of this work was to carry out a comparative genetic analysis of different species of the family Chlamydiaceae. This analysis focused on the detection of specific motifs capable of serving as genetic markers of this family. The consensus domains were detected using Visual Basic for Applications software for MS Excel. Complete coincidence of segments 25 nucleotides long was used as the criterion for consensus domain selection. One complete genomic sequence for each of 8 bacterial species was taken for the experiment. The experimental sample did not contain the complete sequence of C. suis, because at the time of this research this species was absent from the GenBank database. Comparative analysis of the sequences of C. trachomatis and other representatives of the family Chlamydiaceae revealed 41 common motifs for the 8 Chlamydiaceae species tested in this work. The maximal number of consensus motifs was observed in genes of ribosomal RNA and tRNA. In addition to the rRNA and tRNA genes, consensus motifs were observed in 5 genes and 6 intergenic segments. The consensus motifs detected in this work in the genes CTL0299, CTL0800, dagA, and hctA can be regarded as identification domains of the family Chlamydiaceae.
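
    The selection criterion in this record (segments of 25 nucleotides that coincide completely in every genome) can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic for Applications code the authors used, the function name shared_kmers is hypothetical, and it compares one strand only, which is a simplifying assumption not stated in the abstract.

```python
def shared_kmers(genomes, k=25):
    """Return the set of length-k segments found in every input sequence.

    Hypothetical helper for illustration only: one strand is scanned and
    overlapping windows are reported as separate k-mers.
    """
    if not genomes:
        return set()

    def kmers(seq):
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(genomes[0])
    for seq in genomes[1:]:
        common &= kmers(seq)   # keep only segments seen in all genomes so far
        if not common:
            break              # nothing left to intersect
    return common


if __name__ == "__main__":
    # Toy sequences with k=4 so the result is easy to check by eye.
    toy = ["AAACGTTT", "CCACGTGG", "TACGTA"]
    print(shared_kmers(toy, k=4))
```

    A real analysis would probably merge overlapping windows into longer consensus segments and also scan the reverse complement; neither step is attempted here.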

  10. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill this void, in a first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found to be statistically enriched in promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
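
    As a small illustration of the motif pairing multiplicity defined in this record (the number of motifs paired with a given motif), the sketch below counts distinct partners from a list of motif pairs. The function name, the input format and the motif labels are assumptions made for the example, and counting a motif paired with itself as one partner is also an assumption.

```python
from collections import defaultdict


def pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with.

    `pairs` is an iterable of (motif_a, motif_b) tuples; repeated pairs are
    counted once. A self-pair (a, a) contributes one partner (assumption).
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}


if __name__ == "__main__":
    # Hypothetical motif labels; the duplicate ("M1", "M2") pair is ignored.
    toy_pairs = [("M1", "M2"), ("M1", "M3"), ("M1", "M4"), ("M1", "M2")]
    print(pairing_multiplicity(toy_pairs))
```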

  11. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.

  12. A 20 Residues Motif Delineates the Furin Cleavage Site and its Physical Properties May Influence Viral Fusion

    Directory of Open Access Journals (Sweden)

    Sun Tian

    2009-01-01

    Furin is a proprotein convertase that proteolytically cleaves protein precursors to yield functional proteins. Efficient cleavage depends on the presence of a specific sequence motif on the substrate. Currently, the cleavage site motif is described as a four amino acid pattern: R-X-[K/R]-R↓. However, not all furin cleavage recognition sites can be described by this pattern and not all R-X-[K/R]-R↓ sites are cleaved by furin. Since many furin substrates are involved in the pathogenesis of viral infection and human diseases, it is important to accurately characterize the furin cleavage site motif. In this study, the furin cleavage site motif was characterized using statistical analysis. The data were interpreted within the 3D crystal structure of the furin catalytic domain. The results indicate that the furin cleavage site motif is comprised of about 20 residues, P14–P6´. Specific physical properties such as volume, charge, and hydrophilicity are required at specific positions. The furin cleavage site motif is divided into two parts: (1) one core region (8 amino acids, positions P6–P2´) packed inside the furin binding pocket; (2) two polar regions (8 amino acids, positions P7–P14; and 4 amino acids, positions P3´–P6´) located outside the furin binding pocket. The physical properties of the core region contribute to the binding strength of the furin substrate, while the polar regions provide a solvent accessible environment and facilitate the accessibility of the core region to the furin binding pocket. This furin cleavage site motif also revealed a dynamic relationship linking the evolution of physical properties in region P1´–P6´ of viral fusion peptides, furin cleavage efficacy, and viral infectivity.
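
    The four-residue pattern R-X-[K/R]-R↓ quoted in this record can be written as a regular expression. The sketch below only lists candidate sites; as the abstract stresses, a match does not imply cleavage, and the 20-residue motif proposed by the authors is not modelled here. The function name and the returned coordinate convention are assumptions made for the example.

```python
import re

# R, any residue, K or R, R. The lookahead makes overlapping matches visible.
FURIN_MINIMAL = re.compile(r"(?=(R.[KR]R))")


def candidate_furin_sites(protein):
    """Return (cut_index, matched_tetrapeptide) for every pattern match.

    cut_index is the 0-based index of the residue immediately after the
    scissile bond (position P1'), a convention chosen for this sketch.
    """
    protein = protein.upper()
    return [(m.start() + 4, m.group(1)) for m in FURIN_MINIMAL.finditer(protein)]


if __name__ == "__main__":
    # Toy peptide, not a real furin substrate.
    print(candidate_furin_sites("AAARAKRSVV"))
```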

  13. Differential evolutionary conservation of motif modes in the yeast protein interaction network

    Directory of Open Access Journals (Sweden)

    Yu Chang-Yung

    2006-04-01

    Background: The importance of a network motif (a recurring interconnected pattern of special topology which is over-represented in a biological network) lies in its position in the hierarchy between the protein molecule and the module in a protein-protein interaction network. Until now, however, the methods available have greatly restricted the scope of research. While they have focused on analysis at the resolution of motif topology, they have not been able to distinguish particular motifs of the same topology in a protein-protein interaction network. Results: We have been able to assign the molecular function annotations of Gene Ontology to each protein in the protein-protein interactions of Saccharomyces cerevisiae. For various motif topologies, we have developed an algorithm enabling us to unveil one million "motif modes", each of which features a unique topological combination of molecular functions. To our surprise, the conservation ratio, i.e., the extent of the evolutionary constraints upon the motif modes of the same motif topology, varies significantly, clearly indicating distinct differences in the evolutionary constraints upon motifs of the same motif topology. Equally important, for all motif modes, we have found a power-law distribution of the motif counts on each motif mode. We postulate that motif modes may very well represent the evolutionarily conserved topological units of a protein interaction network. Conclusion: For the first time, the motifs of a protein interaction network have been investigated beyond the scope of motif topology. The motif modes determined in this study have not only enabled us to differentiate among different evolutionary constraints on motifs of the same topology but have also opened up new avenues through which protein interaction networks can be analyzed.

  14. Immunity related genes in dipterans share common enrichment of AT-rich motifs in their 5' regulatory regions that are potentially involved in nucleosome formation

    Directory of Open Access Journals (Sweden)

    Rodriguez Mario H

    2008-07-01

    Background: Understanding the transcriptional regulation mechanisms in response to environmental challenges is of fundamental importance in biology. Transcription factors associated with response elements and the chromatin structure have proven to play important roles in gene expression regulation. We have analyzed promoter regions of dipteran genes induced in response to immune challenge, in search of particular sequence patterns involved in their transcriptional regulation. Results: 5' upstream regions of D. melanogaster and A. gambiae immunity-induced genes and their corresponding orthologous genes in 11 non-melanogaster drosophilid species and Ae. aegypti share enrichment in AT-rich short motifs. AT-rich motifs are associated with nucleosome formation as predicted by two different algorithms. In A. gambiae and D. melanogaster, the 5' upstream sequences of many immunity genes also showed NFκB response elements, located within 500 bp from the transcription start site. In A. gambiae, the frequency of the ATAA motif near the NFκB response elements was increased, suggesting a functional link between nucleosome formation/remodelling and NFκB regulation of transcription. Conclusion: AT-rich motif enrichment in 5' upstream sequences in A. gambiae, Ae. aegypti and the Drosophila genus immunity genes suggests a particular pattern of nucleosome formation/chromatin organization. The co-occurrence of such motifs with the NFκB response elements suggests that these sequence signatures may be functionally involved in transcriptional activation during dipteran immune response. AT-rich motif enrichment in regulatory regions in this group of co-regulated genes could represent an evolutionarily constrained signature in dipterans and perhaps other distantly related species.

  15. C. elegans RNA-binding protein GLD-1 recognizes its multiple targets using sequence, context, and structural information to repress translation.

    Science.gov (United States)

    Doh, Jung H; Jung, Yuchae; Reinke, Valerie; Lee, Min-Ho

    2013-10-01

    Caenorhabditis elegans GLD-1, a maxi-KH motif containing RNA-binding protein, has various functions mainly during female germ cell development, suggesting that it likely controls the expression of a selective group of maternal mRNAs. To gain an insight into how GLD-1 specifically recognizes these mRNA targets, we identified 38 biochemically proven GLD-1 binding regions from multiple mRNA targets that are among over 100 putative targets co-immunoprecipitated with GLD-1. The sequence information of these regions revealed three over-represented and phylogenetically conserved sequence motifs. We found that two of the motifs, one of which is novel, are important for GLD-1 binding in several GLD-1 binding regions but not in other regions. Further analyses indicate that the importance of one of the sequence motifs is dependent on two aspects: (1) surrounding sequence information, likely acting as an accessory feature for GLD-1 to efficiently select the sequence motif and (2) RNA secondary structural environment where the sequence motif resides, which likely provides "binding-site accessibility" for GLD-1 to effectively recognize its targets. Our data suggest some mRNAs recruit GLD-1 by a distinct mechanism, which involves more than one sequence motif that needs to be embedded in the correct context and structural environment.

  16. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Science.gov (United States)

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprises a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestral promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations. PMID:28117683

  17. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Directory of Open Access Journals (Sweden)

    Graziele Pereira Oliveira

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprises a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestral promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations.

  18. Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints.

    Science.gov (United States)

    Niv, Masha Y; Skrabanek, Lucy; Roberts, Richard J; Scheraga, Harold A; Weinstein, Harel

    2008-05-01

    Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered efforts at specificity engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
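
    The idea of a sequence pattern with optional secondary structure constraints can be sketched as follows. This is not the Scan2S algorithm, whose details are not given in the abstract; it is a naive position-by-position scan, and the function name, the pattern format and the toy (D/E)-X-K pattern with a strand constraint on the lysine are all assumptions made for illustration.

```python
def scan_with_structure(seq, ss, pattern):
    """Return start indices where `pattern` matches both sequence and structure.

    seq     : amino acid string
    ss      : secondary structure string of the same length (e.g. H, E, C)
    pattern : list of (allowed_residues, allowed_states) per position;
              None in either slot means "no constraint" at that position.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings must have equal length")
    hits = []
    width = len(pattern)
    for start in range(len(seq) - width + 1):
        for offset, (residues, states) in enumerate(pattern):
            pos = start + offset
            if residues is not None and seq[pos] not in residues:
                break
            if states is not None and ss[pos] not in states:
                break
        else:
            hits.append(start)   # every position satisfied both constraints
    return hits


if __name__ == "__main__":
    seq = "AEAKGDLKA"
    ss = "CCCCCEEEC"   # made-up structure assignment for the toy sequence
    plain = [("DE", None), (None, None), ("K", None)]
    constrained = [("DE", None), (None, None), ("K", "E")]
    print(scan_with_structure(seq, ss, plain))
    print(scan_with_structure(seq, ss, constrained))
```

    In the toy input the structure constraint removes one of the two sequence-only matches, which is the kind of filtering the abstract attributes to structure-aware motifs.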

  19. Discovery of widespread GTP-binding motifs in genomic DNA and RNA.

    Science.gov (United States)

    Curtis, Edward A; Liu, David R

    2013-04-18

    Biological RNAs that bind small molecules have been implicated in a variety of regulatory and catalytic processes. Inspired by these examples, we used in vitro selection to search a pool of genome-encoded RNA fragments for naturally occurring GTP aptamers. Several aptamer classes were identified, including one (the "G motif") with a G-quadruplex structure. Further analysis revealed that most RNA and DNA G-quadruplexes bind GTP. The G motif is abundant in eukaryotes, and the human genome contains ~75,000 examples with dissociation constants comparable to the GTP concentration of a eukaryotic cell (~300 μM). G-quadruplexes play roles in diverse cellular processes, and our findings raise the possibility that GTP may play a role in the function of these elements. Consistent with this possibility, the sequence requirements of several classes of regulatory G-quadruplexes parallel those of GTP binding.

  20. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-05-01

    The purpose of this dissertation is to present a methodology to model the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is of major importance in computational biology. It is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The most famous are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, a problem is divided into a set of overlapping subproblems and then the solution of each subproblem is found. Finally, the solutions to these subproblems are combined into a final solution. In this thesis it is proved that for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm has been simulated using the C#.Net programming language and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, which also depends on the cost function that is used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
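
    The dynamic programming idea mentioned in this record can be illustrated with a minimal Needleman-Wunsch score computation. This sketch returns only the optimal global alignment score under one simple cost function; it does not build the directed acyclic graph of all optimal alignments or perform the sequential multi-cost optimization described in the dissertation. The scoring values (match 1, mismatch -1, gap -1) and the function name are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    Uses a linear gap penalty and keeps two rows of the table at a time,
    so the running time is O(len(a) * len(b)).
    """
    m, n = len(a), len(b)
    prev = [j * gap for j in range(n + 1)]          # row for the empty prefix of a
    for i in range(1, m + 1):
        curr = [i * gap] + [0] * n                  # first column: a[:i] against gaps
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap                      # a[i-1] aligned to a gap
            left = curr[j - 1] + gap                # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

    Each table cell depends only on its three neighbours, which is where the O(mn) operation count for two sequences of length m and n comes from.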

  1. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

    Directory of Open Access Journals (Sweden)

    Meinicke Peter

    2009-09-01

    Background: Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description: Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speedup in the range of four orders of magnitude. For an average-size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion: For the first time, the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  2. Multilayer motif analysis of brain networks

    CERN Document Server

    Battiston, Federico; Chavez, Mario; Latora, Vito

    2016-01-01

    In the last decade network science has shed new light on the anatomical connectivity and on correlations in the activity of different areas of the human brain. The study of brain networks has in fact made it possible to detect the central areas of a neural system, and to identify its building blocks by looking at overabundant small subgraphs, known as motifs. However, network analysis of the brain has so far mainly focused on structural and functional networks as separate entities. The recently developed mathematical framework of multi-layer networks makes it possible to perform a multiplex analysis of the human brain in which the structural and functional layers are considered at the same time. In this work we describe how to classify subgraphs in multiplex networks, and we extend motif analysis to networks with many layers. We then extract multi-layer motifs in brain networks of healthy subjects by considering networks with two layers, obtained from diffusion and functional magnetic resonance imaging, respectively. Results i...

  3. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences of four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues, supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help to improve the prediction of SIMs and their binding properties in different organisms and so facilitate the reconstruction of the SUMO interactome.

  4. Evaluation of diverse peptidyl motifs for cellular delivery of semiconductor quantum dots.

    Science.gov (United States)

    Gemmill, Kelly Boeneman; Muttenthaler, Markus; Delehanty, James B; Stewart, Michael H; Susumu, Kimihiro; Dawson, Philip E; Medintz, Igor L

    2013-07-01

    Cell-penetrating peptides (CPPs) have rapidly become a mainstay technology for facilitating the delivery of a wide variety of nanomaterials to cells and tissues. Currently, the library of CPPs to choose from is still limited, with the HIV TAT-derived motif still being the most used. Among the many materials routinely delivered by CPPs, nanoparticles are of particular interest for a plethora of labeling, imaging, sensing, diagnostic, and therapeutic applications. The development of nanoparticle-based technologies for many of these uses will require access to a much larger number of functional peptide motifs that can both facilitate cellular delivery of different types of nanoparticles to cells and be used interchangeably in the presence of other peptides and proteins on the same surface. Here, we evaluate the utility of four peptidyl motifs for their ability to facilitate delivery of luminescent semiconductor quantum dots (QDs) in a model cell culture system. We find that an LAH4 motif, derived from a membrane-inserting antimicrobial peptide, and a chimeric sequence that combines a sweet arrow peptide with a portion originating from the superoxide dismutase enzyme provide effective cellular delivery of QDs. Interestingly, a derivative of the latter sequence lacking just a methyl group was found to be quite inefficient, suggesting that even small changes can have significant functional outcomes. Delivery was effected using 1 h incubation with cells, and fluorescent counterstaining strongly suggests an endosomal uptake process that requires a critical minimum number or ratio of peptides to be displayed on the QD surface. Concomitant cytoviability testing showed that the QD-peptide conjugates are minimally cytotoxic in the model COS-1 cell line tested. Potential applications of these peptides in the context of cellular delivery of nanoparticles and a variety of other (bio)molecules are discussed.

  5. Local Renyi entropic profiles of DNA sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2007-10-01

    Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored therein was based on the Rényi entropy of a probability density function (pdf) estimated using the Parzen window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
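
    As a rough illustration of the background this abstract builds on, the following Python sketch maps a DNA sequence to Chaos Game Representation points and estimates a Rényi entropy from a Parzen window density. It is not the authors' Matlab code and does not implement the fractal kernel or the entropic profiles themselves. The corner assignment in CORNERS, the choice of order 2 (quadratic) entropy, the Gaussian kernel and the bandwidth sigma are all assumptions made for this example.

```python
import math

# Assumed assignment of nucleotides to corners of the unit square;
# conventions differ between publications.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Map a DNA string to CGR coordinates.

    Starting from the centre of the square, each base moves the current
    point halfway towards that base's corner. Symbols other than A, C, G, T
    raise KeyError in this sketch.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma=0.1):
    """Order-2 Renyi entropy of a Gaussian Parzen window estimate in 2-D.

    Integrating the squared density estimate gives a double sum over all
    point pairs of a Gaussian with variance 2 * sigma**2.
    """
    n = len(points)
    var = 2.0 * sigma ** 2
    norm = 1.0 / (2.0 * math.pi * var)
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / (n * n))
```

    With these assumptions a homopolymer such as "A" * 20 collapses towards one corner and yields a lower entropy than a sequence that visits all four quadrants, which matches the intuition of entropy as a randomness measure. The double loop is quadratic in sequence length and is meant only for short examples.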

  6. The right motifs for plant cell adhesion: what makes an adhesive site?

    Science.gov (United States)

    Langhans, Markus; Weber, Wadim; Babel, Laura; Grunewald, Miriam; Meckel, Tobias

    2017-01-01

    Cells of multicellular organisms are surrounded by and attached to a matrix of fibrous polysaccharides and proteins known as the extracellular matrix. This fibrous network not only serves as a structural support to cells and tissues but also plays an integral part in processes as important as proliferation, differentiation, or defense. While at first sight the extracellular matrices of plants and animals do not have much in common, a closer look reveals remarkable similarities. In particular, the proteins involved in the adhesion of the cell to the extracellular matrix share many functional properties. At the sequence level, however, a surprising lack of homology is found between adhesion-related proteins of plants and animals. The two protein machineries reveal similarities only between small subdomains and motifs, which further underlines their functional relationship. In this review, we provide an overview of the similarities between motifs in proteins known to be located at the plant cell wall-plasma membrane-cytoskeleton interface and motifs in proteins of the animal adhesome. We also show that by comparing the proteomes of both adhesion machineries at the level of motifs, we are able to identify potentially new candidate proteins that functionally contribute to the adhesion of the plant plasma membrane to the cell wall.

  7. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    Science.gov (United States)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-01-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment. PMID:28262684
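
    A first step in any such tetranucleotide analysis is simply counting the motif and comparing the count with a background expectation. The Python sketch below does this for CTAG. It is not the authors' CTAG-profiling procedure, which the abstract does not describe in detail: the function names count_motif and expected_count are hypothetical, and the independent-base (zero-order) background model is an assumption made for illustration.

```python
def count_motif(seq, motif="CTAG"):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def expected_count(seq, motif="CTAG"):
    """Expected number of occurrences if bases were drawn independently.

    Base probabilities are estimated from the sequence itself
    (an assumed zero-order background model).
    """
    seq, motif = seq.upper(), motif.upper()
    n, k = len(seq), len(motif)
    if n < k:
        return 0.0
    p = 1.0
    for base in motif:
        p *= seq.count(base) / n   # empirical frequency of this base
    return (n - k + 1) * p         # number of start positions times probability
```

    The ratio count_motif(seq) / expected_count(seq) then gives a crude indication of whether CTAG is depleted or enriched in a region. Because CTAG is its own reverse complement, counting on one strand already covers both strands.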

  8. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    Science.gov (United States)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to