WorldWideScience

Sample records for protein sequences motif

  1. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Full Text Available Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.
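
The idea of relaxing a docking-site regular expression from two required basic residues to one can be sketched in Python. The pattern shape used here (basic residues, a short spacer, then hydrophobic-x-hydrophobic) is only an illustrative assumption and is not the set of patterns proposed in the study; `docking_pattern`, `find_sites` and the toy sequence are hypothetical names and data.

```python
import re

def docking_pattern(min_basic: int = 2) -> re.Pattern:
    """Build an illustrative docking-motif-like regex (assumed shape, not the paper's).

    min_basic..2 basic residues, a 2-6 residue spacer, then
    hydrophobic-x-hydrophobic.
    """
    basic = "[KR]{%d,2}" % min_basic
    return re.compile(basic + r".{2,6}[LIV].[LIV]")

def find_sites(seq: str, pattern: re.Pattern):
    """Return (start, matched_text) for every start position that matches."""
    hits = []
    for i in range(len(seq)):
        m = pattern.match(seq, i)  # anchored at i, so overlapping sites are kept
        if m:
            hits.append((i, m.group()))
    return hits

toy = "AAKAAALALAA"  # invented sequence with a single basic residue
strict_hits = find_sites(toy, docking_pattern(min_basic=2))
relaxed_hits = find_sites(toy, docking_pattern(min_basic=1))
print(strict_hits, relaxed_hits)
```

On this toy input the strict pattern finds nothing, while the relaxed pattern reports one candidate site, which is the kind of difference a subtype comparison would tabulate.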

  2. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Full Text Available Abstract Background Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode
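
A structured motif of the kind described above (short blocks separated by long, variable gaps) can be matched with an ordinary regular expression once the blocks and gap ranges are known. The sketch below only matches a given pattern and counts its support; it is not the WildSpan mining algorithm, and the function names, block strings and gap bounds are hypothetical.

```python
import re

def structured_motif_regex(blocks, gaps):
    """Compile blocks interleaved with bounded gaps into one regex.

    blocks: short regex fragments, e.g. "C.C" ('.' is an intra-block wildcard)
    gaps:   (min_len, max_len) for the wildcard run between consecutive blocks
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap range between each pair of blocks")
    parts = [blocks[0]]
    for (lo, hi), block in zip(gaps, blocks[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # large irregular gap
        parts.append(block)
    return re.compile("".join(parts))

def support(pattern, sequences):
    """Number of sequences containing at least one occurrence of the pattern."""
    return sum(1 for s in sequences if pattern.search(s))

# Toy data: two blocks separated by a gap of 5 to 20 residues.
motif = structured_motif_regex(["C.C", "HH"], [(5, 20)])
print(support(motif, ["AACGCAAAAAAAHHAA", "AACGCAHHAA"]))
```

Only the first toy sequence is counted, because the gap between the two blocks in the second one is shorter than the assumed minimum of 5.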

  3. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and

  4. Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

    Science.gov (United States)

    Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

    2015-03-26

    Emergence of multidrug-resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reining in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic flow to predict the host-pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by the 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of proteins and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.

  5. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  6. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    Directory of Open Access Journals (Sweden)

    Hieu Dinh

    Full Text Available Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
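
For readers unfamiliar with the problem statement, the (ℓ, d)-motif search asks for every string M of length ℓ such that each input sequence contains some ℓ-mer within Hamming distance d of M; the quorum variant requires this only for a fraction q of the sequences. The following is a deliberately naive exhaustive sketch of that definition in Python, not the qPMS7 algorithm. It enumerates all |alphabet|^ℓ candidates, so it is only usable for very small ℓ, and all names are hypothetical.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some len(motif)-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def quorum_planted_motifs(seqs, l, d, q=1.0, alphabet="ACGT"):
    """Exhaustively list all (l, d)-motifs present in at least q * len(seqs) sequences.

    q = 1.0 gives plain PMS; the running time is exponential in l.
    """
    need = q * len(seqs)
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        hits = sum(occurs_within(motif, s, d) for s in seqs)
        if hits >= need:
            found.append(motif)
    return found

print(quorum_planted_motifs(["ACGT", "ACGA", "TCGT"], l=4, d=1))
```

The exponential candidate loop is exactly what practical solvers try to avoid through pruning; this sketch is meant only to make the definition concrete.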

  7. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Full Text Available Abstract Background A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection. 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns. 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of
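
To make the evaluation step concrete, the sketch below scores a short pattern and an extended pattern against labelled sequences using the standard definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). The exact measures used in the study may differ; the patterns, sequences and the `evaluate` helper are invented for illustration.

```python
import re

def evaluate(pattern: str, positives, negatives):
    """Score a regex pattern against known positive and negative sequences."""
    rx = re.compile(pattern)
    tp = sum(1 for s in positives if rx.search(s))
    fn = len(positives) - tp
    fp = sum(1 for s in negatives if rx.search(s))
    tn = len(negatives) - fp
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "sensitivity": tp / (tp + fn) if positives else float("nan"),
        "specificity": tn / (tn + fp) if negatives else float("nan"),
    }

# Toy data: the extended pattern adds one conserved residue outside the core.
positives = ["ACAACGGHA", "CKKCTTTH"]
negatives = ["CAACGGGGG", "GGGG"]
short = evaluate("C..C", positives, negatives)
extended = evaluate("C..C.{2,4}H", positives, negatives)
print(short["specificity"], extended["specificity"])
```

In this toy case the extra residue removes the one false positive without losing any true positive, which mirrors the kind of improvement the abstract reports for extended patterns.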

  8. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  9. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...
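
As a rough illustration of what an entropy over alignment blocks can look like, the sketch below treats each distinct string occupying a fixed window of the alignment as one symbol and computes the Shannon entropy of their frequencies. This is one plausible reading of "block entropy", assumed here for illustration; the record above is truncated and does not confirm the exact formula BlockLogo uses. The function name and toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def block_entropy(alignment, start: int, width: int) -> float:
    """Shannon entropy (bits) of the distinct blocks in columns [start, start+width)."""
    blocks = [row[start:start + width] for row in alignment]
    counts = Counter(blocks)
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Toy alignment of four equal-length sequences.
aln = ["ACD", "ACD", "AGD", "AGE"]
print(block_entropy(aln, 0, 1))  # fully conserved column
print(block_entropy(aln, 1, 2))  # two-residue block with three variants
```

A fully conserved window gives zero entropy, and the value grows as more block variants appear at more even frequencies.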

  10. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Full Text Available Abstract Background Protein structures have conserved features – motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion MSDmotif is unique by combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  11. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through

  12. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. It also shows that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  13. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    Directory of Open Access Journals (Sweden)

    Rodrigo S Lacruz

    2011-03-01

    Full Text Available Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates.

  14. Short Linear Sequence Motif LxxPTPh Targets Diverse Proteins to Growing Microtubule Ends

    NARCIS (Netherlands)

    Kumar, Anil; Manatschal, Cristina; Rai, Ankit; Grigoriev, Ilya; Degen, Miriam Steiner; Jaussi, Rolf; Kretzschmar, Ines; Prota, Andrea E; Volkmer, Rudolf; Kammerer, Richard A.; Akhmanova, Anna; Steinmetz, Michel O.

    2017-01-01

    Microtubule plus-end tracking proteins (+TIPs) are involved in virtually all microtubule-based processes. End-binding (EB) proteins are considered master regulators of +TIP interaction networks, since they autonomously track growing microtubule ends and recruit a plethora of proteins to this

  15. A tandem sequence motif acts as a distance-dependent enhancer in a set of genes involved in translation by binding the proteins NonO and SFPQ

    Directory of Open Access Journals (Sweden)

    Roepcke Stefan

    2011-12-01

    Full Text Available Abstract Background Bioinformatic analyses of expression control sequences in promoters of co-expressed or functionally related genes enable the discovery of common regulatory sequence motifs that might be involved in co-ordinated gene expression. By studying promoter sequences of the human ribosomal protein genes we recently identified a novel highly specific Localized Tandem Sequence Motif (LTSM). In this work we sought to identify additional genes and LTSM-binding proteins to elucidate potential regulatory mechanisms. Results Genome-wide analyses identified a considerable number of additional LTSM-positive genes, the products of which are involved in translation, among them translation initiation and elongation factors, and 5S rRNA. Electromobility shift assays then showed specific signals demonstrating the binding of protein complexes to LTSM in ribosomal protein gene promoters. Pull-down assays with LTSM-containing oligonucleotides and subsequent mass spectrometric analysis identified the related multifunctional nucleotide binding proteins NonO and SFPQ in the binding complex. Functional characterization then revealed that LTSM enhances the transcriptional activity of the promoters in a manner that depends on the distance from the transcription start site. Conclusions Our data demonstrate the power of bioinformatic analyses for the identification of biologically relevant sequence motifs. LTSM and the LTSM-binding proteins NonO and SFPQ identified here were discovered through a synergistic combination of bioinformatic and biochemical methods and are regulators of the expression of a set of genes of the translational apparatus in a distance-dependent manner.

  16. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  17. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    International Nuclear Information System (INIS)

    Craft, Willie Warren; Dutch, Rebecca Ellis

    2005-01-01

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F0, and proteolytically cleaved into the mature F1 and F2 heterodimer, following an HDLVDGVK109 motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK109 motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK109 motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  18. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  19. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Full Text Available Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of

  20. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Science.gov (United States)

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
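
    The consensus quoted above uses a dash-separated notation in which x stands for any residue and brackets list the allowed alternatives. A minimal sketch of how such a consensus could be turned into a regular expression and scanned against a sequence follows; the helper names and the toy sequence are illustrative assumptions, not part of the cited work.

```python
import re

# Consensus as quoted in the abstract; 'x' means "any residue".
PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"


def consensus_to_regex(pattern: str) -> str:
    """Translate a dash-separated consensus into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Bracketed sets and single letters are already valid regex atoms.
        parts.append("." if element.lower() == "x" else element)
    return "".join(parts)


def find_motif(sequence: str, pattern: str = PERCAL):
    """Return (start, matched_text) for every match, overlapping ones included."""
    # A lookahead with a capture group reports overlapping occurrences too.
    regex = re.compile("(?=(" + consensus_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy = "MKAGADGSLNTADDKLV"  # made-up sequence, not a real protein
    print(find_motif(toy))
```

    Positions are 0-based. A regex scan of this kind only reports candidate sites; it says nothing about whether a matching segment actually binds calcium.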

  1. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.

  2. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) has become an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (SRPmotif) method that recognizes binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery.

  3. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  4. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.

  5. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  6. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
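
    To make the (l, d) problem statement above concrete, the following is a naive exhaustive sketch: it enumerates every length-l string over the alphabet and keeps those that occur, with at most d substitutions, in at least q of the input sequences. It is not the NFA-based algorithm of the paper and is only practical for very small l; the function names and toy sequences are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def ld_motifs(seqs, l: int, d: int, q: int, alphabet: str = "ACGT"):
    """Exhaustively list all length-l motifs present in at least q sequences."""
    found = []
    for candidate in product(alphabet, repeat=l):  # |alphabet|**l candidates
        motif = "".join(candidate)
        support = sum(occurs(motif, s, d) for s in seqs)
        if support >= q:
            found.append(motif)
    return found


if __name__ == "__main__":
    toy = ["ACGTAC", "TTACGA", "GGACTT"]  # made-up DNA fragments
    print(ld_motifs(toy, l=3, d=0, q=2))
    print(ld_motifs(toy, l=3, d=1, q=3))
```

    The candidate space grows as |alphabet|^l, which is why instances such as (26,11) call for far more specialised methods than this enumeration.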

  7. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    Science.gov (United States)

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.

  8. F-Type Lectins: A Highly Diversified Family of Fucose-Binding Proteins with a Unique Sequence Motif and Structural Fold, Involved in Self/Non-Self-Recognition

    Directory of Open Access Journals (Sweden)

    Gerardo R. Vasta

    2017-11-01

    The F-type lectin (FTL) family is one of the most recent to be identified and structurally characterized. Members of the FTL family are characterized by a fucose recognition domain [F-type lectin domain (FTLD)] that displays a novel jellyroll fold (“F-type” fold) and unique carbohydrate- and calcium-binding sequence motifs. This novel lectin family comprises widely distributed proteins exhibiting single, double, or greater multiples of the FTLD, either tandemly arrayed or combined with other structurally and functionally distinct domains, yielding lectin subunits of pleiotropic properties even within a single species. Furthermore, the extraordinary variability of FTL sequences (isoforms) that are expressed in a single individual has revealed genetic mechanisms of diversification in ligand recognition that are unique to FTLs. Functions of FTLs in self/non-self-recognition include innate immunity, fertilization, microbial adhesion, and pathogenesis, among others. In addition, although the F-type fold is distinctive for FTLs, a structure-based search revealed apparently unrelated proteins with minor sequence similarity to FTLs that displayed the FTLD fold. In general, the phylogenetic analysis of FTLD sequences from viruses to mammals reveals clades that are consistent with the currently accepted taxonomy of extant species. However, the surprisingly discontinuous distribution of FTLDs within each taxonomic category suggests not only an extensive structural/functional diversification of the FTLs along evolutionary lineages but also that this intriguing lectin family has been subject to frequent gene duplication, secondary loss, lateral transfer, and functional co-option.

  9. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
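
    Comparing relative abundances of short motifs between a foreground set and a background set, as this record describes, can be sketched with plain k-mer counting. The snippet below is an illustrative assumption about one simple way to do it (log2 ratio of k-mer frequencies with a small pseudocount); it is not the statistical procedure used by the authors, and the input strings are made up.

```python
from collections import Counter
from math import log2


def kmer_freqs(seqs, k: int):
    """Relative frequency of every k-mer observed in a list of sequences."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def log2_enrichment(foreground, background, k: int, pseudo: float = 1e-6):
    """log2(foreground frequency / background frequency) for each k-mer.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for k-mers that are absent from one of the two sets.
    """
    fg = kmer_freqs(foreground, k)
    bg = kmer_freqs(background, k)
    return {kmer: log2((fg.get(kmer, 0.0) + pseudo) / (bg.get(kmer, 0.0) + pseudo))
            for kmer in set(fg) | set(bg)}


if __name__ == "__main__":
    scores = log2_enrichment(["AAAA"], ["AACG"], k=2)  # toy inputs
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(kmer, round(score, 2))
```

    Positive scores mark k-mers that are more frequent in the foreground. As the abstract stresses, such excesses can simply reflect overall base composition, which is why a composition-matched random background is one of the comparison sets.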

  10. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    Science.gov (United States)

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Targeting functional motifs of a protein family

    Science.gov (United States)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physicochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  12. MotifMark: Finding Regulatory Motifs in DNA Sequences

    OpenAIRE

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L.; Wang, May D.

    2017-01-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity be...

  13. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  14. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
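
    The three-level partitioning described in this record (discovery, training, validation) can be sketched as a simple shuffled split. The fractions, seed, and function name below are illustrative assumptions and are not taken from the authors' pipeline.

```python
import random


def three_way_split(items, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle items and split them into discovery, training, validation.

    fractions gives the share of the first two partitions; whatever
    remains after them goes to validation, so no item is dropped.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_discovery = int(n * fractions[0])
    n_training = int(n * fractions[1])
    discovery = shuffled[:n_discovery]
    training = shuffled[n_discovery:n_discovery + n_training]
    validation = shuffled[n_discovery + n_training:]
    return discovery, training, validation


if __name__ == "__main__":
    # Hypothetical (sequence, label) records; labels are arbitrary here.
    records = [("ACGT" * i, i % 2) for i in range(1, 11)]
    discovery, training, validation = three_way_split(records)
    print(len(discovery), len(training), len(validation))
```

    Motifs would be discovered only on the discovery partition, the classifier fitted only on the training partition, and performance reported only on the validation partition, so that feature discovery never sees the data used for assessment.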

  15. Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

    Science.gov (United States)

    Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

    2004-01-05

    The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.

  16. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Abstract Background Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  17. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

    Science.gov (United States)

    Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

    2014-12-01

    Discovering motifs in protein-protein interaction networks is becoming a current major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because of the involvement of graph isomorphic detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.

  19. HIV protein sequence hotspots for crosstalk with host hub proteins.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    Full Text Available HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell results in out-competition of host proteins in their interaction with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome mediating interactions with host hub proteins. In this study, we used the HIV and human interactome databases to identify HIV targeted host hub proteins and their host binding partners (H2). We developed a high throughput computational procedure utilizing motif discovery algorithms on sets of protein sequences, including sequences of HIV and H2 proteins. We identified as HIV sequence hotspots those linear motifs that are highly conserved on HIV sequences and at the same time have a statistically enriched presence on the sequences of H2 proteins. The HIV protein motifs discovered in this study are expressed by subsets of H2 host proteins potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many such motifs are clustered on an HIV sequence in the form of hotspots. The sequential positions of these hotspots are consistent with the curated literature on phenotype altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.

  20. Perception Enhancement using Visual Attributes in Sequence Motif Visualization

    OpenAIRE

    Oon, Yin; Lee, Nung; Kok, Wei

    2016-01-01

    The sequence logo is a well-accepted scientific method to visualize the conservation characteristics of biological sequence motifs. Previous studies found that using the sequence logo graphical representation for scientific evidence reports or arguments could cause serious biases and misinterpretation by users. This study investigates the performance of the visual attributes of a sequence logo in helping users to perceive and interpret the information based on preattentive theories and Gestalt principl...

  1. Identification of a Baeyer-Villiger monooxygenase sequence motif

    NARCIS (Netherlands)

    Fraaije, MW; Kamerbeek, NM; van Berkel, WJH; Janssen, DB; Kamerbeek, Nanne M.; Berkel, Willem J.H. van

    2002-01-01

    Baeyer-Villiger monooxygenases (BVMOs) form a distinct class of flavoproteins that catalyze the insertion of an oxygen atom in a C-C bond using dioxygen and NAD(P)H. Using newly characterized BVMO sequences, we have uncovered a BVMO-identifying sequence motif: FXGXXXRXXXW(P/D). Studies with
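    The motif FXGXXXRXXXW(P/D) quoted above can be turned into a regular expression for scanning sequences. The following Python sketch assumes that X stands for any residue and that (P/D) means either P or D. The helper names and the toy sequence are invented for illustration and do not come from the study.

```python
import re

def motif_to_regex(motif):
    """Translate X-wildcard notation such as 'FXGXXXRXXXW(P/D)' to a regex."""
    # Alternatives written as (P/D) become a character class such as [PD].
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # X stands for any residue.
    return pattern.replace("X", ".")

def find_motif(sequence, motif):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

BVMO_MOTIF = "FXGXXXRXXXW(P/D)"
toy_sequence = "MKTFAGAAARLLLWPEE"  # invented fragment, not a real BVMO

print(motif_to_regex(BVMO_MOTIF))            # F.G...R...W[PD]
print(find_motif(toy_sequence, BVMO_MOTIF))  # [(4, 'FAGAAARLLLWP')]
```

    The lookahead group lets the scan report overlapping hits, which a plain `re.findall` on the bare pattern would skip.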

  2. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  3. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  4. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Full Text Available Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
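    As a rough illustration of the kind of counting and bar-code-like display described above, the Python sketch below locates every occurrence of a short motif in a DNA string and prints a one-line text track. It is not PISMA's Java implementation. The function names and the toy sequence are assumptions, and only the 2 to 10 base motif length limit is taken from the abstract.

```python
def motif_positions(sequence, motif):
    """Return 0-based start positions of `motif` in `sequence`, overlaps included."""
    sequence, motif = sequence.upper(), motif.upper()
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions

def barcode(sequence, positions):
    """Draw a text track with '|' at each motif start and '-' elsewhere."""
    marks = ["-"] * len(sequence)
    for position in positions:
        marks[position] = "|"
    return "".join(marks)

toy_dna = "ACGTTCGCGATCG"  # invented sequence
cpg_sites = motif_positions(toy_dna, "CG")
print(len(cpg_sites), cpg_sites)    # 4 [1, 5, 7, 11]
print(toy_dna)
print(barcode(toy_dna, cpg_sites))  # -|---|-|---|-
```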

  5. High affinity recognition of a Phytophthora protein by Arabidopsis via an RGD motif

    NARCIS (Netherlands)

    Senchou, V.; Weide, R.L.; Carrasco, A.; Bouyssou, H.; Pont-Lezica, R.; Govers, F.; Canut, H.

    2004-01-01

    The RGD tripeptide sequence, a cell adhesion motif present in several extracellular matrix proteins of mammalians, is involved in numerous plant processes. In plant-pathogen interactions, the RGD motif is believed to reduce plant defence responses by disrupting adhesions between the cell wall and

  6. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Full Text Available Abstract Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  7. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. The simulation shows that MaMotif is the most time-efficient of the algorithms compared, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro

  8. Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

    Science.gov (United States)

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478

  9. Anion induced conformational preference of CαNN motif residues in functional proteins.

    Science.gov (United States)

    Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

    2017-12-01

    Among different ligand binding motifs, the anion binding CαNN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions like sulphate or biphosphate and to participate in different key functions. Here we study the interaction of sulphate and biphosphate with the CαNN motif present in different proteins. Instead of the total protein, a peptide fragment has been studied, keeping the CαNN motif flanked by other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in the absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.

  10. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
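    The idea of rebuilding a protein sequence from overlapping peptides can be sketched with a generic greedy overlap merge, shown below in Python. This is a textbook-style simplification and not the method of the SAND report, whose details the abstract does not give. The function names, the minimum overlap of three residues and the toy peptides are all assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of peptides with the longest overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates
    # Drop peptides that are fully contained in another peptide.
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)

# Invented peptides, as if two different digests had cut one protein
# at different places.
toy_peptides = ["MKTAYIAK", "QRQISFVK", "MKTAY", "IAKQRQ", "ISFVK"]
print(greedy_assemble(toy_peptides))  # ['MKTAYIAKQRQISFVK']
```

    A greedy merge can be misled by repeated segments, so real data would need a more careful assembly strategy than this sketch.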

  11. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    Energy Technology Data Exchange (ETDEWEB)

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et. al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  12. Protein chaperones Q8ZP25_SALTY from Salmonella typhimurium and HYAE_ECOLI from Escherichia coli exhibit thioredoxin-like structures despite lack of canonical thioredoxin active site sequence motif.

    Science.gov (United States)

    Parish, David; Benach, Jordi; Liu, Goahua; Singarapu, Kiran Kumar; Xiao, Rong; Acton, Thomas; Su, Min; Bansal, Sonal; Prestegard, James H; Hunt, John; Montelione, Gaetano T; Szyperski, Thomas

    2008-12-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  13. Regulation of amyloid precursor protein processing by its KFERQ motif.

    Science.gov (United States)

    Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

    2016-06-01

    Understanding the trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer's disease (AD). Here, we found that APP contains a KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy, which are other types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that the KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs, although it may not be important for CMA or microautophagy. [BMB Reports 2016; 49(6): 337-342].

  14. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
    Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs

  15. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Full Text Available Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins.
    Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  16. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  17. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Full Text Available Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC). A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding, however only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  18. Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

    Science.gov (United States)

    Soufari, Heddy

    2017-01-01

    Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
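    The reported preference for two GCAC motifs separated by more than 6 nucleotides can be expressed as a simple scan, sketched below in Python. It assumes that the spacing is counted as the number of nucleotides between the end of the first motif and the start of the second, which the abstract does not state. The function name and the toy RNA are invented for illustration.

```python
def spaced_motif_pairs(rna, motif="GCAC", min_gap=7):
    """Return (start1, start2, gap) for motif pairs with at least `min_gap` nt between them.

    Starts are 0-based. The gap is counted from the end of the first motif
    to the start of the second (an assumption of this sketch).
    """
    rna = rna.upper()
    starts = [i for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]
    pairs = []
    for n, first in enumerate(starts):
        for second in starts[n + 1:]:
            gap = second - first - len(motif)
            if gap >= min_gap:
                pairs.append((first, second, gap))
    return pairs

# Invented RNA: GCAC, an 8-nt spacer, GCAC, a 2-nt spacer, GCAC.
toy_rna = "GCAC" + "U" * 8 + "GCAC" + "AA" + "GCAC"
print(spaced_motif_pairs(toy_rna))  # [(0, 12, 8), (0, 18, 14)]
```

    The pair at positions 12 and 18 is left out because only two nucleotides separate those two motifs.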

  19. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of

  20. Short Arginine Motifs Drive Protein Stickiness in the Escherichia coli Cytoplasm.

    Science.gov (United States)

    Kyne, Ciara; Crowley, Peter B

    2017-09-19

    Although essential to numerous biotech applications, knowledge of molecular recognition by arginine-rich motifs in live cells remains limited. ¹H, ¹⁵N HSQC and ¹⁹F NMR spectroscopies were used to investigate the effects of C-terminal -GRn (n = 1-5) motifs on GB1 interactions in Escherichia coli cells and cell extracts. While the "biologically inert" GB1 yields high-quality in-cell spectra, the -GRn fusions with n = 4 or 5 were undetectable. This result suggests that a tetra-arginine motif is sufficient to drive interactions between a test protein and macromolecules in the E. coli cytoplasm. The inclusion of a 12 residue flexible linker between GB1 and the -GR5 motif did not improve detection of the "inert" domain. In contrast, all of the constructs were detectable in cell lysates and extracts, suggesting that the arginine-mediated complexes were weak. Together these data reveal the significance of weak interactions between short arginine-rich motifs and the E. coli cytoplasm and demonstrate the potential of such motifs to modify protein interactions in living cells. These interactions must be considered in the design of (in vivo) nanoscale assemblies that rely on arginine-rich sequences.

  1. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    Science.gov (United States)

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found in interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  2. Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

    Science.gov (United States)

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

    2012-01-01

    Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
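    The split 8-mer scan described in this abstract (two 4-mers separated by 1 to 30 bp, X4-N1-30-X4) can be illustrated with a short sketch. This is not the authors' pipeline: the function name `split_8mers`, the toy sequence, and the single-strand scan are assumptions made for illustration only.

```python
from collections import Counter

def split_8mers(seq, min_gap=1, max_gap=30):
    """Count every X4-N{gap}-X4 pattern on one strand of seq.

    Keys are (left 4-mer, gap length, right 4-mer).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        left = seq[i:i + 4]
        for gap in range(min_gap, max_gap + 1):
            j = i + 4 + gap  # start of the right-hand 4-mer
            if j + 4 > len(seq):
                break  # longer gaps would run off the end of the sequence
            counts[(left, gap, seq[j:j + 4])] += 1
    return counts

# Toy sequence (hypothetical) that embeds the 11-mer CGGAAGTGACG from the abstract.
promoter = "TTCGGAAGTGACGTCACTT"
counts = split_8mers(promoter)
```

    In a real analysis the counts would be accumulated over many promoters and both strands, then compared against a background expectation; none of that is shown here.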

  3. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    Science.gov (United States)

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  4. TOPDOM: database of conservatively located domains and motifs in proteins.

    Science.gov (United States)

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins, has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. The TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  5. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  6. Repeat Sequence Proteins as Matrices for Nanocomposites

    Energy Technology Data Exchange (ETDEWEB)

    Drummy, L.; Koerner, H; Phillips, D; McAuliffe, J; Kumar, M; Farmer, B; Vaia, R; Naik, R

    2009-01-01

    Recombinant protein-inorganic nanocomposites comprised of exfoliated Na+ montmorillonite (MMT) in a recombinant protein matrix based on silk-like and elastin-like amino acid motifs (silk elastin-like protein (SELP)) were formed via a solution blending process. Charged residues along the protein backbone are shown to dominate long-range interactions, whereas the SELP repeat sequence leads to local protein/MMT compatibility. Up to a 50% increase in room temperature modulus and a comparable decrease in high temperature coefficient of thermal expansion occur for cast films containing 2-10 wt.% MMT.

  7. Identification of group specific motifs in Beta-lactamase family of proteins

    Directory of Open Access Journals (Sweden)

    Saxena Akansha

    2009-12-01

    Full Text Available Abstract Background Beta-lactamases are one of the most serious threats to public health. In order to combat this threat we need to study the molecular and functional diversity of these enzymes and identify signatures specific to these enzymes. These signatures will enable us to develop inhibitors and diagnostic probes specific to lactamases. The existing classification of beta-lactamases was developed nearly 30 years ago when few lactamases were available. The DLact database contains more than 2000 beta-lactamases, which can be used to study the molecular diversity and to identify signatures specific to this family. Methods A set of 2020 beta-lactamase proteins available in the DLact database (http://59.160.102.202/DLact) were classified using graph-based clustering of Best Bi-Directional Hits. Non-redundant (> 90 percent identical) protein sequences from each group were aligned using T-Coffee and annotated using information available in literature. Motifs specific to each group were predicted using the PRATT program. Results The graph-based classification of beta-lactamase proteins resulted in the formation of six groups (four major groups containing 191, 726, 774 and 73 proteins, and two minor groups containing 50 and 8 proteins). Based on the information available in literature, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. The two minor groups were novel and do not contain molecular signatures of beta-lactamase proteins reported in literature. The group-specific motifs showed high sensitivity (> 70%) and very high specificity (> 90%). The motifs from three groups (corresponding to class A, C and D) had a high level of conservation at DNA as well as protein level, whereas the motifs from the fourth group (corresponding to class B) showed conservation at only protein level.
Conclusion The graph-based classification of beta-lactamase proteins corresponds with the classification proposed by Ambler, thus there is

  8. Rtt107/Esc4 binds silent chromatin and DNA repair proteins using different BRCT motifs

    Directory of Open Access Journals (Sweden)

    Jockusch Rebecca A

    2006-11-01

    Full Text Available Abstract Background By screening a plasmid library for proteins that could cause silencing when targeted to the HMR locus in Saccharomyces cerevisiae, we previously reported the identification of Rtt107/Esc4 based on its ability to establish silent chromatin. In this study we aimed to determine the mechanism of Rtt107/Esc4 targeted silencing and also learn more about its biological functions. Results Targeted silencing by Rtt107/Esc4 was dependent on the SIR genes, which encode obligatory structural and enzymatic components of yeast silent chromatin. Based on its sequence, Rtt107/Esc4 was predicted to contain six BRCT motifs. This motif, originally identified in the human breast tumor suppressor gene BRCA1, is a protein interaction domain. The targeted silencing activity of Rtt107/Esc4 resided within the C-terminal two BRCT motifs, and this region of the protein bound to Sir3 in two-hybrid tests. Deletion of RTT107/ESC4 caused sensitivity to the DNA damaging agent MMS as well as to hydroxyurea. A two-hybrid screen showed that the N-terminal BRCT motifs of Rtt107/Esc4 bound to Slx4, a protein previously shown to be involved in DNA repair and required for viability in a strain lacking the DNA helicase Sgs1. Like SLX genes, RTT107/ESC4 interacted genetically with SGS1; esc4Δ sgs1Δ mutants were viable, but exhibited a slow-growth phenotype and also a synergistic DNA repair defect. Conclusion Rtt107/Esc4 binds to the silencing protein Sir3 and the DNA repair protein Slx4 via different BRCT motifs, thus providing a bridge linking silent chromatin to DNA repair enzymes.

  9. A Conserved Metal Binding Motif in the Bacillus subtilis Competence Protein ComFA Enhances Transformation.

    Science.gov (United States)

    Chilton, Scott S; Falbel, Tanya G; Hromada, Susan; Burton, Briana M

    2017-08-01

    Genetic competence is a process in which cells are able to take up DNA from their environment, resulting in horizontal gene transfer, a major mechanism for generating diversity in bacteria. Many bacteria carry homologs of the central DNA uptake machinery that has been well characterized in Bacillus subtilis. It has been postulated that the B. subtilis competence helicase ComFA belongs to the DEAD box family of helicases/translocases. Here, we made a series of mutants to analyze conserved amino acid motifs in several regions of B. subtilis ComFA. First, we confirmed that ComFA activity requires amino acid residues conserved among the DEAD box helicases, and second, we show that a zinc finger-like motif consisting of four cysteines is required for efficient transformation. Each cysteine in the motif is important, and mutation of at least two of the cysteines dramatically reduces transformation efficiency. Further, combining multiple cysteine mutations with the helicase mutations shows an additive phenotype. Our results suggest that the helicase and metal binding functions are two distinct activities important for ComFA function during transformation. IMPORTANCE ComFA is a highly conserved protein that has a role in DNA uptake during natural competence, a mechanism for horizontal gene transfer observed in many bacteria. Investigation of the details of the DNA uptake mechanism is important for understanding the ways in which bacteria gain new traits from their environment, such as drug resistance. To dissect the role of ComFA in the DNA uptake machinery, we introduced point mutations into several motifs in the protein sequence. We demonstrate that several amino acid motifs conserved among ComFA proteins are important for efficient transformation. This report is the first to demonstrate the functional requirement of an amino-terminal cysteine motif in ComFA. Copyright © 2017 American Society for Microbiology.

  10. Defining a conformational consensus motif in cotransin-sensitive signal sequences: a proteomic and site-directed mutagenesis study.

    Directory of Open Access Journals (Sweden)

    Wolfgang Klein

    Full Text Available The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity.

  11. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    Science.gov (United States)

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  12. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  13. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
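    The recognition sequence GRCGYC uses IUPAC ambiguity codes (R = A or G, Y = C or T). As a minimal sketch of how such a degenerate site can be located in a DNA string, the code below expands the codes into a regular expression. The helper names `iupac_to_regex` and `find_sites` and the toy sequence are hypothetical and are not taken from the paper; only a subset of the IUPAC codes is included.

```python
import re

# Subset of the IUPAC nucleotide codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def iupac_to_regex(site):
    """Expand an IUPAC-coded site such as GRCGYC into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())

def find_sites(seq, site="GRCGYC"):
    """Return (0-based start, matched sequence) for every occurrence, overlaps included."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    pattern = re.compile("(?=(" + iupac_to_regex(site) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

dna = "AAGACGTCTTGGCGCCAA"  # toy sequence, not from the paper
hits = find_sites(dna)
```

    Because GRCGYC equals its own reverse complement, scanning one strand is enough for this particular site; a non-palindromic site would also need a scan of the reverse complement.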

  14. An essential GT motif in the lamin A promoter mediates activation by CREB-binding protein

    International Nuclear Information System (INIS)

    Janaki Ramaiah, M.; Parnaik, Veena K.

    2006-01-01

    Lamin A is an important component of nuclear architecture in mammalian cells. Mutations in the human lamin A gene lead to highly degenerative disorders that affect specific tissues. In studies directed towards understanding the mode of regulation of the lamin A promoter, we have identified an essential GT motif at -55 position by reporter gene assays and mutational analysis. Binding of this sequence to Sp transcription factors has been observed in electrophoretic mobility shift assays and by chromatin immunoprecipitation studies. Further functional analysis by co-expression of recombinant proteins and ChIP assays has shown an important regulatory role for CREB-binding protein in promoter activation, which is mediated by the GT motif.

  15. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

    2015-01-01

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8-10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  16. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray

    KAUST Repository

    Wong, Ka-Chun

    2015-06-11

    Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8-10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  17. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  18. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  19. SiteBinder: an improved approach for comparing multiple protein structural motifs.

    Science.gov (United States)

    Sehnal, David; Vařeková, Radka Svobodová; Huber, Heinrich J; Geidl, Stanislav; Ionescu, Crina-Maria; Wimmerová, Michaela; Koča, Jaroslav

    2012-02-27

    There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.

  20. Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    LENUS (Irish Health Repository)

    Casey, Fergal

    2011-08-22

    Abstract Background Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
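    The triplet motif classes named in this abstract (NNN, NNP, NPP, PPP) can be illustrated with a small sketch that enumerates fully connected triples in a signed interaction network and labels each by its edge signs. The data layout, the function name `classify_triplets`, and the gene labels g1 to g4 are assumptions for illustration; the authors' actual scoring and statistics are not reproduced here.

```python
from itertools import combinations

def classify_triplets(edges):
    """Label every fully connected node triple by its three edge signs.

    edges maps frozenset({a, b}) to 'N' (negative) or 'P' (positive).
    Returns {(a, b, c): 'NNN' | 'NNP' | 'NPP' | 'PPP'}.
    """
    nodes = sorted({node for pair in edges for node in pair})
    result = {}
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if all(p in edges for p in pairs):
            # Sorting the signs makes the label independent of edge order.
            result[(a, b, c)] = "".join(sorted(edges[p] for p in pairs))
    return result

# Hypothetical toy network.
edges = {
    frozenset({"g1", "g2"}): "N",
    frozenset({"g1", "g3"}): "N",
    frozenset({"g2", "g3"}): "P",
    frozenset({"g2", "g4"}): "P",
    frozenset({"g3", "g4"}): "P",
}
triplets = classify_triplets(edges)
```

    Enumerating all node triples is cubic in the number of nodes, which is fine for a toy example but would likely need an adjacency-based search on a genome-scale network.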

  1. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

    Science.gov (United States)

    Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

    2012-09-01

    Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that motifs are unlikely to be shared by the intergenic sequences of most chloroplast genes, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.

  2. Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

    Science.gov (United States)

    König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

    2013-01-01

    G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations do destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141

  3. Regulation and function of the CD3γ DxxxLL motif: a binding site for adaptor protein-1 and adaptor protein-2 in vitro

    DEFF Research Database (Denmark)

    Dietrich, J; Kastrup, J; Nielsen, B L

    1997-01-01

    /CD3gamma chimeras; and in vitro by binding CD3gamma peptides to clathrin-coated vesicle adaptor proteins (APs). We find that the CD3gamma D127xxxLL131/132 sequence represents one united motif for binding of both AP-1 and AP-2, and that this motif functions as an active sorting motif in monomeric CD4...... and for AP binding in vitro. Furthermore, we provide evidence indicating that phosphorylation of CD3gamma S126 in the context of the complete TCR induces a conformational change that exposes the DxxxLL sequence for AP binding. Exposure of the DxxxLL motif causes an increase in the TCR internalization rate...

  4. Discovering sequence motifs in quantitative and qualitative peptide data

    DEFF Research Database (Denmark)

    Andreatta, Massimo

    online as a web-server, was applied to various data sets including mixtures of MHC binding data and distinct classes of ligands to SH3 domains. Next, we investigated how string kernels could be used to identify pattern in peptide data, with particular focus on the MHC class I system. We suggest......Proteins are central to virtually all processes within the cell. The vast amount of functions performed by proteins in biological processes is conferred by their ability to bind in a selective and specific manner to other molecules. The nature of these interactions is, in general terms, three......-dimensional, as binding sites normally consist of a pocket or a groove on the protein surface. However, in many cases such interactions contain a linear component and can be more conveniently represented, or approximated, by a protein-peptide interaction. Whereas time-consuming structural studies are necessary in systems...

  5. PDL1 Signals through Conserved Sequence Motifs to Overcome Interferon-Mediated Cytotoxicity

    Directory of Open Access Journals (Sweden)

    Maria Gato-Cañas

    2017-08-01

    Full Text Available PDL1 blockade produces remarkable clinical responses, thought to occur by T cell reactivation through prevention of PDL1-PD1 T cell inhibitory interactions. Here, we find that PDL1 cell-intrinsic signaling protects cancer cells from interferon (IFN) cytotoxicity and accelerates tumor progression. PDL1 inhibited IFN signal transduction through a conserved class of sequence motifs that mediate crosstalk with IFN signaling. Abrogation of PDL1 expression or antibody-mediated PDL1 blockade strongly sensitized cancer cells to IFN cytotoxicity through a STAT3/caspase-7-dependent pathway. Moreover, somatic mutations found in human carcinomas within these PDL1 sequence motifs disrupted motif regulation, resulting in PDL1 molecules with enhanced protective activities from type I and type II IFN cytotoxicity. Overall, our results reveal a mode of action of PDL1 in cancer cells as a first line of defense against IFN cytotoxicity.

  6. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. For this purpose, all subsequences of the given DNA sequence are scored with a scoring function and the prediction is made by selecting the best score. Most of the available tools for known binding site prediction are designed on the assumption that binding site base positions are independent. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in a transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
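
    The abstract above names mutual information as its measure of dependency between binding site positions. The sketch below is not the authors' scoring function; it only illustrates, under the assumption of equal-length aligned sites and plain frequency estimates without pseudocounts, how mutual information between two columns can be computed. The function name and the toy sites are hypothetical.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites.

    Assumes all sites have equal length; probabilities are raw frequencies.
    """
    n = len(sites)
    col_i = Counter(s[i] for s in sites)           # marginal counts, column i
    col_j = Counter(s[j] for s in sites)           # marginal counts, column j
    joint = Counter((s[i], s[j]) for s in sites)   # joint counts for the pair
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical toy alignments, not data from JASPAR or TRANSFAC.
dependent_sites = ["AT", "AT", "CG", "CG"]      # column 1 is fixed by column 0
independent_sites = ["AA", "AC", "CA", "CC"]    # columns vary independently

mi_dependent = mutual_information(dependent_sites, 0, 1)
mi_independent = mutual_information(independent_sites, 0, 1)
```

    With these toy inputs the fully dependent pair yields 1 bit and the independent pair yields 0 bits, which is the contrast a dependency-aware scoring function would exploit.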

  7. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs

    Directory of Open Access Journals (Sweden)

    Ricardo Flores

    2012-06-01

    Full Text Available As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures, either global or local, determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  8. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Full Text Available Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site, sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  9. Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

    Science.gov (United States)

    Miklós, István; Zádori, Zoltán

    2012-02-01

    HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (pHD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.

  10. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. © The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  11. The RXL motif of the African cassava mosaic virus Rep protein is necessary for rereplication of yeast DNA and viral infection in plants

    Energy Technology Data Exchange (ETDEWEB)

    Hipp, Katharina; Rau, Peter; Schäfer, Benjamin [Institut für Biomaterialien und biomolekulare Systeme, Abteilung für Molekularbiologie und Virologie der Pflanzen, Universität Stuttgart, Pfaffenwaldring 57, D-70550 Stuttgart (Germany); Gronenborn, Bruno [Institut des Sciences du Végétal, CNRS, 91198 Gif-sur-Yvette (France); Jeske, Holger, E-mail: holger.jeske@bio.uni-stuttgart.de [Institut für Biomaterialien und biomolekulare Systeme, Abteilung für Molekularbiologie und Virologie der Pflanzen, Universität Stuttgart, Pfaffenwaldring 57, D-70550 Stuttgart (Germany)

    2014-08-15

    Geminiviruses, single-stranded DNA plant viruses, encode a replication-initiator protein (Rep) that is indispensable for virus replication. A potential cyclin interaction motif (RXL) in the sequence of African cassava mosaic virus Rep may be an alternative link to cell cycle controls to the known interaction with plant homologs of retinoblastoma protein (pRBR). Mutation of this motif abrogated rereplication in fission yeast induced by expression of wildtype Rep suggesting that Rep interacts via its RXL motif with one or several yeast proteins. The RXL motif is essential for viral infection of Nicotiana benthamiana plants, since mutation of this motif in infectious clones prevented any symptomatic infection. The cell-cycle link (Clink) protein of a nanovirus (faba bean necrotic yellows virus) was investigated that activates the cell cycle by binding via its LXCXE motif to pRBR. Expression of wildtype Clink and a Clink mutant deficient in pRBR-binding did not trigger rereplication in fission yeast. - Highlights: • A potential cyclin interaction motif is conserved in geminivirus Rep proteins. • In ACMV Rep, this motif (RXL) is essential for rereplication of fission yeast DNA. • Mutating RXL abrogated viral infection completely in Nicotiana benthamiana. • Expression of a nanovirus Clink protein in yeast did not induce rereplication. • Plant viruses may have evolved multiple routes to exploit host DNA synthesis.

  12. The RXL motif of the African cassava mosaic virus Rep protein is necessary for rereplication of yeast DNA and viral infection in plants

    International Nuclear Information System (INIS)

    Hipp, Katharina; Rau, Peter; Schäfer, Benjamin; Gronenborn, Bruno; Jeske, Holger

    2014-01-01

    Geminiviruses, single-stranded DNA plant viruses, encode a replication-initiator protein (Rep) that is indispensable for virus replication. A potential cyclin interaction motif (RXL) in the sequence of African cassava mosaic virus Rep may be an alternative link to cell cycle controls to the known interaction with plant homologs of retinoblastoma protein (pRBR). Mutation of this motif abrogated rereplication in fission yeast induced by expression of wildtype Rep suggesting that Rep interacts via its RXL motif with one or several yeast proteins. The RXL motif is essential for viral infection of Nicotiana benthamiana plants, since mutation of this motif in infectious clones prevented any symptomatic infection. The cell-cycle link (Clink) protein of a nanovirus (faba bean necrotic yellows virus) was investigated that activates the cell cycle by binding via its LXCXE motif to pRBR. Expression of wildtype Clink and a Clink mutant deficient in pRBR-binding did not trigger rereplication in fission yeast. - Highlights: • A potential cyclin interaction motif is conserved in geminivirus Rep proteins. • In ACMV Rep, this motif (RXL) is essential for rereplication of fission yeast DNA. • Mutating RXL abrogated viral infection completely in Nicotiana benthamiana. • Expression of a nanovirus Clink protein in yeast did not induce rereplication. • Plant viruses may have evolved multiple routes to exploit host DNA synthesis

  13. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed to link this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…

  14. Determination of 5 '-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... (220 and 190 nt, respectively), and exhibited only approximately 50% nucleotide homology. Nevertheless, highly conserved areas were identified in the leader of all 3 PRRSV isolates, which constitute candidate motifs for binding of protein(s) involved in viral replication. These comparative data provide...

  15. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo; Jankovic, Boris R.; Bajic, Vladimir B.; Song, Le; Gao, Xin

    2013-01-01

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  16. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  17. RNA recognition motif (RRM)-containing proteins in Bombyx mori

    African Journals Online (AJOL)

    2009-03-20

    Mar 20, 2009 ... Recognition Motif (RRM), sometimes referred to as RNP1, is one of the first identified domains for RNA interaction. RRM is very common ..... Apart from the RRM motif, eIF3-S9 has a Trp-Asp (WD) repeat domain, Poly(A) ...

  18. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
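
    As a small illustration of the search space described above, the sketch below enumerates the codons that deviate from AUG by a single base and the flanking contexts around a start codon. It assumes the common numbering in which the start codon occupies positions +1 to +3, so that the -4 to +4 window contributes four upstream bases and one downstream base; that convention and the function names are assumptions for illustration, not details taken from the abstract.

```python
from itertools import product

BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Return all codons that differ from `start` at exactly one position."""
    variants = []
    for pos, original in enumerate(start):
        for base in BASES:
            if base != original:
                variants.append(start[:pos] + base + start[pos + 1:])
    return variants


def flanking_contexts(n_upstream=4, n_downstream=1):
    """Enumerate every combination of flanking bases around a start codon.

    Assumed convention: positions -4..-1 upstream and +4 downstream.
    """
    return ["".join(p) for p in product(BASES, repeat=n_upstream + n_downstream)]


codons = near_cognate_codons()    # 3 positions x 3 alternative bases
contexts = flanking_contexts()    # 4 ** 5 flanking combinations per codon
total_sites = len(codons) * len(contexts)
```

    Under these assumptions there are 9 non-AUG start codons and 1024 flanking contexts for each, which gives a sense of why a reporter library combined with high-throughput sequencing is needed to cover every variant.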

  19. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms performs very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.

  20. I-Ad-binding peptides derived from unrelated protein antigens share a common structural motif

    DEFF Research Database (Denmark)

    Sette, A; Buus, S; Colon, S

    1988-01-01

    on the I-Ad binding of the immunogenic peptide OVA 323-339. The results obtained demonstrated the very permissive nature of Ag-Ia interaction. We also showed that unrelated peptides that are good I-Ad binders share a common structural motif and speculated that recognition of such motifs could represent...... that I-Ad molecules recognize a large library of Ag by virtue of common structural motifs present in peptides derived from phylogenetically unrelated proteins....

  1. Systematic comparison of the response properties of protein and RNA mediated gene regulatory motifs.

    Science.gov (United States)

    Iyengar, Bharat Ravi; Pillai, Beena; Venkatesh, K V; Gadgil, Chetan J

    2017-05-30

    We present a framework enabling the dissection of the effects of motif structure (feedback or feedforward), the nature of the controller (RNA or protein), and the regulation mode (transcriptional, post-transcriptional or translational) on the response to a step change in the input. We have used a common model framework for gene expression where both motif structures have an activating input and repressing regulator, with the same set of parameters, to enable a comparison of the responses. We studied the global sensitivity of the system properties, such as steady-state gain, overshoot, peak time, and peak duration, to parameters. We find that, in all motifs, overshoot correlated negatively whereas peak duration varied concavely with peak time. Differences in the other system properties were found to be mainly dependent on the nature of the controller rather than the motif structure. Protein mediated motifs showed a higher degree of adaptation i.e. a tendency to return to baseline levels; in particular, feedforward motifs exhibited perfect adaptation. RNA mediated motifs had a mild regulatory effect; they also exhibited a lower peaking tendency and mean overshoot. Protein mediated feedforward motifs showed higher overshoot and lower peak time compared to the corresponding feedback motifs.

  2. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Full Text Available Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists

  3. The primary structure of L37--a rat ribosomal protein with a zinc finger-like motif.

    Science.gov (United States)

    Chan, Y L; Paz, V; Olvera, J; Wool, I G

    1993-04-30

    The amino acid sequence of the rat 60S ribosomal subunit protein L37 was deduced from the sequence of nucleotides in a recombinant cDNA. Ribosomal protein L37 has 96 amino acids, the NH2-terminal methionine is removed after translation of the mRNA, and has a molecular weight of 10,939. Ribosomal protein L37 has a single zinc finger-like motif of the C2-C2 type. Hybridization of the cDNA to digests of nuclear DNA suggests that there are 13 or 14 copies of the L37 gene. The mRNA for the protein is about 500 nucleotides in length. Rat L37 is related to Saccharomyces cerevisiae ribosomal protein YL35 and to Caenorhabditis elegans L37. We have identified in the data base a DNA sequence that encodes the chicken homolog of rat L37.

  4. Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

    Science.gov (United States)

    Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

    2018-02-23

    Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid CAAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical CAAX sequence, specifically C(x)3X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C(x)3X sequences. As the search to identify physiologically relevant C(x)3X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
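
    The two C-terminal patterns described above differ only in how many residues follow the cysteine: three for the canonical four-residue motif and four for C(x)3X. The sketch below is a deliberately simplified positional check, assuming one-letter amino acid codes and ignoring any residue preferences of the enzymes; the function name and the example sequences are hypothetical and are not taken from the study.

```python
import re

# Cysteine followed by exactly three residues at the C terminus (canonical form).
CAAX = re.compile(r"C[A-Z]{3}$")
# Cysteine followed by exactly four residues at the C terminus (extended form).
CX3X = re.compile(r"C[A-Z]{4}$")


def classify_c_terminus(protein):
    """Return which positional prenylation patterns the C terminus satisfies."""
    seq = protein.upper()
    labels = []
    if CAAX.search(seq):
        labels.append("CaaX")
    if CX3X.search(seq):
        labels.append("C(x)3X")
    return labels


# Hypothetical sequences chosen only to exercise each branch.
canonical = classify_c_terminus("MKTCVIS")    # cysteine four residues from the end
extended = classify_c_terminus("MKTCVLIS")    # cysteine five residues from the end
both = classify_c_terminus("MKCCVLS")         # cysteines at both positions
neither = classify_c_terminus("MKTAVIS")      # no suitably placed cysteine
```

    A proteome-wide count with a check of this kind is one way to see how adding the extended pattern enlarges the pool of candidate substrates, although real candidates would still need biochemical confirmation.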

  5. A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

    Directory of Open Access Journals (Sweden)

    Asita Elengoe

    2015-01-01

    Full Text Available Currently, protein interaction of Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with p53 motif remains to be elucidated. The NBD-p53 motif complex enhances the p53 stabilization, thereby increasing the tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using STRING version 9.1 program. Then, we modeled the three-dimensional structure of p53 motif through homology modeling and determined the binding affinity and stability of NBD-p53 motif complex structure via molecular docking and dynamics (MD) simulation. Human DNA binding domain of p53 motif (SCMGGMNR) retrieved from UniProt (UniProtKB: P04637) was docked with the NBD protein, using the Autodock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 kcal/mol and −9.90 kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bond, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with p53 motif and the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure-based drugs in cancer therapy.

  6. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

    Directory of Open Access Journals (Sweden)

    Michael Allevato

    Full Text Available The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
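
    The three site classes named in this abstract (the generic E-box CANNTG, the canonical MYC E-box CACGTG, and the non-E-box hexamer AACGTT) can be located in a DNA string with a simple scan. The sketch below is only an illustration: the function name and the test sequence are made up, and it reports positions without modelling binding affinity. Because CANNTG and AACGTT are both their own reverse complements, scanning one strand is enough.

```python
import re

# Lookahead so that overlapping E-boxes are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
CME = "CACGTG"        # canonical MYC E-box
NON_EBOX = "AACGTT"   # low-affinity non-E-box site named in the abstract


def scan_sites(dna: str) -> list[tuple[int, str, str]]:
    """Return sorted (0-based position, hexamer, class) tuples for each site."""
    dna = dna.upper()
    sites = []
    for m in EBOX.finditer(dna):
        hexamer = m.group(1)
        kind = "CME" if hexamer == CME else "E-box variant"
        sites.append((m.start(), hexamer, kind))
    start = dna.find(NON_EBOX)
    while start != -1:
        sites.append((start, NON_EBOX, "non-E-box"))
        start = dna.find(NON_EBOX, start + 1)
    return sorted(sites)


if __name__ == "__main__":
    # Hypothetical sequence containing one site of each class.
    for site in scan_sites("TTCACGTGAACAGCTGTTAACGTTCC"):
        print(site)
```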

  7. Peptomics, identification of novel cationic Arabidopsis peptides with conserved sequence motifs

    DEFF Research Database (Denmark)

    Olsen, Addie Nina; Mundy, John; Skriver, Karen

    2002-01-01

    Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides show high overall sequence...... similarity to Rapid Alkalinization Factor (RALF), a peptide isolated from tobacco. We therefore refer to this peptide family as RALFL for RALF-Like. RT-PCR analysis confirmed that several of the Arabidopsis genes are expressed and that their expression patterns vary. The identification of a large gene family...

  8. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and the methyltransferase enzyme DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  9. Requirement for asparagine in the aquaporin NPA sequence signature motifs for cation exclusion

    DEFF Research Database (Denmark)

    Wree, Dorothea; Wu, Binghua; Zeuthen, Thomas

    2011-01-01

    Two highly conserved NPA motifs are a hallmark of the aquaporin (AQP) family. The NPA triplets form N-terminal helix capping structures with the Asn side chains located in the centre of the water or solute-conducting channel, and are considered to play an important role in AQP selectivity. Although...... interchangeable at both NPA sites without affecting protein expression or water, glycerol and methylamine permeability. However, other mutations in the NPA region led to reduced permeability (S186C and S186D), to nonfunctional channels (N64D), or even to lack of protein expression (S186A and S186T). Using...... electrophysiology, we found that an analogous mammalian AQP1 N76S mutant excluded protons and potassium ions, but leaked sodium ions, providing an argument for the overwhelming prevalence of Asn over other amino acids. We conclude that, at the first position in the NPA motifs, only Asn provides efficient helix cap...

  10. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.

    Science.gov (United States)

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-07-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.

  11. Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins

    Science.gov (United States)

    de Beauchene, Isaure Chauvot; de Vries, Sjoerd J.; Zacharias, Martin

    2016-01-01

    Abstract Protein-RNA complexes are important for many biological processes. However, structural modeling of such complexes is hampered by the high flexibility of RNA. Particularly challenging is the docking of single-stranded RNA (ssRNA). We have developed a fragment-based approach to model the structure of ssRNA bound to a protein, based on only the protein structure, the RNA sequence and conserved contacts. The conformational diversity of each RNA fragment is sampled by an exhaustive library of trinucleotides extracted from all known experimental protein–RNA complexes. The method was applied to ssRNA with up to 12 nucleotides which bind to dimers of the RNA recognition motifs (RRMs), a highly abundant eukaryotic RNA-binding domain. The fragment based docking allows a precise de novo atomic modeling of protein-bound ssRNA chains. On a benchmark of seven experimental ssRNA–RRM complexes, near-native models (with a mean heavy-atom deviation of <3 Å from experiment) were generated for six out of seven bound RNA chains, and even more precise models (deviation < 2 Å) were obtained for five out of seven cases, a significant improvement compared to the state of the art. The method is not restricted to RRMs but was also successfully applied to Pumilio RNA binding proteins. PMID:27131381

  12. BIOPEP-PBIL Tool for the Analysis of the Structure of Biologically Active Motifs Derived from Food Proteins

    Directory of Open Access Journals (Sweden)

    Jerzy Dziuba

    2011-01-01

    Full Text Available This work describes a flexible technique for the analysis of protein sequences as a source of motifs affecting bodily functions. The BIOPEP database, along with the Pôle Bioinformatique Lyonnais (PBIL) server, were applied to define which activities of peptides dominated in their protein precursors and which structure of the protein contained the most of the revealed activities. Such an approach could be helpful in finding some structural requirements for peptide(s) to be regarded as biologically active (bioactive). It was found that apart from the activities of peptides that commonly occur in the majority of proteins (e.g. ACE inhibitors), all analyzed proteins can be a source of motifs involved in e.g. activation of ubiquitin-mediated proteolysis. This could be important in designing diets for patients who suffer from neural diseases. The structure and bioactivity analyses revealed that if peptides were to be 'bioactive', it is essential that they assume the position of a coil (or combination of coil and α-helix) in the sequence of their protein precursors. However, it is recommended to consider factors such as the length of peptide chains, the number of peptides in the database, and the repeatability of the occurrence of characteristic amino acids, both in the peptide and in the protein, when studying the bioactivity and structure of biomolecules.

  13. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.

  14. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
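
    One way to see how "the probability for a word to occur at least once" can be computed exactly is a small dynamic program over a pattern-matching automaton. The sketch below is not the authors' algorithm or their oligo.c; it is a minimal illustration that assumes a first-order Markov chain, and all function and parameter names are hypothetical. Each state pairs the length of the longest word prefix matched so far with the last emitted letter, and probability mass moves to an absorbing "hit" total once the whole word has been seen.

```python
def kmp_automaton(word, alphabet):
    """delta[q][c]: longest prefix of word that is a suffix of word[:q] + c."""
    w = len(word)
    delta = [dict() for _ in range(w)]
    for q in range(w):
        for c in alphabet:
            s = word[:q] + c
            k = min(w, len(s))
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            delta[q][c] = k
    return delta


def prob_at_least_once(word, n, init, trans):
    """P(word occurs at least once in a length-n text from a first-order chain).

    init[c]     : probability that the first letter is c
    trans[a][b] : probability that letter b follows letter a
    """
    w = len(word)
    if w == 0 or n < w:
        return 0.0
    alphabet = sorted(init)
    delta = kmp_automaton(word, alphabet)
    hit = 0.0    # probability that the word has already occurred
    alive = {}   # (matched prefix length, last letter) -> probability, no hit yet
    for c in alphabet:
        q = delta[0][c]
        if q == w:
            hit += init[c]
        else:
            alive[(q, c)] = alive.get((q, c), 0.0) + init[c]
    for _ in range(n - 1):
        nxt = {}
        for (q, a), p in alive.items():
            for b in alphabet:
                pb = p * trans[a][b]
                q2 = delta[q][b]
                if q2 == w:
                    hit += pb   # absorbed: the word has now occurred
                else:
                    nxt[(q2, b)] = nxt.get((q2, b), 0.0) + pb
        alive = nxt
    return hit


if __name__ == "__main__":
    # Toy two-letter chain with independent, equiprobable letters.
    uniform = {"A": 0.5, "C": 0.5}
    print(prob_at_least_once("AA", 3, uniform, {"A": uniform, "C": uniform}))
```

    With independent, equiprobable letters over a two-letter alphabet, three of the eight length-3 texts contain "AA", which gives a quick sanity check on the recursion.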

  15. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to infer reliably homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can be used both to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.
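
    To make the "optimal algorithm" and "statistics of local similarity scores" mentioned above concrete, here is a minimal sketch of a Smith-Waterman local alignment score together with the Karlin-Altschul expectation E = K·m·n·exp(−λS). The match/mismatch/linear-gap scores are illustrative assumptions (real protein searches normally use a substitution matrix such as BLOSUM62 with affine gaps), and λ and K must be supplied by the caller because they depend on the scoring system; no values from the tutorial are implied.

```python
import math


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (toy scores, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def expected_hits(score, m, n, lam, k):
    """Karlin-Altschul expectation: chance matches scoring at least `score`
    between sequences of lengths m and n, for caller-supplied lam and k."""
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # Hypothetical toy sequences sharing the block "CDEF".
    print(smith_waterman_score("ACDEFG", "XXCDEFYY"))
```

    A lower expectation for the same score means the match is less likely to have arisen by chance, which is how such statistics are used to judge significance.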

  16. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity...... to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs...... associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can...

  17. EEVD motif of heat shock cognate protein 70 contributes to bacterial uptake by trophoblast giant cells

    Directory of Open Access Journals (Sweden)

    Kim Suk

    2009-12-01

    Full Text Available Abstract Background The uptake of abortion-inducing pathogens by trophoblast giant (TG) cells is a key event in infectious abortion. However, little is known about phagocytic functions of TG cells against the pathogens. Here we show that heat shock cognate protein 70 (Hsc70) contributes to bacterial uptake by TG cells and the EEVD motif of Hsc70 plays an important role in this. Methods Brucella abortus and Listeria monocytogenes were used as the bacterial antigen in this study. Recombinant proteins containing tetratricopeptide repeat (TPR) domains were constructed and confirmation of the binding capacity to Hsc70 was assessed by ELISA. The recombinant TPR proteins were used for investigation of the effect of TPR proteins on bacterial uptake by TG cells and on pregnancy in mice. Results The monoclonal antibody that inhibits bacterial uptake by TG cells reacted with the EEVD motif of Hsc70. Bacterial TPR proteins bound to the C-terminal of Hsc70 through its EEVD motif and this binding inhibited bacterial uptake by TG cells. Infectious abortion was also prevented by blocking the EEVD motif of Hsc70. Conclusions Our results demonstrate that surface located Hsc70 on TG cells mediates the uptake of pathogenic bacteria and proteins containing the TPR domain inhibit the function of Hsc70 by binding to its EEVD motif. These molecules may be useful in the development of methods for preventing infectious abortion.

  18. Novel and deviant Walker A ATP-binding motifs in bacteriophage large terminase-DNA packaging proteins

    International Nuclear Information System (INIS)

    Mitchell, Michael S.; Rao, Venigalla B.

    2004-01-01

    Bacteriophage terminases constitute a very interesting class of viral-coded multifunctional ATPase 'motors' that apparently drive directional translocation of DNA into an empty viral capsid. A common Walker A motif and other conserved signatures of a critical ATPase catalytic center are identified in the N-terminal half of numerous large terminase proteins. However, several terminases, including the well-characterized λ and SPP1 terminases, seem to lack the classic Walker A in the N-terminus. Using sequence alignment approaches, we discovered the presence of deviant Walker A motifs in these and many other phage terminases. One deviation, the presence of a lysine at the beginning of the P-loop, may represent a 3D equivalent of the universally conserved lysine in the Walker A GKT/S signature. This and other novel putative Walker A motifs that first came to light through this study help define the ATPase centers of phage and viral terminases as well as elicit important insights into the molecular functioning of this fundamental motif in biological systems.

  19. MicroRNA categorization using sequence motifs and k-mers.

    Science.gov (United States)

    Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

    2017-03-14

    Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
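
    The k-mer features mentioned in this abstract can be pictured as a fixed-length vector of relative k-mer frequencies per sequence. The sketch below is a generic illustration and not the authors' pipeline; the function name, the default k, and the toy sequence are assumptions. Windows containing letters outside the alphabet are simply ignored.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Map every possible k-mer over `alphabet` to its relative frequency in seq."""
    seq = seq.upper().replace("T", "U")   # accept DNA-style input as RNA
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[key] for key in keys)
    return {key: (counts[key] / total if total else 0.0) for key in keys}


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real pre-miRNA.
    features = kmer_frequencies("ACGUACGU", k=2)
    print(len(features), features["AC"])
```

    Because the keys are generated in a fixed order, the values can be passed directly as a feature vector to whatever classifier is being trained.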

  20. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

    Science.gov (United States)

    Pan, Xiaoyong; Shen, Hong-Bin

    2017-02-28

    RNAs play key roles in cells through the interactions with proteins known as the RNA-binding proteins (RBP) and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence, structure, their domain specific features and formats have posed significant computational challenges. One of the current difficulties is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into the meaningful binding motifs is a topic worthy of further investigation. In view of these challenges, we propose a deep learning-based framework (iDeep) by using a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol is featured by transforming the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6

  1. The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains

    Directory of Open Access Journals (Sweden)

    Wang Yiguo

    2008-10-01

    Full Text Available Abstract Background Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often Results Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved. Conclusion The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. Combining scoring matrices derived from peptide libraries and conservation analysis, we would be able to find those protein groups that are more likely to interact with specific domains.

  2. Salt-bridging effects on short amphiphilic helical structure and introducing sequence-based short beta-turn motifs.

    Science.gov (United States)

    Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel

    2018-02-01

    Determining the minimal sequence necessary to induce protein folding is beneficial in understanding the role of protein-protein interactions in biological systems, as their three-dimensional structures often dictate their activity. Proteins are generally comprised of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on a modestly helical eight-residue peptide, PT3, we previously published. Using circular dichroism (CD) spectroscopy and changing the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp, we found that our most helical improvements came from the Arg-Glu combination, whereas the Lys-Asp was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3₁₀-helical contributions in PT3 were lessened in the Arg-Glu-containing peptide with the beginning of cooperative unfolding seen through a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide which could help elucidate the stages of folding between the 3₁₀ and α-helix. In addition, we developed a short six-residue peptide with β-turn/sheet CD signature, again to help study minimal sequences needed for folding. Overall, the results indicate that improvements made to short peptide scaffolds by fine-tuning the salt-bridging residues can enhance scaffold structure. Likewise, with the results from the new, short β-turn motif, these can help impact future peptidomimetic designs in creating biologically useful, short, structured β-sheet-forming peptides.

  3. Insights into the molecular evolution of the PDZ/LIM family and identification of a novel conserved protein motif.

    Directory of Open Access Journals (Sweden)

    Aartjan J W Te Velthuis

    Full Text Available The PDZ and LIM domain-containing protein family is encoded by a diverse group of genes whose phylogeny has currently not been analyzed. In mammals, ten genes are found that encode both a PDZ- and one or several LIM-domains. These genes are: ALP, RIL, Elfin (CLP36), Mystique, Enigma (LMP-1), Enigma homologue (ENH), ZASP (Cypher, Oracle), LMO7 and the two LIM domain kinases (LIMK1 and LIMK2). As conventional alignment and phylogenetic procedures of full-length sequences fell short of elucidating the evolutionary history of these genes, we started to analyze the PDZ and LIM domain sequences themselves. Using information from most sequenced eukaryotic lineages, our phylogenetic analysis is based on full-length cDNA-, EST-derived- and genomic- PDZ and LIM domain sequences of over 25 species, ranging from yeast to humans. Plant and protozoan homologs were not found. Our phylogenetic analysis identifies a number of domain duplication and rearrangement events, and shows a single convergent event during evolution of the PDZ/LIM family. Further, we describe the separation of the ALP and Enigma subfamilies in lower vertebrates and identify a novel consensus motif, which we call 'ALP-like motif' (AM). This motif is highly-conserved between ALP subfamily proteins of diverse organisms. We used here a combinatorial approach to define the relation of the PDZ and LIM domain encoding genes and to reconstruct their phylogeny. This analysis allowed us to classify the PDZ/LIM family and to suggest a meaningful model for the molecular evolution of the diverse gene architectures found in this multi-domain family.

  4. Use of Cre/loxP recombination to swap cell binding motifs on the adenoviral capsid protein IX

    International Nuclear Information System (INIS)

    Poulin, Kathy L.; Tong, Grace; Vorobyova, Olga; Pool, Madeline; Kothary, Rashmi; Parks, Robin J.

    2011-01-01

    We used Cre/loxP recombination to swap targeting ligands present on the adenoviral capsid protein IX (pIX). A loxP-flanked sequence encoding poly-lysine (pK-binds heparan sulfate proteoglycans) was engineered onto the 3'-terminus of pIX, and the resulting fusion protein allowed for routine virus propagation. Growth of this virus on Cre-expressing cells removed the pK coding sequence, generating virus that could only infect through alternative ligands, such as a tyrosine kinase receptor A (TrkA)-binding motif engineered into the capsid fibre protein for enhanced infection of neuronal cells. We used a similar approach to swap the pK motif on pIX for a sequence encoding a single-domain antibody directed towards CD66c for targeted infection of cancer cells; Cre-mediated removal of the pK-coding sequence simultaneously placed the single-domain antibody coding sequence in frame with pIX. Thus, we have developed a simple method to propagate virus lacking native viral tropism but containing cell-specific binding ligands. - Highlights: → We describe a method to grow virus lacking native tropism but containing novel cell-binding ligands. → Cre/loxP recombination was used to modify the adenovirus genome. → A targeting ligand present on capsid protein IX was removed or replaced using recombination. → Cre-loxP was also used to 'swap' the identity of the targeting ligand present on pIX.

  5. MPN+, a putative catalytic motif found in a subset of MPN domain proteins from eukaryotes and prokaryotes, is critical for Rpn11 function

    Directory of Open Access Journals (Sweden)

    Hofmann Kay

    2002-09-01

    Full Text Available Abstract Background Three macromolecular assemblages, the lid complex of the proteasome, the COP9-Signalosome (CSN) and the eIF3 complex, all consist of multiple proteins harboring MPN and PCI domains. Up to now, no specific function for any of these proteins has been defined, nor has the importance of these motifs been elucidated. In particular Rpn11, a lid subunit, serves as the paradigm for MPN-containing proteins as it is highly conserved and important for proteasome function. Results We have identified a sequence motif, termed the MPN+ motif, which is highly conserved in a subset of MPN domain proteins such as Rpn11 and Csn5/Jab1, but is not present outside of this subfamily. The MPN+ motif consists of five polar residues that resemble the active site residues of hydrolytic enzyme classes, particularly that of metalloproteases. By using site-directed mutagenesis, we show that the MPN+ residues are important for the function of Rpn11, while a highly conserved Cys residue outside of the MPN+ motif is not essential. Single amino acid substitutions in MPN+ residues all show similar phenotypes, including slow growth, sensitivity to temperature and amino acid analogs, and general proteasome-dependent proteolysis defects. Conclusions The MPN+ motif is abundant in certain MPN-domain proteins, including newly identified proteins of eukaryotes, bacteria and archaea thought to act outside of the traditional large PCI/MPN complexes. The putative catalytic nature of the MPN+ motif makes it a good candidate for a pivotal enzymatic function, possibly a proteasome-associated deubiquitinating activity and a CSN-associated Nedd8/Rub1-removing activity.

  6. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally... related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein...
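
    As a rough illustration of the "information content" that a sequence logo displays, the sketch below computes the Shannon information (in bits) of each column of a set of equal-length peptides. It is a minimal sketch only and not Seq2Logo's method: it omits the sequence weighting, pseudo counts and two-sided enrichment/depletion representation named in the record title. The function names and the toy peptides are assumptions made for illustration.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Shannon information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the column entropy, with no
    small-sample correction, weighting, or pseudo counts.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(peptides, alphabet_size=20):
    """Per-position information content for a list of equal-length peptides."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("peptides must all have the same length")
    return [column_information(col, alphabet_size) for col in zip(*peptides)]


# Hypothetical toy peptides, not data from the record above.
peptides = ["ALDKV", "ALEKV", "ALDRI", "ALEKL"]
heights = logo_heights(peptides)
for position, bits in enumerate(heights, start=1):
    print(f"position {position}: {bits:.2f} bits")
```

    A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.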

  7. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    Science.gov (United States)

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.

  8. Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

    Science.gov (United States)

    Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2014-06-01

    Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCGTHP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.
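
    The residue numbering in this record can be cross-checked mechanically. The sketch below takes the 20-residue motif quoted in the abstract (written without internal spaces) together with its stated start position, and confirms that each named substitution refers to the wild-type residue actually present at that position. The helper names are hypothetical and the check says nothing about suppressor activity itself.

```python
MOTIF = "LPRHLHYECLEWGLLCGTHP"  # residues 76-95 as quoted in the abstract
START = 76                      # position of the first motif residue


def residue_at(position, motif=MOTIF, start=START):
    """Return the residue at a 1-based protein position inside the motif."""
    index = position - start
    if not 0 <= index < len(motif):
        raise ValueError(f"position {position} lies outside the motif")
    return motif[index]


def check_substitution(label):
    """Check a label such as 'W87A': does the wild-type letter match the motif?"""
    wild_type, position = label[0], int(label[1:-1])
    return residue_at(position) == wild_type


results = {label: check_substitution(label) for label in ("L76A", "W87A", "G88A")}
print(results)
```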

  9. Constraining cyclic peptides to mimic protein structure motifs

    DEFF Research Database (Denmark)

    Hill, Timothy A.; Shepherd, Nicholas E.; Diness, Frederik

    2014-01-01

    peptides can have protein-like biological activities and potencies, enabling their uses as biological probes and leads to therapeutics, diagnostics and vaccines. This Review highlights examples of cyclic peptides that mimic three-dimensional structures of strand, turn or helical segments of peptides...... and proteins, and identifies some additional restraints incorporated into natural product cyclic peptides and synthetic macrocyclic peptidomimetics that refine peptide structure and confer biological properties....

  10. Plasmodium vivax antigen discovery based on alpha-helical coiled coil protein motif

    DEFF Research Database (Denmark)

    Céspedes, Nora; Habel, Catherine; Lopez-Perez, Mary

    2014-01-01

    Protein α-helical coiled coil structures that elicit antibody responses, which block critical functions of medically important microorganisms, represent a means for vaccine development. By using bioinformatics algorithms, a total of 50 antigens with α-helical coiled coil motifs orthologous to Pla...

  11. Evidence for the additions of clustered interacting nodes during the evolution of protein interaction networks from network motifs

    Directory of Open Access Journals (Sweden)

    Guo Hao

    2011-05-01

    Full Text Available Abstract Background High-throughput screens have revealed large-scale protein interaction networks defining most cellular functions. How the proteins were added to the protein interaction network during its growth is a basic and important issue. Network motifs represent the simplest building blocks of cellular machines and are of biological significance. Results Here we study the evolution of protein interaction networks from the perspective of network motifs. We find that in current protein interaction networks, proteins of the same age class tend to form motifs and such co-origins of motif constituents are affected by their topologies and biological functions. Further, we find that the proteins within motifs whose constituents are of the same age class tend to be densely interconnected, co-evolve and share the same biological functions, and these motifs tend to be within protein complexes. Conclusions Our findings provide novel evidence for the hypothesis of the additions of clustered interacting nodes and indicate that network motifs, especially those with dense topology and specific function, may play important roles during this process. Our results suggest functional constraints may be the underlying driving force for such additions of clustered interacting nodes.

  12. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview.

    Science.gov (United States)

    Karvelis, Tautvydas; Gasiunas, Giedrius; Siksnys, Virginijus

    2017-05-15

    Recently the Cas9, an RNA guided DNA endonuclease, emerged as a powerful tool for targeted genome manipulations. Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target by simply changing crRNA sequence, however a short nucleotide sequence, termed PAM, is required to initiate crRNA hybridization to the DNA target. PAM sequence is recognized by Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Science.gov (United States)

    Grimm, Guido W.; Renner, Susanne S.; Stamatakis, Alexandros; Hemleben, Vera

    2007-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly. PMID:19455198

  14. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    Full Text Available The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  15. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs

    DEFF Research Database (Denmark)

    van Beest, M; Dooijes, D; van De Wetering, M

    2000-01-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment...

  16. Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    Full Text Available Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance but is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear crowding the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.

  17. Plasmodium vivax antigen discovery based on alpha-helical coiled coil protein motif.

    Directory of Open Access Journals (Sweden)

    Nora Céspedes

    Full Text Available Protein α-helical coiled coil structures that elicit antibody responses, which block critical functions of medically important microorganisms, represent a means for vaccine development. By using bioinformatics algorithms, a total of 50 antigens with α-helical coiled coil motifs orthologous to Plasmodium falciparum were identified in the P. vivax genome. The peptides identified in silico were chemically synthesized; circular dichroism studies indicated partial or high α-helical content. Antigenicity was evaluated using human sera samples from malaria-endemic areas of Colombia and Papua New Guinea. Eight of these fragments were selected and used to assess immunogenicity in BALB/c mice. ELISA assays indicated strong reactivity of serum samples from individuals residing in malaria-endemic regions and sera of immunized mice, with the α-helical coiled coil structures. In addition, ex vivo production of IFN-γ by murine mononuclear cells confirmed the immunogenicity of these structures and the presence of T-cell epitopes in the peptide sequences. Moreover, sera of mice immunized with four of the eight antigens recognized native proteins on blood-stage P. vivax parasites, and antigenic cross-reactivity with three of the peptides was observed when reacted with both the P. falciparum orthologous fragments and whole parasites. Results here point to the α-helical coiled coil peptides as possible P. vivax malaria vaccine candidates as were observed for P. falciparum. Fragments selected here warrant further study in humans and non-human primate models to assess their protective efficacy as single components or assembled as hybrid linear epitopes.

  18. Genome-wide identification of VQ motif-containing proteins and their expression profiles under abiotic stresses in maize

    Directory of Open Access Journals (Sweden)

    Weibin Song

    2016-01-01

    Full Text Available VQ motif-containing proteins play crucial roles in abiotic stress responses in plants. Recent studies have shown that some VQ proteins physically interact with WRKY transcription factors to activate downstream genes. In the present study, we identified and characterized genes encoding VQ motif-containing proteins using the most recent version of the maize genome sequence. In total, 61 VQ genes were identified. In a cluster analysis, these genes clustered into nine groups together with their homologous genes in rice and Arabidopsis. Most of the VQ genes (57 out of 61 members) identified in maize were found to be single-copy genes. Analyses of RNA-seq data obtained using seedlings under long-term drought treatment showed that the expression levels of most ZmVQ genes (41 out of 61 members) changed during the drought stress response. Quantitative real-time PCR analyses showed that most of the ZmVQ genes were responsive to NaCl treatment. Also, approximately half of the ZmVQ genes were co-expressed with ZmWRKY genes. The identification of these VQ genes in the maize genome and knowledge of their expression profiles under drought and osmotic stresses will provide a solid foundation for exploring their specific functions in the abiotic stress responses of maize.

  19. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  20. In Silico Characterization of Pectate Lyase Protein Sequences from Different Source Organisms

    Directory of Open Access Journals (Sweden)

    Amit Kumar Dubey

    2010-01-01

    Full Text Available A total of 121 protein sequences of pectate lyases were subjected to homology search, multiple sequence alignment, phylogenetic tree construction, and motif analysis. The phylogenetic tree constructed revealed different clusters based on different source organisms representing bacterial, fungal, plant, and nematode pectate lyases. The multiple accessions of bacterial, fungal, nematode, and plant pectate lyase protein sequences were placed closely revealing a sequence level similarity. The multiple sequence alignment of these pectate lyase protein sequences from different source organisms showed conserved regions at different stretches with maximum homology from amino acid residues 439–467, 715–816, and 829–910 which could be used for designing degenerate primers or probes specific for pectate lyases. The motif analysis revealed a conserved Pec_Lyase_C domain uniformly observed in all pectate lyases irrespective of variable sources suggesting its possible role in structural and enzymatic functions.

  1. Markovian Model in High Order Sequence Prediction From Log-Motif Patterns in Agbada Paralic Section, Niger Delta, Nigeria

    International Nuclear Information System (INIS)

    Olabode, S. O.; Adekoya, J. A.

    2002-01-01

    Markovian model in the elucidation of high order sequence was applied to repetitive events of regressive and transgressive phases in the Agbada paralic section Niger Delta. The repetitive events are made up of delta front, delta topset and fluvio-deltaic sediments. The sediments consist of sands, sandstones, siltstones and shales in various proportions. Five wells: MN1, AA1, NP2, NP6 and NP8 were studied. Summary of biostratigraphic report and well log-motif patterns was used to delineate the third order depositional sequences in the wells. Various Markovian properties - observed transition frequency matrix, observed transition probability matrix, fixed probability vector, expected random matrix (randomised transition matrix) and difference matrix were determined for stacked high order sequence (high frequency cyclic events) nested within the third-order sequences using the log-motif patterns for the various sand bodies and shales. Flow diagrams were constructed for each of the depositional sequences to know the likely occurrence of number of cycles. Upward transition matrix between the log-motif patterns and flow diagram to elucidate cyclicity show that the overall regressive sequence of the Niger Delta has been modified by deltaic depositional elements and fluctuations in sea level. The predictions of higher order sequence within third order sequences from Markovian Properties provide good basis for correlation within the depositional sequences. The model has also been used to decipher the dominant depositional processes during the formation of the sequences. Discrete reservoir intervals and seal potentials within the sequences were also predicted from the flow diagrams constructed.
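
    The Markovian properties listed in this record can be illustrated with a small sketch that builds an upward transition frequency matrix, a transition probability matrix, an expected random matrix and a difference matrix from a vertical succession of log-motif codes. This is a sketch under stated assumptions rather than the authors' procedure: the single-letter codes and the toy succession are hypothetical, and the expected random matrix uses one simple convention (column total divided by grand total), whereas published studies differ on details such as the treatment of self-transitions.

```python
def transition_counts(sequence):
    """Count upward transitions between successive states (bottom to top)."""
    states = sorted(set(sequence))
    counts = {a: {b: 0 for b in states} for a in states}
    for lower, upper in zip(sequence, sequence[1:]):
        counts[lower][upper] += 1
    return counts


def transition_probabilities(counts):
    """Normalise each row of the count matrix to probabilities."""
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return probs


def expected_random(counts):
    """Probabilities expected if transitions were independent of the lower state.

    Assumed convention: column total / grand total, identical for every row.
    """
    grand = sum(sum(row.values()) for row in counts.values())
    column = {b: sum(counts[a][b] for a in counts) for b in counts}
    return {a: {b: column[b] / grand for b in counts} for a in counts}


def difference(probs, expected):
    """Observed minus expected; positive entries suggest preferred transitions."""
    return {a: {b: probs[a][b] - expected[a][b] for b in probs[a]} for a in probs}


# Hypothetical codes: S = shale, F = funnel, C = cylinder, B = bell motif.
succession = list("SFCSFBSFCS")
counts = transition_counts(succession)
probs = transition_probabilities(counts)
diff = difference(probs, expected_random(counts))
print(counts["S"]["F"], round(probs["F"]["C"], 3), round(diff["S"]["F"], 3))
```

    Strongly positive entries in the difference matrix are the transitions that would be drawn as arrows in a flow diagram of the kind the abstract describes.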

  2. Computational analysis and prediction of the binding motif and protein interacting partners of the Abl SH3 domain.

    Directory of Open Access Journals (Sweden)

    Tingjun Hou

    2006-01-01

    Full Text Available Protein-protein interactions, particularly weak and transient ones, are often mediated by peptide recognition domains, such as Src Homology 2 and 3 (SH2 and SH3) domains, which bind to specific sequence and structural motifs. It is important but challenging to determine the binding specificity of these domains accurately and to predict their physiological interacting partners. In this study, the interactions between 35 peptide ligands (15 binders and 20 non-binders) and the Abl SH3 domain were analyzed using molecular dynamics simulation and the Molecular Mechanics/Poisson-Boltzmann Solvent Area method. The calculated binding free energies correlated well with the rank order of the binding peptides and clearly distinguished binders from non-binders. Free energy component analysis revealed that the van der Waals interactions dictate the binding strength of peptides, whereas the binding specificity is determined by the electrostatic interaction and the polar contribution of desolvation. The binding motif of the Abl SH3 domain was then determined by a virtual mutagenesis method, which mutates the residue at each position of the template peptide relative to all other 19 amino acids and calculates the binding free energy difference between the template and the mutated peptides using the Molecular Mechanics/Poisson-Boltzmann Solvent Area method. A single position mutation free energy profile was thus established and used as a scoring matrix to search peptides recognized by the Abl SH3 domain in the human genome. Our approach successfully picked ten out of 13 experimentally determined binding partners of the Abl SH3 domain among the top 600 candidates from the 218,540 decapeptides with the PXXP motif in the SWISS-PROT database. We expect that this physical-principle based method can be applied to other protein domains as well.
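
    The final search step described above, scoring decapeptides that contain a PXXP motif against a position-specific profile, might be sketched as follows. Everything numeric here is hypothetical: the profile values, the toy protein and the sign convention (more negative taken to mean more favourable) are assumptions for illustration, not the free energy profile from the study.

```python
import re

PXXP = re.compile(r"P..P")  # proline, any two residues, proline


def decapeptides_with_pxxp(protein, length=10):
    """Yield (start, window) for every window of the given length containing PXXP."""
    for start in range(len(protein) - length + 1):
        window = protein[start:start + length]
        if PXXP.search(window):
            yield start, window


def score(peptide, profile, default=0.0):
    """Sum the per-position contributions; residues absent from the profile score `default`."""
    return sum(profile[pos].get(res, default) for pos, res in enumerate(peptide))


# Hypothetical profile: one dict per position, only two positions populated.
profile = [{} for _ in range(10)]
profile[3] = {"P": -1.5}
profile[6] = {"P": -2.0}

protein = "GSAPTMPPLPKRT"  # hypothetical toy sequence
hits = [(score(w, profile), start, w) for start, w in decapeptides_with_pxxp(protein)]
ranked = sorted(hits)  # most negative (assumed most favourable) first
for total, start, window in ranked:
    print(f"{start:2d} {window} {total:+.1f}")
```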

  3. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  4. Four signature motifs define the first class of structurally related large coiled-coil proteins in plants.

    Directory of Open Access Journals (Sweden)

    Meier Iris

    2002-04-01

    Full Text Available Abstract Background Animal and yeast proteins containing long coiled-coil domains are involved in attaching other proteins to the large, solid-state components of the cell. One subgroup of long coiled-coil proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases. In contrast to other eukaryotes, long coiled-coil proteins have been barely investigated in plants. Results We have searched the completed Arabidopsis genome and have identified a family of structurally related long coiled-coil proteins. Filament-like plant proteins (FPP) were identified by sequence similarity to a tomato cDNA that encodes a coiled-coil protein which interacts with the nuclear envelope-associated protein, MAF1. The FPP family is defined by four novel unique sequence motifs and by two clusters of long coiled-coil domains separated by a non-coiled-coil linker. All family members are expressed in a variety of Arabidopsis tissues. A homolog sharing the structural features was identified in the monocot rice, indicating conservation among angiosperms. Conclusion Except for myosins, this is the first characterization of a family of long coiled-coil proteins in plants. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein. This might suggest that FPP family members function in nuclear envelope biology. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of interest to investigate other long coiled-coil proteins, which might functionally replace lamins in the plant kingdom.

  5. Flow Cytometry-Assisted Cloning of Specific Sequence Motifs from Complex 16S rRNA Gene Libraries

    DEFF Research Database (Denmark)

    Nielsen, Jeppe Lund; Schramm, Andreas; Bernhard, Anne E.

    2004-01-01

    Jeppe L. Nielsen,1 Andreas Schramm,1,2 Anne E. Bernhard,1 Gerrit J. van den Engh,3 and David A. Stahl1* Department of Civil and Environmental Engineering, University of Washington,1 and Institute for Systems Biology,3 Seattle, Washington, and Department of Ecological Microbiology, University of Bayreuth, Bayreuth, Germany2 A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes.

  6. Sequence-specific DNA binding activity of the cross-brace zinc finger motif of the piggyBac transposase

    Science.gov (United States)

    Morellet, Nelly; Li, Xianghong; Wieninger, Silke A; Taylor, Jennifer L; Bischerour, Julien; Moriau, Séverine; Lescop, Ewen; Bardiaux, Benjamin; Mathy, Nathalie; Assrir, Nadine; Bétermier, Mireille; Nilges, Michael; Hickman, Alison B; Dyda, Fred; Craig, Nancy L; Guittet, Eric

    2018-01-01

    The piggyBac transposase (PB) is distinguished by its activity and utility in genome engineering, especially in humans where it has highly promising therapeutic potential. Little is known, however, about the structure–function relationships of the different domains of PB. Here, we demonstrate in vitro and in vivo that its C-terminal Cysteine-Rich Domain (CRD) is essential for DNA breakage, joining and transposition and that it binds to specific DNA sequences in the left and right transposon ends, and to an additional unexpectedly internal site at the left end. Using NMR, we show that the CRD adopts the specific fold of the cross-brace zinc finger protein family. We determine the interaction interfaces between the CRD and its target, the 5′-TGCGT-3′/3′-ACGCA-5′ motifs found in the left, left internal and right transposon ends, and use NMR results to propose docking models for the complex, which are consistent with our site-directed mutagenesis data. Our results provide support for a model of the PB/DNA interactions in the context of the transpososome, which will be useful for the rational design of PB mutants with increased activity. PMID:29385532
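
    Locating a short double-stranded motif such as 5′-TGCGT-3′/3′-ACGCA-5′ in a DNA sequence amounts to searching the given strand for the motif and for its reverse complement. The sketch below shows one simple way to do this; the input sequence is a hypothetical toy string, not a piggyBac transposon end, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="TGCGT"):
    """Return (0-based start, strand) for each motif occurrence on either strand.

    '+' means the motif reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the opposite strand, so its reverse complement appears here.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        if window == motif:
            hits.append((start, "+"))
        if window == rc and rc != motif:  # avoid double-reporting palindromic motifs
            hits.append((start, "-"))
    return hits


toy_sequence = "CCTGCGTAAACGCAGG"  # hypothetical example sequence
motif_hits = find_motif(toy_sequence)
print(motif_hits)
```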

  7. Protein clustering and RNA phylogenetic reconstruction of the influenza A [corrected] virus NS1 protein allow an update in classification and identification of motif conservation.

    Science.gov (United States)

    Sevilla-Reyes, Edgar E; Chavaro-Pérez, David A; Piten-Isidro, Elvira; Gutiérrez-González, Luis H; Santos-Mendoza, Teresa

    2013-01-01

    The non-structural protein 1 (NS1) of influenza A virus (IAV), coded by its third most diverse gene, interacts with multiple molecules within infected cells. NS1 is involved in host immune response regulation and is a potential contributor to the virus host range. Early phylogenetic analyses using 50 sequences led to the classification of NS1 gene variants into groups (alleles) A and B. We reanalyzed NS1 diversity using 14,716 complete NS IAV sequences, downloaded from public databases, without host bias. Removal of sequence redundancy and further structured clustering at 96.8% amino acid similarity produced 415 clusters that enhanced our capability to detect distinct subgroups and lineages, which were assigned a numerical nomenclature. Maximum likelihood phylogenetic reconstruction using RNA sequences indicated the previously identified deep branching separating group A from group B, with five distinct subgroups within A as well as two and five lineages within the A4 and A5 subgroups, respectively. Our classification model proposes that sequence patterns in thirteen amino acid positions are sufficient to fit >99.9% of all currently available NS1 sequences into the A subgroups/lineages or the B group. This classification reduces host and virus bias through the prioritization of NS1 RNA phylogenetics over host or virus phenetics. We found significant sequence conservation within the subgroups and lineages with characteristic patterns of functional motifs, such as the differential binding of CPSF30 and crk/crkL or the availability of a C-terminal PDZ-binding motif. To understand selection pressures and evolution acting on NS1, it is necessary to organize the available data. This updated classification may help to clarify and organize the study of NS1 interactions and pathogenic differences and allow the drawing of further functional inferences on sequences in each group, subgroup and lineage rather than on a strain-by-strain basis.

  8. IQCJ-SCHIP1, a novel fusion transcript encoding a calmodulin-binding IQ motif protein

    International Nuclear Information System (INIS)

    Kwasnicka-Crawford, Dorota A.; Carson, Andrew R.; Scherer, Stephen W.

    2006-01-01

    The existence of transcripts that span two adjacent, independent genes is considered rare in the human genome. This study characterizes a novel human fusion gene named IQCJ-SCHIP1. IQCJ-SCHIP1 is the longest isoform of a complex transcriptional unit that bridges two separate genes that encode distinct proteins, IQCJ, a novel IQ motif containing protein and SCHIP1, a schwannomin interacting protein that has been previously shown to interact with the Neurofibromatosis type 2 (NF2) protein. IQCJ-SCHIP1 is located on the chromosome 3q25 and comprises a 1692-bp transcript encompassing 11 exons spanning 828 kb of the genomic DNA. We show that IQCJ-SCHIP1 mRNA is highly expressed in the brain. Protein encoded by the IQCJ-SCHIP1 gene was localized to cytoplasm and actin-rich regions and in differentiated PC12 cells was also seen in neurite extensions.

  9. A PDZ-Like Motif in the Biliary Transporter ABCB4 Interacts with the Scaffold Protein EBP50 and Regulates ABCB4 Cell Surface Expression.

    Directory of Open Access Journals (Sweden)

    Quitterie Venot

    Full Text Available ABCB4/MDR3, a member of the ABC superfamily, is an ATP-dependent phosphatidylcholine translocator expressed at the canalicular membrane of hepatocytes. Defects in the ABCB4 gene are associated with rare biliary diseases. It is essential to understand the mechanisms of its canalicular membrane expression in particular for the development of new therapies. The stability of several ABC transporters is regulated through their binding to PDZ (PSD95/DglA/ZO-1) domain-containing proteins. ABCB4 protein ends by the sequence glutamine-asparagine-leucine (QNL), which shows some similarity to PDZ-binding motifs. The aim of our study was to assess the potential role of the QNL motif on the surface expression of ABCB4 and to determine if PDZ domain-containing proteins are involved. We found that truncation of the QNL motif decreased the stability of ABCB4 in HepG2-transfected cells. The deleted mutant ABCB4-ΔQNL also displayed accelerated endocytosis. EBP50, a PDZ protein highly expressed in the liver, strongly colocalized and coimmunoprecipitated with ABCB4, and this interaction required the QNL motif. Down-regulation of EBP50 by siRNA or by expression of an EBP50 dominant-negative mutant caused a significant decrease in the level of ABCB4 protein expression, and in the amount of ABCB4 localized at the canalicular membrane. Interaction of ABCB4 with EBP50 through its PDZ-like motif plays a critical role in the regulation of ABCB4 expression and stability at the canalicular plasma membrane.

  10. Tetratricopeptide-motif-mediated interaction of FANCG with recombination proteins XRCC3 and BRCA2.

    Science.gov (United States)

    Hussain, Shobbir; Wilson, James B; Blom, Eric; Thompson, Larry H; Sung, Patrick; Gordon, Susan M; Kupfer, Gary M; Joenje, Hans; Mathew, Christopher G; Jones, Nigel J

    2006-05-10

    Fanconi anaemia is an inherited chromosomal instability disorder characterised by cellular sensitivity to DNA interstrand crosslinkers, bone-marrow failure and a high risk of cancer. Eleven FA genes have been identified, one of which, FANCD1, is the breast cancer susceptibility gene BRCA2. At least eight FA proteins form a nuclear core complex required for monoubiquitination of FANCD2. The BRCA2/FANCD1 protein is connected to the FA pathway by interactions with the FANCG and FANCD2 proteins, both of which co-localise with the RAD51 recombinase, which is regulated by BRCA2. These connections raise the question of whether any of the FANC proteins of the core complex might also participate in other complexes involved in homologous recombination repair. We therefore tested known FA proteins for direct interaction with RAD51 and its paralogs XRCC2 and XRCC3. FANCG was found to interact with XRCC3, and this interaction was disrupted by the FA-G patient derived mutation L71P. FANCG was co-immunoprecipitated with both XRCC3 and BRCA2 from extracts of human and hamster cells. The FANCG-XRCC3 and FANCG-BRCA2 interactions did not require the presence of other FA proteins from the core complex, suggesting that FANCG also participates in a DNA repair complex that is downstream and independent of FANCD2 monoubiquitination. Additionally, XRCC3 and BRCA2 proteins co-precipitate in both human and hamster cells and this interaction requires FANCG. The FANCG protein contains multiple tetratricopeptide repeat motifs (TPRs), which function as scaffolds to mediate protein-protein interactions. Mutation of one or more of these motifs disrupted all of the known interactions of FANCG. We propose that FANCG, in addition to stabilising the FA core complex, may have a role in building multiprotein complexes that facilitate homologous recombination repair.

  11. The Monitoring and Affinity Purification of Proteins Using Dual Tags with Tetracysteine Motifs

    Science.gov (United States)

    Giannone, Richard J.; Liu, Yie; Wang, Yisong

    Identification and characterization of protein-protein interaction networks is essential for the elucidation of biochemical mechanisms and cellular function. Affinity purification in combination with liquid chromatography-tandem mass spectrometry (LC-MS/MS) has emerged as a very powerful tactic for the identification of specific protein-protein interactions. In this chapter, we describe a comprehensive methodology that uses our recently developed dual-tag affinity purification system for the enrichment and identification of mammalian protein complexes. The protocol covers a series of separate but sequentially related techniques focused on the facile monitoring and purification of a dual-tagged protein of interest and its interacting partners via a system built with tetracysteine motifs and various combinations of affinity tags. Using human telomeric repeat binding factor 2 (TRF2) as an example, we demonstrate the power of the system in terms of bait protein recovery after dual-tag affinity purification, detection of bait protein subcellular localization and expression, and successful identification of known and potentially novel TRF2 interacting proteins. Although the protocol described here has been optimized for the identification and characterization of TRF2-associated proteins, it is, in principle, applicable to the study of any other mammalian protein complexes that may be of interest to the research community.

  12. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  13. Two sequence motifs from HIF-1α bind to the DNA-binding site of p53

    OpenAIRE

    Hansson, Lars O.; Friedler, Assaf; Freund, Stefan; Rüdiger, Stefan; Fersht, Alan R.

    2002-01-01

    There is evidence that hypoxia-inducible factor-1α (HIF-1α) interacts with the tumor suppressor p53. To characterize the putative interaction, we mapped the binding of the core domain of p53 (p53c) to an array of immobilized HIF-1α-derived peptides and found two peptide-sequence motifs that bound to p53c with micromolar affinity in solution. One sequence was adjacent to and the other coincided with the two proline residues of the oxygen-dependent degradation domain (P402 and P564) that act as...

  14. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    . These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  15. cDNA cloning of the basement membrane chondroitin sulfate proteoglycan core protein, bamacan: a five domain structure including coiled-coil motifs

    DEFF Research Database (Denmark)

    Wu, R R; Couchman, J R

    1997-01-01

    Basement membranes contain several proteoglycans, and those bearing heparan sulfate glycosaminoglycans such as perlecan and agrin usually predominate. Most mammalian basement membranes also contain chondroitin sulfate, and a core protein, bamacan, has been partially characterized. We have now....... The protein sequence has low overall homology, apart from very small NH2- and COOH-terminal motifs. At the junctions between the distal globular domains and the coiled-coil regions lie glycosylation sites, with up to three N-linked oligosaccharides and probably three chondroitin chains. Three other Ser...

  16. The KYxxL motif in Rad17 protein is essential for the interaction with the 9–1–1 complex

    Energy Technology Data Exchange (ETDEWEB)

    Fukumoto, Yasunori, E-mail: fukumoto@faculty.chiba-u.jp [Laboratory of Molecular Cell Biology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 260-8675 (Japan); Ikeuchi, Masayoshi; Nakayama, Yuji [Department of Biochemistry & Molecular Biology, Kyoto Pharmaceutical University, Kyoto 607-8414 (Japan); Yamaguchi, Naoto, E-mail: nyama@faculty.chiba-u.jp [Laboratory of Molecular Cell Biology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 260-8675 (Japan)

    2016-09-02

    The ATR-dependent DNA damage checkpoint is the major DNA damage checkpoint against UV irradiation and DNA replication stress. The Rad17–RFC and Rad9–Rad1–Hus1 (9–1–1) complexes interact with each other to contribute to ATR signaling; however, the precise regulatory mechanism of the interaction has not been established. Here, we identified a conserved sequence motif, KYxxL, in the AAA+ domain of Rad17 protein, and demonstrated that this motif is essential for the interaction with the 9–1–1 complex. We also show that UV-induced Rad17 phosphorylation is increased in the Rad17 KYxxL mutants. These data indicate that the interaction with the 9–1–1 complex is not required for Rad17 protein to be an efficient substrate for the UV-induced phosphorylation. Our data also raise the possibility that the 9–1–1 complex plays a negative regulatory role in the Rad17 phosphorylation. We also show that the nucleotide-binding activity of Rad17 is required for its nuclear localization. - Highlights: • We have identified a conserved KYxxL motif in Rad17 protein. • The KYxxL motif is crucial for the interaction with the 9–1–1 complex. • The KYxxL motif is dispensable or inhibitory for UV-induced Rad17 phosphorylation. • Nucleotide binding of Rad17 is required for its nuclear localization.
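
    A short motif such as KYxxL, where "x" stands for any residue, can be located in a protein sequence with a regular expression. The following is a minimal sketch under that reading of the notation; the function name `find_kyxxl` and the toy peptide are illustrative assumptions, not taken from the study, and the peptide is not the Rad17 sequence.

```python
import re

# "x" in KYxxL is read here as "any amino acid", giving K-Y-any-any-L.
# The lookahead makes the search report overlapping hits as well.
KYXXL = re.compile(r"(?=(KY..L))")


def find_kyxxl(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each KYxxL hit."""
    return [(m.start() + 1, m.group(1)) for m in KYXXL.finditer(sequence.upper())]


# Made-up peptide for illustration only (hypothetical, not Rad17).
toy_peptide = "MSAKYQELTGKYAALPP"
print(find_kyxxl(toy_peptide))
```

    A hit found this way is only a candidate; whether it is functional still has to be tested experimentally, for example by mutating the motif as the authors did.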

  17. Endoplasmic reticulum protein targeting of phospholamban: a common role for an N-terminal di-arginine motif in ER retention?

    Directory of Open Access Journals (Sweden)

    Parveen Sharma

    2010-07-01

    Full Text Available Phospholamban (PLN) is an effective inhibitor of the sarco(endo)plasmic reticulum Ca(2+)-ATPase, which transports Ca(2+) into the SR lumen, leading to muscle relaxation. A mutation of PLN in which one of the di-arginine residues at positions 13 and 14 was deleted led to a severe, early onset dilated cardiomyopathy. Here we were interested in determining the cellular mechanisms involved in this disease-causing mutation. Mutations deleting codons for either or both Arg13 or Arg14 resulted in the mislocalization of PLN from the ER. Our data show that PLN is recycled via the retrograde Golgi to ER membrane traffic pathway involving COP-I vesicles, since co-immunoprecipitation assays determined that COP I interactions are dependent on an intact di-arginine motif as PLN RDelta14 did not co-precipitate with COP I containing vesicles. Bioinformatic analysis determined that the di-arginine motif is present in the first 25 residues in a large number of all ER/SR Gene Ontology (GO) annotated proteins. Mutations in the di-arginine motif of the Sigma 1-type opioid receptor, the beta-subunit of the signal recognition particle receptor, and Sterol-O-acyltransferase, three proteins identified in our bioinformatic screen, also caused mislocalization of these known ER-resident proteins. We conclude that PLN is enriched in the ER due to COP I-mediated transport that is dependent on its intact di-arginine motif and that the N-terminal di-arginine motif may act as a general ER retrieval sequence.
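
    The kind of screen described above, checking whether a di-arginine motif lies within the first 25 residues of a protein, can be sketched in a few lines. This is an illustration only: the abstract does not give the exact criteria of the authors' screen, so the definition used here (two consecutive arginines fully inside the window), the function name, and the toy sequences are all assumptions.

```python
def has_n_terminal_di_arginine(sequence: str, window: int = 25) -> bool:
    """True if two consecutive arginines (RR) lie within the first `window` residues.

    Assumed definition for illustration; the published screen may differ.
    """
    return "RR" in sequence[:window].upper()


# Hypothetical toy sequences, not real protein records.
toy_proteins = {
    "intact_motif": "MEKAARRASTLLLVVV",
    "one_arg_deleted": "MEKAARASTLLLVVV",
    "motif_too_far": "A" * 30 + "RR",
}
for name, seq in toy_proteins.items():
    print(name, has_n_terminal_di_arginine(seq))
```

    Deleting a single arginine from the pair, as in the second toy sequence, is enough to make the check fail, which mirrors the single-residue deletions discussed in the abstract.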

  18. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  19. Enhanced SUMOylation of proteins containing a SUMO-interacting motif by SUMO-Ubc9 fusion

    International Nuclear Information System (INIS)

    Kim, Eui Tae; Kim, Kyeong Kyu; Matunis, Mike J.; Ahn, Jin-Hyun

    2009-01-01

    Identifying new targets for SUMO and understanding the function of protein SUMOylation are largely limited by the low level of SUMOylation. It was found recently that Ubc9, the SUMO E2 conjugating enzyme, is covalently modified by SUMO at lysine 14 in the N-terminal alpha helix, and that SUMO-modified Ubc9 has enhanced conjugation activity for certain target proteins containing a SUMO-interacting motif (SIM). Here, we show that, compared to intact Ubc9, the SUMO-Ubc9 fusion protein has higher conjugating activity for SIM-containing targets such as Sp100 and human cytomegalovirus IE2. Assays using an IE2 SIM mutant revealed the requirement of SIM for the enhanced IE2 SUMOylation by SUMO-Ubc9. In pull-down assays with cell extracts, the SUMO-Ubc9 fusion protein bound to more diverse cellular proteins and interacted with some SIM-containing proteins with higher affinities than Ubc9. Therefore, the devised SUMO-Ubc9 fusion will be useful for identifying SIM-containing SUMO targets and producing SUMO-modified proteins.

  20. Synthetic protein scaffolds based on peptide motifs and cognate adaptor domains for improving metabolic productivity

    Directory of Open Access Journals (Sweden)

    Anselm H.C. Horn

    2015-11-01

    Full Text Available The efficiency of many cellular processes relies on the defined interaction among different proteins within the same metabolic or signaling pathway. Consequently, a spatial colocalization of functionally interacting proteins has frequently emerged during evolution. This concept has been adapted within the synthetic biology community for the purpose of creating artificial scaffolds. A recent advancement of this concept is the use of peptide motifs and their cognate adaptor domains. SH2, SH3, GBD, and PDZ domains have been used most often in research studies to date. The approach has been successfully applied to the synthesis of a variety of target molecules including catechin, D-glucaric acid, H2, hydrochinone, resveratrol, butyrate, gamma-aminobutyric acid, and mevalonate. Increased production levels of up to 77-fold have been observed compared to non-scaffolded systems. A recent extension of this concept is the creation of a covalent linkage between peptide motifs and adaptor domains, which leads to a more stable association of the scaffolded systems and thus bears the potential to further enhance metabolic productivity.

  1. Assessment of algorithms for inferring positional weight matrix motifs of transcription factor binding sites using protein binding microarray data.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    Full Text Available The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature and ability to predict in vivo and in vitro bindings. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays compared to in vitro assays.
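
    To make the terms "positional weight matrix" and "information content" concrete, the sketch below builds a matrix from a handful of aligned sites, computes its information content in bits, and scores a probe. It is a generic textbook construction and not one of the four methods compared in the study; the pseudocount, the uniform background, the function names, and the toy sites are assumptions chosen for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequencies(sites, pseudocount=0.5):
    """Per-position base frequencies from equal-length aligned binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def information_content(matrix):
    """Total information content in bits, relative to a uniform background."""
    return sum(2.0 + sum(p * math.log2(p) for p in col.values()) for col in matrix)


def log_odds_score(matrix, probe, background=0.25):
    """Sum of per-position log2(p / background) for a probe of motif length."""
    return sum(math.log2(col[base] / background) for col, base in zip(matrix, probe))


# Hypothetical aligned sites, made up for this example.
toy_sites = ["TGACGT", "TGACGT", "TGACGA", "TGATGT"]
pwm = position_frequencies(toy_sites)
print(round(information_content(pwm), 2))
print(log_odds_score(pwm, "TGACGT") > log_odds_score(pwm, "ACGTAC"))
```

    A matrix whose columns are close to uniform has information content near zero, while one with a single dominant base per column approaches 2 bits per position; this is the quantity behind the low-information versus highly informative contrast in the abstract.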

  2. Translational Control of Host Gene Expression by a Cys-Motif Protein Encoded in a Bracovirus.

    Directory of Open Access Journals (Sweden)

    Eunseong Kim

    Full Text Available Translational control is a strategy that various viruses use to manipulate their hosts to suppress acute antiviral response. Polydnaviruses, a group of insect double-stranded DNA viruses symbiotic to some endoparasitoid wasps, are divided into two genera: ichnovirus (IV) and bracovirus (BV). In IV, some Cys-motif genes are known as host translation-inhibitory factors (HTIF). The genome of endoparasitoid wasp Cotesia plutellae contains a Cys-motif gene (Cp-TSP13) homologous to an HTIF known as teratocyte-secretory protein 14 (TSP14) of Microplitis croceipes. Cp-TSP13 consists of 129 amino acid residues with a predicted molecular weight of 13.987 kDa and pI value of 7.928. Genomic DNA region encoding its open reading frame has three introns. Cp-TSP13 possesses six conserved cysteine residues, like other Cys-motif genes functioning as HTIF. Cp-TSP13 was expressed in Plutella xylostella larvae parasitized by C. plutellae. C. plutellae bracovirus (CpBV) was purified and injected into non-parasitized P. xylostella that expressed Cp-TSP13. Cp-TSP13 was cloned into a eukaryotic expression vector and used to infect Sf9 cells to transiently express Cp-TSP13. The synthesized Cp-TSP13 protein was detected in culture broth. An overlaying experiment showed that the purified Cp-TSP13 entered hemocytes. It was localized in the cytosol. Recombinant Cp-TSP13 significantly inhibited protein synthesis of secretory proteins when it was added to in vitro cultured fat body. In addition, the recombinant Cp-TSP13 directly inhibited the translation of fat body mRNAs in in vitro translation assay using rabbit reticulocyte lysate. Moreover, the recombinant Cp-TSP13 significantly suppressed cellular immune responses by inhibiting hemocyte-spreading behavior. It also exhibited significant insecticidal activities by both injection and feeding routes. These results indicate that Cp-TSP13 is a viral HTIF.

  3. Evidence of positive selection at codon sites localized in extracellular domains of mammalian CC motif chemokine receptor proteins

    Directory of Open Access Journals (Sweden)

    Metzger Kelsey J

    2010-05-01

    Full Text Available Abstract Background CC chemokine receptor proteins (CCR1 through CCR10) are seven-transmembrane G-protein coupled receptors whose signaling pathways are known for their important roles coordinating immune system responses through targeted trafficking of white blood cells. In addition, some of these receptors have been identified as fusion proteins for viral pathogens: for example, HIV-1 strains utilize CCR5, CCR2 and CCR3 proteins to obtain cellular entry in humans. The extracellular domains of these receptor proteins are involved in ligand-binding specificity as well as pathogen recognition interactions. In mammals, the majority of chemokine receptor genes are clustered together; in humans, seven of the ten genes are clustered in the 3p21-24 chromosome region. Gene conversion events, or exchange of DNA sequence between genes, have been reported in chemokine receptor paralogs in various mammalian lineages, especially between the cytogenetically closely located pairs CCR2/5 and CCR1/3. Datasets of mammalian orthologs for each gene were analyzed separately to minimize the potential confounding impact of analyzing highly similar sequences resulting from gene conversion events. Molecular evolution approaches and the software package Phylogenetic Analyses by Maximum Likelihood (PAML) were utilized to investigate the signature of selection that has acted on the mammalian CC chemokine receptor (CCR) gene family. The results of neutral vs. adaptive evolution (positive selection) hypothesis testing using Site Models are reported. In general, positive selection is defined by a ratio of nonsynonymous/synonymous nucleotide changes (dN/dS, or ω) > 1. Results Of the ten mammalian CC motif chemokine receptor sequence datasets analyzed, only CCR2 and CCR3 contain amino acid codon sites that exhibit evidence of positive selection using site based hypothesis testing in PAML. Nineteen of the twenty codon sites putatively identified as likely to be under positive

  4. Vaccinia protein F12 has structural similarity to kinesin light chain and contains a motor binding motif required for virion export.

    Directory of Open Access Journals (Sweden)

    Gareth W Morgan

    2010-02-01

    Full Text Available Vaccinia virus (VACV) uses microtubules for export of virions to the cell surface and this process requires the viral protein F12. Here we show that F12 has structural similarity to kinesin light chain (KLC), a subunit of the kinesin-1 motor that binds cargo. F12 and KLC share similar size, pI, hydropathy and cargo-binding tetratricopeptide repeats (TPRs). Moreover, molecular modeling of F12 TPRs upon the crystal structure of KLC2 TPRs showed a striking conservation of structure. We also identified multiple TPRs in VACV proteins E2 and A36. Data presented demonstrate that F12 is critical for recruitment of kinesin-1 to virions and that a conserved tryptophan and aspartic acid (WD) motif, which is conserved in the kinesin-1-binding sequence (KBS) of the neuronal protein calsyntenin/alcadein and several other cellular kinesin-1 binding proteins, is essential for kinesin-1 recruitment and virion transport. In contrast, mutation of WD motifs in protein A36 revealed they were not required for kinesin-1 recruitment or IEV transport. This report of a viral KLC-like protein containing a KBS that is conserved in several cellular proteins advances our understanding of how VACV recruits the kinesin motor to virions, and exemplifies how viruses use molecular mimicry of cellular components to their advantage.

  5. Characterization of hydrogen bonding motifs in proteins: hydrogen elimination monitoring by ultraviolet photodissociation mass spectrometry.

    Science.gov (United States)

    Morrison, Lindsay J; Chai, Wenrui; Rosenberg, Jake A; Henkelman, Graeme; Brodbelt, Jennifer S

    2017-08-02

    Determination of structure and folding of certain classes of proteins remains intractable by conventional structural characterization strategies and has spurred the development of alternative methodologies. Mass spectrometry-based approaches have a unique capacity to differentiate protein heterogeneity due to the ability to discriminate populations, whether minor or major, featuring modifications or complexation with non-covalent ligands on the basis of m/z. Cleavage of the peptide backbone can be further utilized to obtain residue-specific structural information. Here, hydrogen elimination monitoring (HEM) upon ultraviolet photodissociation (UVPD) of proteins transferred to the gas phase via native spray ionization is introduced as an innovative approach to deduce backbone hydrogen bonding patterns. Using well-characterized peptides and a series of proteins, prediction of the engagement of the amide carbonyl oxygen of the protein backbone in hydrogen bonding using UVPD-HEM is demonstrated to show significant agreement with the hydrogen-bonding motifs derived from molecular dynamics simulations and X-ray crystal structures.

  6. Human HOX Proteins Use Diverse and Context-Dependent Motifs to Interact with TALE Class Cofactors.

    Science.gov (United States)

    Dard, Amélie; Reboulet, Jonathan; Jia, Yunlong; Bleicher, Françoise; Duffraisse, Marilyne; Vanaker, Jean-Marc; Forcet, Christelle; Merabet, Samir

    2018-03-13

    HOX proteins achieve numerous functions by interacting with the TALE class PBX and MEIS cofactors. In contrast to this established partnership in development and disease, how HOX proteins could interact with PBX and MEIS remains unclear. Here, we present a systematic analysis of HOX/PBX/MEIS interaction properties, scanning all paralog groups with human and mouse HOX proteins in vitro and in live cells. We demonstrate that a previously characterized HOX protein motif known to be critical for HOX-PBX interactions becomes dispensable in the presence of MEIS in all except the two most anterior paralog groups. We further identify paralog-specific TALE-binding sites that are used in a highly context-dependent manner. One of these binding sites is involved in the proliferative activity of HOXA7 in breast cancer cells. Together these findings reveal an extraordinary level of interaction flexibility between HOX proteins and their major class of developmental cofactors. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  7. The crystal structure of the Split End protein SHARP adds a new layer of complexity to proteins containing RNA recognition motifs.

    Science.gov (United States)

    Arieti, Fabiana; Gabus, Caroline; Tambalo, Margherita; Huet, Tiphaine; Round, Adam; Thore, Stéphane

    2014-06-01

    The Split Ends (SPEN) protein was originally discovered in Drosophila in the late 1990s. Since then, homologous proteins have been identified in eukaryotic species ranging from plants to humans. Every family member contains three predicted RNA recognition motifs (RRMs) in the N-terminal region of the protein. We have determined the crystal structure of the region of the human SPEN homolog that contains these RRMs, the SMRT/HDAC1 Associated Repressor Protein (SHARP), at 2.0 Å resolution. SHARP is a co-regulator of the nuclear receptors. We demonstrate that two of the three RRMs, namely RRM3 and RRM4, interact via a highly conserved interface. Furthermore, we show that the RRM3-RRM4 block is the main platform mediating the stable association with the H12-H13 substructure found in the steroid receptor RNA activator (SRA), a long, non-coding RNA previously shown to play a crucial role in nuclear receptor transcriptional regulation. We determine that SHARP association with SRA relies on both single- and double-stranded RNA sequences. The crystal structure of the SHARP-RRM fragment, together with the associated RNA-binding studies, extends the repertoire of nucleic acid binding properties of RRM domains, suggesting a new hypothesis for a better understanding of SPEN protein functions. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Interleukin-11 binds specific EF-hand proteins via their conserved structural motifs.

    Science.gov (United States)

    Kazakov, Alexei S; Sokolov, Andrei S; Vologzhannikova, Alisa A; Permyakova, Maria E; Khorn, Polina A; Ismailov, Ramis G; Denessiouk, Konstantin A; Denesyuk, Alexander I; Rastrygina, Victoria A; Baksheeva, Viktoriia E; Zernii, Evgeni Yu; Zinchenko, Dmitry V; Glazatov, Vladimir V; Uversky, Vladimir N; Mirzabekov, Tajib A; Permyakov, Eugene A; Permyakov, Sergei E

    2017-01-01

    Interleukin-11 (IL-11) is a hematopoietic cytokine engaged in numerous biological processes and validated as a target for treatment of various cancers. IL-11 contains intrinsically disordered regions that might recognize multiple targets. Recently we found that aside from IL-11RA and gp130 receptors, IL-11 interacts with calcium sensor protein S100P. Strict calcium dependence of this interaction suggests a possibility of IL-11 interaction with other calcium sensor proteins. Here we probed specificity of IL-11 to calcium-binding proteins of various types: calcium sensors of the EF-hand family (calmodulin, S100B and neuronal calcium sensors: recoverin, NCS-1, GCAP-1, GCAP-2), calcium buffers of the EF-hand family (S100G, oncomodulin), and a non-EF-hand calcium buffer (α-lactalbumin). A specific subset of the calcium sensor proteins (calmodulin, S100B, NCS-1, GCAP-1/2) exhibits metal-dependent binding of IL-11 with dissociation constants of 1-19 μM. These proteins share several amino acid residues belonging to conserved structural motifs of the EF-hand proteins, the 'black' and 'gray' clusters. Replacements of the respective S100P residues by alanine drastically decrease its affinity to IL-11, suggesting their involvement in the association process. Secondary structure and accessibility of the hinge region of the EF-hand proteins studied are predicted to control specificity and selectivity of their binding to IL-11. The IL-11 interaction with the EF-hand proteins is expected to occur under numerous pathological conditions, accompanied by disintegration of plasma membrane and efflux of cellular components into the extracellular milieu.

  9. Factors Affecting the Binding of a Recombinant Heavy Metal-Binding Domain (CXXC motif) Protein to Heavy Metals

    Directory of Open Access Journals (Sweden)

    Kamala Boonyodying

    2012-06-01

    Full Text Available A number of heavy metal-binding proteins have been used to study bioremediation. The CXXC motif, a metal-binding domain containing a Cys-X-X-Cys sequence, has been identified in various organisms. These proteins are capable of binding various types of heavy metals. In this study, a heavy metal-binding domain (CXXC motif) recombinant protein encoded by the mcsA gene of S. aureus was cloned and overexpressed in Escherichia coli. The factors involved in the metal-binding activity were determined in order to analyze the potential of the recombinant protein for bioremediation. The recombinant protein bound Cd2+, Co2+, Cu2+ and Zn2+. The thermal stability of the recombinant protein was tested, and the results showed that the metal-binding activity toward Cu2+ and Zn2+ was retained after treating the protein at 85ºC for 30 min. The effects of temperature and pH on the metal-binding activity were tested, and the results showed that the recombinant protein still bound Cu2+ at 65ºC, whereas a pH of 3-7 did not affect the metal binding. E. coli harboring a pRset with the heavy metal-binding domain CXXC motif showed increased resistance to CuCl2 and CdCl2. This study shows that the metal-binding domain (CXXC motif) recombinant protein can effectively bind various types of heavy metals and may be used as a potential tool for studying bioremediation.

  10. An evolutionarily conserved glycine-tyrosine motif forms a folding core in outer membrane proteins.

    Directory of Open Access Journals (Sweden)

    Marcin Michalik

    Full Text Available An intimate interaction between a pair of amino acids, a tyrosine and glycine on neighboring β-strands, has been previously reported to be important for the structural stability of autotransporters. Here, we show that the conservation of this interacting pair extends to nearly all major families of outer membrane β-barrel proteins, which are thought to have originated through duplication events involving an ancestral ββ hairpin. We analyzed the function of this motif using the prototypical outer membrane protein OmpX. Stopped-flow fluorescence shows that two folding processes occur in the millisecond time regime, the rates of which are reduced in the tyrosine mutant. Folding assays further demonstrate a reduction in the yield of folded protein for the mutant compared to the wild-type, as well as a reduction in thermal stability. Taken together, our data support the idea of an evolutionarily conserved 'folding core' that affects the folding, membrane insertion, and thermal stability of outer membrane protein β-barrels.

  11. Crystal Structures of the Scaffolding Protein LGN Reveal the General Mechanism by Which GoLoco Binding Motifs Inhibit the Release of GDP from Gαi

    Science.gov (United States)

    Jia, Min; Li, Jianchao; Zhu, Jinwei; Wen, Wenyu; Zhang, Mingjie; Wang, Wenning

    2012-01-01

    GoLoco (GL) motif-containing proteins regulate G protein signaling by binding to Gα subunit and acting as guanine nucleotide dissociation inhibitors. GLs of LGN are also known to bind the GDP form of Gαi/o during asymmetric cell division. Here, we show that the C-terminal GL domain of LGN binds four molecules of Gαi·GDP. The crystal structures of Gαi·GDP in complex with LGN GL3 and GL4, respectively, reveal distinct GL/Gαi interaction features when compared with the only high resolution structure known with GL/Gαi interaction between RGS14 and Gαi1. Only a few residues C-terminal to the conserved GL sequence are required for LGN GLs to bind to Gαi·GDP. A highly conserved “double Arg finger” sequence (RΨ(D/E)(D/E)QR) is responsible for LGN GL to bind to GDP bound to Gαi. Together with the sequence alignment, we suggest that the LGN GL/Gαi interaction represents a general binding mode between GL motifs and Gαi. We also show that LGN GLs are potent guanine nucleotide dissociation inhibitors. PMID:22952234

  12. Bound water at protein-protein interfaces: partners, roles and hydrophobic bubbles as a conserved motif.

    Directory of Open Access Journals (Sweden)

    Mostafa H Ahmed

    Full Text Available There is a great interest in understanding and exploiting protein-protein associations as new routes for treating human disease. However, these associations are difficult to structurally characterize or model although the number of X-ray structures for protein-protein complexes is expanding. One feature of these complexes that has received little attention is the role of water molecules in the interfacial region. A data set of 4741 water molecules abstracted from 179 high-resolution (≤ 2.30 Å) X-ray crystal structures of protein-protein complexes was analyzed with a suite of modeling tools based on the HINT forcefield and hydrogen-bonding geometry. A metric termed Relevance was used to classify the general roles of the water molecules. The water molecules were found to be involved in: (a) bridging interactions with both proteins (21%), (b) favorable interactions with only one protein (53%), and (c) no interactions with either protein (26%). This trend is shown to be independent of the crystallographic resolution. Interactions with residue backbones are consistent for all classes and account for 21.5% of all interactions. Interactions with polar residues are significantly more common for the first group and interactions with non-polar residues dominate the last group. Waters interacting with both proteins stabilize on average the proteins' interaction (-0.46 kcal mol⁻¹), but the overall average contribution of a single water to the protein-protein interaction energy is unfavorable (+0.03 kcal mol⁻¹). Analysis of the waters without favorable interactions with either protein suggests that this is a conserved phenomenon: 42% of these waters have SASA ≤ 10 Å² and are thus largely buried, and 69% of these are within predominantly hydrophobic environments or "hydrophobic bubbles". Such water molecules may have an important biological purpose in mediating protein-protein interactions.

  13. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d

    Directory of Open Access Journals (Sweden)

    Moffatt Barbara A

    2010-08-01

    Full Text Available Abstract Background Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. Results The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is absent completely in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. Conclusions Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure.

  14. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d.

    Science.gov (United States)

    Doxey, Andrew C; Cheng, Zhenyu; Moffatt, Barbara A; McConkey, Brendan J

    2010-08-03

    Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is absent completely in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure.

  15. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-01

    LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  16. Microbial expression of proteins containing long repetitive Arg-Gly-Asp cell adhesive motifs created by overlap elongation PCR

    International Nuclear Information System (INIS)

    Kurihara, Hiroyuki; Shinkai, Masashige; Nagamune, Teruyuki

    2004-01-01

    We developed a novel method for creating repetitive DNA libraries using overlap elongation PCR, and prepared a DNA library encoding repetitive Arg-Gly-Asp (RGD) cell adhesive motifs. We obtained various length DNAs encoding repetitive RGD from a short monomer DNA (18 bp) after a thermal cyclic reaction without a DNA template for amplification, and isolated DNAs encoding 2, 21, and 43 repeats of the RGD motif. We cloned these DNAs into a protein expression vector and overexpressed them as thioredoxin fusion proteins: RGD2, RGD21, and RGD43, respectively. The solubility of RGD43 in water was low and it formed a fibrous precipitate in water. Scanning electron microscopy revealed that RGD43 formed a branched 3D-network structure in the solid state. To evaluate the function of the cell adhesive motifs in RGD43, mouse fibroblast cells were cultivated on the RGD43 scaffold. The fibroblast cells adhered to the RGD43 scaffold and extended long filopodia.

  17. MicroRNA sequence motifs reveal asymmetry between the stem arms

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Havgaard, Jakob Hull; Ensterö, M.

    2006-01-01

    The processing of microRNAs (miRNAs) from their stem-loop precursors has revealed asymmetry in the processing of the mature sequence and its star sequence. Furthermore, the miRNA processing systems differ between organisms. To assess this at the sequence level we have investigated mature miRNAs in their gen...

  18. Structural and functional analysis of VQ motif-containing proteins in Arabidopsis as interacting proteins of WRKY transcription factors.

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-06-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors.
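
    The consensus quoted above, FxxxVQxLTG, can be turned into a simple pattern search. The following is only a minimal sketch, assuming that a lowercase "x" stands for any residue; the helper names and the toy sequence are hypothetical and are not taken from the study.

```python
import re

def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus such as 'FxxxVQxLTG'; lowercase 'x' is assumed to mean any residue."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in consensus))

# Consensus as written in the abstract.
VQ_PATTERN = consensus_to_regex("FxxxVQxLTG")

def find_motifs(sequence: str, pattern=VQ_PATTERN):
    """Return (1-based start, matched text) for every non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real Arabidopsis protein.
toy_sequence = "MSSAFRADVQELTGKK"
print(find_motifs(toy_sequence))
```

    A plain regular expression like this only reports exact matches to the consensus; divergent family members would need a profile-based search instead.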

  19. The ubiquitin ligase tripartite-motif-protein 32 is induced in Duchenne muscular dystrophy.

    Science.gov (United States)

    Assereto, Stefania; Piccirillo, Rosanna; Baratto, Serena; Scudieri, Paolo; Fiorillo, Chiara; Massacesi, Manuela; Traverso, Monica; Galietta, Luis J; Bruno, Claudio; Minetti, Carlo; Zara, Federico; Gazzerro, Elisabetta

    2016-08-01

    Activation of the proteasome pathway is one of the secondary processes of cell damage, which ultimately lead to muscle degeneration and necrosis in Duchenne muscular dystrophy (DMD). In mdx mice, the proteasome inhibitor bortezomib up-regulates the membrane expression of members of the dystrophin complex and reduces the inflammatory reaction. However, chronic inhibition of the 26S proteasome may be toxic, as indicated by the systemic side-effects caused by this drug. Therefore, we sought to determine the components of the ubiquitin-proteasome pathway that are specifically activated in human dystrophin-deficient muscles. The analysis of a cohort of patients with genetically determined DMD or Becker muscular dystrophy (BMD) unveiled a selective up-regulation of the ubiquitin ligase tripartite motif-containing protein 32 (TRIM32). The induction of TRIM32 was due to a transcriptional effect and it correlated with disease severity in BMD patients. In contrast, atrogin1 and muscle RING-finger protein-1 (MuRF-1), which are strongly increased in distinct types of muscular atrophy, were not affected by the DMD dystrophic process. Knock-out models showed that TRIM32 is involved in ubiquitination of muscle cytoskeletal proteins as well as of protein inhibitor of activated STAT protein gamma (Piasγ) and N-myc downstream-regulated gene, two inhibitors of satellite cell proliferation and differentiation. Accordingly, we showed that in DMD/BMD muscle tissue, TRIM32 induction was more pronounced in regenerating myofibers rather than in necrotic muscle cells, thus pointing out a role of this protein in the regulation of human myoblast cell fate. This finding highlights TRIM32 as a possible therapeutic target to favor skeletal muscle regeneration in DMD patients.

  20. Structure of Rhodococcus equi virulence-associated protein B (VapB) reveals an eight-stranded antiparallel β-barrel consisting of two Greek-key motifs

    International Nuclear Information System (INIS)

    Geerds, Christina; Wohlmann, Jens; Haas, Albert; Niemann, Hartmut H.

    2014-01-01

    The structure of VapB, a member of the Vap protein family that is involved in virulence of the bacterial pathogen R. equi, was determined by SAD phasing and reveals an eight-stranded antiparallel β-barrel similar to avidin, suggestive of a binding function. Made up of two Greek-key motifs, the topology of VapB is unusual or even unique. Members of the virulence-associated protein (Vap) family from the pathogen Rhodococcus equi regulate virulence in an unknown manner. They do not share recognizable sequence homology with any protein of known structure. VapB and VapA are normally associated with isolates from pigs and horses, respectively. To contribute to a molecular understanding of Vap function, the crystal structure of a protease-resistant VapB fragment was determined at 1.4 Å resolution. The structure was solved by SAD phasing employing the anomalous signal of one endogenous S atom and two bound Co ions with low occupancy. VapB is an eight-stranded antiparallel β-barrel with a single helix. Structural similarity to avidins suggests a potential binding function. Unlike other eight- or ten-stranded β-barrels found in avidins, bacterial outer membrane proteins, fatty-acid-binding proteins and lysozyme inhibitors, Vaps do not have a next-neighbour arrangement but consist of two Greek-key motifs with strand order 41238567, suggesting an unusual or even unique topology.

  1. Structure of Rhodococcus equi virulence-associated protein B (VapB) reveals an eight-stranded antiparallel β-barrel consisting of two Greek-key motifs

    Energy Technology Data Exchange (ETDEWEB)

    Geerds, Christina [Bielefeld University, Universitaetsstrasse 25, 33615 Bielefeld (Germany); Wohlmann, Jens; Haas, Albert [University of Bonn, Ulrich-Haberland Strasse 61a, 53121 Bonn (Germany); Niemann, Hartmut H., E-mail: hartmut.niemann@uni-bielefeld.de [Bielefeld University, Universitaetsstrasse 25, 33615 Bielefeld (Germany)

    2014-06-18

    The structure of VapB, a member of the Vap protein family that is involved in virulence of the bacterial pathogen R. equi, was determined by SAD phasing and reveals an eight-stranded antiparallel β-barrel similar to avidin, suggestive of a binding function. Made up of two Greek-key motifs, the topology of VapB is unusual or even unique. Members of the virulence-associated protein (Vap) family from the pathogen Rhodococcus equi regulate virulence in an unknown manner. They do not share recognizable sequence homology with any protein of known structure. VapB and VapA are normally associated with isolates from pigs and horses, respectively. To contribute to a molecular understanding of Vap function, the crystal structure of a protease-resistant VapB fragment was determined at 1.4 Å resolution. The structure was solved by SAD phasing employing the anomalous signal of one endogenous S atom and two bound Co ions with low occupancy. VapB is an eight-stranded antiparallel β-barrel with a single helix. Structural similarity to avidins suggests a potential binding function. Unlike other eight- or ten-stranded β-barrels found in avidins, bacterial outer membrane proteins, fatty-acid-binding proteins and lysozyme inhibitors, Vaps do not have a next-neighbour arrangement but consist of two Greek-key motifs with strand order 41238567, suggesting an unusual or even unique topology.

  2. A systematic evaluation of protein kinase A-A-kinase anchoring protein interaction motifs

    NARCIS (Netherlands)

    Burgers, Pepijn P; van der Heyden, Marcel A G; Kok, Bart; Heck, Albert J R; Scholten, Arjen

    2015-01-01

    Protein kinase A (PKA) in vertebrates is localized to specific locations in the cell via A-kinase anchoring proteins (AKAPs). The regulatory subunits of the four PKA isoforms (RIα, RIβ, RIIα, and RIIβ) each form a homodimer, and their dimerization domain interacts with a small helical region present

  3. A systematic evaluation of protein kinase A-A-kinase anchoring protein interaction motifs

    NARCIS (Netherlands)

    Burgers, Pepijn P; van der Heyden, MAG; Kok, Bart; Heck, Albert J R; Scholten, Arjen

    2015-01-01

    Protein kinase A (PKA) in vertebrates is localized to specific locations in the cell via A-kinase anchoring proteins (AKAPs). The regulatory subunits of the four PKA isoforms (RIα, RIβ, RIIα, and RIIβ) each form a homodimer, and their dimerization domain interacts with a small helical region present

  4. Amyloid fibril formation from sequences of a natural beta-structured fibrous protein, the adenovirus fiber.

    Science.gov (United States)

    Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna

    2005-01-28

    Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.

  5. A rare polyglycine type II-like helix motif in naturally occurring proteins.

    Science.gov (United States)

    Warkentin, Eberhard; Weidenweber, Sina; Schühle, Karola; Demmer, Ulrike; Heider, Johann; Ermler, Ulrich

    2017-11-01

    Common structural elements in proteins such as α-helices or β-sheets are characterized by uniformly repeating, energetically favorable main chain conformations which additionally exhibit a completely saturated hydrogen-bonding network of the main chain NH and CO groups. Although polyproline or polyglycine type II helices (PPII or PGII) are frequently found in proteins, they are not considered as equivalent secondary structure elements because they do not form a similar self-contained hydrogen-bonding network of the main chain atoms. In this context our finding of an unusual motif of glycine-rich PGII-like helices in the structure of the acetophenone carboxylase core complex is of relevance. These PGII-like helices form hexagonal bundles which appear to fulfill the criterion of a (largely) saturated hydrogen-bonding network of the main-chain groups and therefore may be regarded in this sense as a new secondary structure element. It consists of a central PGII-like helix surrounded by six nearly parallel PGII-like helices in a hexagonal array, plus an additional PGII-like helix extending the array outwards. Very related structural elements have previously been found in synthetic polyglycine fibers. In both cases, all main chain NH and CO groups of the central PGII-helix are saturated by either intra- or intermolecular hydrogen-bonds, resulting in a self-contained hydrogen-bonding network. Similar, but incomplete PGII-helix patterns were also previously identified in a GTP-binding protein and an antifreeze protein. © 2017 Wiley Periodicals, Inc.

  6. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.
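
    The record above mentions computing position weight matrices from coded GAG sequences. The sketch below shows one generic way such a matrix could be built from pre-aligned, equal-length strings. It is an illustration only: the one-letter alphabet "ABCD" and the example sites are hypothetical placeholders, not the coding scheme proposed in the review, and a uniform background with a pseudocount of 1 is an assumption.

```python
import math
from collections import Counter

def position_weight_matrix(aligned, alphabet, pseudocount=1.0):
    """Log2-odds matrix against a uniform background, one dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(aligned) + pseudocount * len(alphabet)
    pwm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        pwm.append({
            sym: math.log2(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return pwm

def score(sequence, pwm):
    """Sum the per-column log-odds for a sequence of the same length as the matrix."""
    return sum(column[sym] for sym, column in zip(sequence, pwm))

# Hypothetical one-letter codes standing in for modified disaccharide units.
alphabet = "ABCD"
sites = ["ABCA", "ABDA", "ABCA", "CBCA"]
pwm = position_weight_matrix(sites, alphabet)
print(score("ABCA", pwm), score("DDDD", pwm))
```

    Sequences resembling the aligned sites receive higher scores than unrelated ones, which is the property that makes such matrices usable for describing motifs.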

  7. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Full Text Available Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  8. Noroviruses Co-opt the Function of Host Proteins VAPA and VAPB for Replication via a Phenylalanine-Phenylalanine-Acidic-Tract-Motif Mimic in Nonstructural Viral Protein NS1/2.

    Science.gov (United States)

    McCune, Broc T; Tang, Wei; Lu, Jia; Eaglesham, James B; Thorne, Lucy; Mayer, Anne E; Condiff, Emily; Nice, Timothy J; Goodfellow, Ian; Krezel, Andrzej M; Virgin, Herbert W

    2017-07-11

    VAPA host protein. The NS1/2-VAPA interaction is conserved between murine and human noroviruses and was important for early steps in murine norovirus replication. Using structure-function analysis, we found that NS1/2 contains a short sequence that molecularly mimics the FFAT motif that is found in multiple host proteins that bind VAPA. This represents to our knowledge the first example of functionally important mimicry of a host FFAT motif by a microbial protein. Copyright © 2017 McCune et al.

  9. Extreme sequence divergence but conserved ligand-binding specificity in Streptococcus pyogenes M protein.

    Directory of Open Access Journals (Sweden)

    2006-05-01

    Full Text Available Many pathogenic microorganisms evade host immunity through extensive sequence variability in a protein region targeted by protective antibodies. In spite of the sequence variability, a variable region commonly retains an important ligand-binding function, reflected in the presence of a highly conserved sequence motif. Here, we analyze the limits of sequence divergence in a ligand-binding region by characterizing the hypervariable region (HVR) of Streptococcus pyogenes M protein. Our studies were focused on HVRs that bind the human complement regulator C4b-binding protein (C4BP), a ligand that confers phagocytosis resistance. A previous comparison of C4BP-binding HVRs identified residue identities that could be part of a binding motif, but the extended analysis reported here shows that no residue identities remain when additional C4BP-binding HVRs are included. Characterization of the HVR in the M22 protein indicated that two relatively conserved Leu residues are essential for C4BP binding, but these residues are probably core residues in a coiled-coil, implying that they do not directly contribute to binding. In contrast, substitution of either of two relatively conserved Glu residues, predicted to be solvent-exposed, had no effect on C4BP binding, although each of these changes had a major effect on the antigenic properties of the HVR. Together, these findings show that HVRs of M proteins have an extraordinary capacity for sequence divergence and antigenic variability while retaining a specific ligand-binding function.

  10. Expression, purification and characterization of hepatitis B virus X protein BH3-like motif-linker-Bcl-xL fusion protein for structural studies

    Directory of Open Access Journals (Sweden)

    Hideki Kusunoki

    2017-03-01

    Full Text Available Hepatitis B virus X protein (HBx) is a multifunctional protein that interacts directly with many host proteins. For example, HBx interacts with anti-apoptotic proteins, Bcl-2 and Bcl-xL, through its BH3-like motif, which leads to elevated cytosolic calcium levels, efficient viral DNA replication and the induction of apoptosis. To facilitate sample preparation and perform detailed structural characterization of the complex between HBx and Bcl-xL, we designed and purified a recombinant HBx BH3-like motif-linker-Bcl-xL fusion protein produced in E. coli. The fusion protein was characterized by size exclusion chromatography, circular dichroism and nuclear magnetic resonance experiments. Our results show that the fusion protein is a monomer in aqueous solution, forms a stable intramolecular complex, and likely retains the native conformation of the complex between Bcl-xL and the HBx BH3-like motif. Furthermore, the HBx BH3-like motif of the intramolecular complex forms an α-helix. These observations indicate that the fusion protein should facilitate structural studies aimed at understanding the interaction between HBx and Bcl-xL at the atomic level.

  11. Natural HLA-B*2705 Protein Ligands with Glutamine as Anchor Motif

    Science.gov (United States)

    Infantes, Susana; Lorente, Elena; Barnea, Eilon; Beer, Ilan; Barriga, Alejandro; Lasala, Fátima; Jiménez, Mercedes; Admon, Arie; López, Daniel

    2013-01-01

    The presentation of short viral peptide antigens by human leukocyte antigen (HLA) class I molecules on cell surfaces is a key step in the activation of cytotoxic T lymphocytes, which mediate the killing of pathogen-infected cells or initiate autoimmune tissue damage. HLA-B27 is a well-known class I molecule that is used to study both facets of the cellular immune response. Using mass spectrometry analysis of complex HLA-bound peptide pools isolated from large amounts of HLA-B*2705+ cells, we identified 200 naturally processed HLA-B*2705 ligands. Our analyses revealed that a change in the position (P) 2 anchor motif was detected in 3% of the HLA-B*2705 ligands identified. B*2705 class I molecules were able to bind these six GlnP2 peptides, which showed significant homology to pathogenic bacterial sequences, with a broad range of affinities. One of these ligands was able to bind with distinct conformations to HLA-B27 subtypes differentially associated with ankylosing spondylitis. These conformational differences could be sufficient to initiate autoimmune damage in patients with ankylosing spondylitis-associated subtypes. Therefore, these kinds of peptides (short, with GlnP2, and similar low affinity to all HLA-B27 subtypes tested but with unlike conformations in differentially ankylosing spondylitis-associated subtypes) must not be excluded from future research involving potential arthritogenic peptides. PMID:23430249

  12. Identification of amino acid residues in protein SRP72 required for binding to a kinked 5e motif of the human signal recognition particle RNA

    Directory of Open Access Journals (Sweden)

    Zwieb Christian

    2010-11-01

    Full Text Available Abstract Background Human cells depend critically on the signal recognition particle (SRP) for the sorting and delivery of their proteins. The SRP is a ribonucleoprotein complex which binds to signal sequences of secretory polypeptides as they emerge from the ribosome. Among the six proteins of the eukaryotic SRP, the largest protein, SRP72, is essential for protein targeting and possesses a poorly characterized RNA binding domain. Results We delineated the minimal region of SRP72 capable of forming a stable complex with an SRP RNA fragment. The region encompassed residues 545 to 585 of the full-length human SRP72 and contained a lysine-rich cluster (KKKKKKKKGK) at positions 552 to 561 as well as a conserved Pfam motif with the sequence PDPXRWLPXXER at positions 572 to 583. We demonstrated by site-directed mutagenesis that both regions participated in the formation of a complex with the RNA. In agreement with biochemical data and results from chymotryptic digestion experiments, molecular modeling of SRP72 implied that the invariant W577 was located inside the predicted structure of an RNA binding domain. The 11-nucleotide 5e motif contained within the SRP RNA fragment was shown by comparative electrophoresis on native polyacrylamide gels to conform to an RNA kink-turn. The model of the complex suggested that the conserved A240 of the K-turn, previously identified as being essential for the binding to SRP72, could protrude into a groove of the SRP72 RNA binding domain, similar but not identical to how other K-turn recognizing proteins interact with RNA. Conclusions The results from the presented experiments provided insights into the molecular details of a functionally important and structurally interesting RNA-protein interaction. A model for how a ligand binding pocket of SRP72 can accommodate a new RNA K-turn in the 5e region of the eukaryotic SRP RNA is proposed.

  13. Identification of amino acid residues in protein SRP72 required for binding to a kinked 5e motif of the human signal recognition particle RNA.

    Science.gov (United States)

    Iakhiaeva, Elena; Iakhiaev, Alexei; Zwieb, Christian

    2010-11-13

    Human cells depend critically on the signal recognition particle (SRP) for the sorting and delivery of their proteins. The SRP is a ribonucleoprotein complex which binds to signal sequences of secretory polypeptides as they emerge from the ribosome. Among the six proteins of the eukaryotic SRP, the largest protein, SRP72, is essential for protein targeting and possesses a poorly characterized RNA binding domain. We delineated the minimal region of SRP72 capable of forming a stable complex with an SRP RNA fragment. The region encompassed residues 545 to 585 of the full-length human SRP72 and contained a lysine-rich cluster (KKKKKKKKGK) at positions 552 to 561 as well as a conserved Pfam motif with the sequence PDPXRWLPXXER at positions 572 to 583. We demonstrated by site-directed mutagenesis that both regions participated in the formation of a complex with the RNA. In agreement with biochemical data and results from chymotryptic digestion experiments, molecular modeling of SRP72 implied that the invariant W577 was located inside the predicted structure of an RNA binding domain. The 11-nucleotide 5e motif contained within the SRP RNA fragment was shown by comparative electrophoresis on native polyacrylamide gels to conform to an RNA kink-turn. The model of the complex suggested that the conserved A240 of the K-turn, previously identified as being essential for the binding to SRP72, could protrude into a groove of the SRP72 RNA binding domain, similar but not identical to how other K-turn recognizing proteins interact with RNA. The results from the presented experiments provided insights into the molecular details of a functionally important and structurally interesting RNA-protein interaction. A model for how a ligand binding pocket of SRP72 can accommodate a new RNA K-turn in the 5e region of the eukaryotic SRP RNA is proposed.
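
    The residue numbers quoted in this abstract can be cross-checked with simple 1-based arithmetic: a motif of length n starting at position s ends at s + n - 1. The short sketch below uses only the two sequences and start positions given in the text; the helper name is hypothetical.

```python
def motif_positions(motif: str, start: int) -> dict:
    """Map each 1-based residue number to the motif symbol found at that position."""
    return {start + offset: symbol for offset, symbol in enumerate(motif)}

# Sequences and start positions as stated in the abstract.
lys_cluster = motif_positions("KKKKKKKKGK", 552)
pfam_motif = motif_positions("PDPXRWLPXXER", 572)

print(max(lys_cluster))   # last residue of the lysine-rich cluster
print(max(pfam_motif))    # last residue of the Pfam motif
print(pfam_motif[577])    # symbol at the position of the invariant W577
```

    The end positions come out as 561 and 583, matching the stated ranges, and position 577 falls on the W of PDPXRWLPXXER, consistent with the invariant W577 named in the abstract.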

  14. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  15. Identification of the divergent calmodulin binding motif in yeast Ssb1/Hsp75 protein and in other HSP70 family members.

    Science.gov (United States)

    Heinen, R C; Diniz-Mendes, L; Silva, J T; Paschoalin, V M F

    2006-11-01

    Yeast soluble proteins were fractionated by calmodulin-agarose affinity chromatography and the Ca2+/calmodulin-binding proteins were analyzed by SDS-PAGE. One prominent protein of 66 kDa was excised from the gel, digested with trypsin and the masses of the resultant fragments were determined by MALDI/MS. Twenty-one of 38 monoisotopic peptide masses obtained after tryptic digestion were matched to the heat shock protein Ssb1/Hsp75, covering 37% of its sequence. Computational analysis of the primary structure of Ssb1/Hsp75 identified a unique potential amphipathic alpha-helix in its N-terminal ATPase domain with features of target regions for Ca2+/calmodulin binding. This region, which shares 89% similarity to the experimentally determined calmodulin-binding domain from mouse Hsc70, is conserved in nearly half of the 113 members of the HSP70 family investigated, from yeast to plants and animals. Based on the sequence of this region, phylogenetic analysis grouped the HSP70s in three distinct branches. Two of them comprise the non-calmodulin binding Hsp70s BIP/GR78, a subfamily of eukaryotic HSP70 localized in the endoplasmic reticulum, and DnaK, a subfamily of prokaryotic HSP70. A third heterogeneous group is formed by eukaryotic cytosolic HSP70s containing the new calmodulin-binding motif and other cytosolic HSP70s whose sequences do not conform to this conserved motif, indicating that not all eukaryotic cytosolic Hsp70s are targets for calmodulin regulation. Furthermore, the calmodulin-binding domain found in eukaryotic HSP70s is also the target for binding of Bag-1 - an enhancer of ADP/ATP exchange activity of Hsp70s. A model in which calmodulin displaces Bag-1 and modulates Ssb1/Hsp75 chaperone activity is discussed.

  16. Identification of the divergent calmodulin binding motif in yeast Ssb1/Hsp75 protein and in other HSP70 family members

    Directory of Open Access Journals (Sweden)

    R.C. Heinen

    2006-11-01

    Full Text Available Yeast soluble proteins were fractionated by calmodulin-agarose affinity chromatography and the Ca2+/calmodulin-binding proteins were analyzed by SDS-PAGE. One prominent protein of 66 kDa was excised from the gel, digested with trypsin and the masses of the resultant fragments were determined by MALDI/MS. Twenty-one of 38 monoisotopic peptide masses obtained after tryptic digestion were matched to the heat shock protein Ssb1/Hsp75, covering 37% of its sequence. Computational analysis of the primary structure of Ssb1/Hsp75 identified a unique potential amphipathic alpha-helix in its N-terminal ATPase domain with features of target regions for Ca2+/calmodulin binding. This region, which shares 89% similarity to the experimentally determined calmodulin-binding domain from mouse Hsc70, is conserved in nearly half of the 113 members of the HSP70 family investigated, from yeast to plants and animals. Based on the sequence of this region, phylogenetic analysis grouped the HSP70s in three distinct branches. Two of them comprise the non-calmodulin binding Hsp70s BIP/GR78, a subfamily of eukaryotic HSP70 localized in the endoplasmic reticulum, and DnaK, a subfamily of prokaryotic HSP70. A third heterogeneous group is formed by eukaryotic cytosolic HSP70s containing the new calmodulin-binding motif and other cytosolic HSP70s whose sequences do not conform to this conserved motif, indicating that not all eukaryotic cytosolic Hsp70s are targets for calmodulin regulation. Furthermore, the calmodulin-binding domain found in eukaryotic HSP70s is also the target for binding of Bag-1 - an enhancer of ADP/ATP exchange activity of Hsp70s. A model in which calmodulin displaces Bag-1 and modulates Ssb1/Hsp75 chaperone activity is discussed.

  17. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2007-02-01

    Full Text Available Abstract Background: This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.

  18. A CACGTG motif of the Antirrhinum majus chalcone synthase promoter is recognized by an evolutionarily conserved nuclear protein

    International Nuclear Information System (INIS)

    Staiger, D.; Kaulen, H.; Schell, J.

    1989-01-01

    In the chalcone synthase gene of Antirrhinum majus (snapdragon), 150 base pairs of the 5' flanking region contain cis-acting signals for UV light-induced expression. A nuclear factor, designated CG-1, specifically recognizes a hexameric motif with internal dyad symmetry, CACGTG, located within this light-responsive sequence. Binding of CG-1 is influenced by C-methylation of the CpG dinucleotide in the recognition sequence. CG-1 is a factor found in a variety of dicotyledonous plant species including Nicotiana tabacum, A. majus, Petunia hybrida, Arabidopsis thaliana, and Glycine max. CACGTG motifs contained within trans-acting factor recognition sites in various other plant promoters can interact with CG-1. In addition, the binding site of the human adenovirus major late transcription factor USF can compete for CG-1 binding to the chalcone synthase promoter. This suggests an evolutionary conservation of trans-acting factor recognition sites involved in divergent mechanisms of gene control. (author)
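The "internal dyad symmetry" of CACGTG means that the hexamer reads the same on both DNA strands, which can be verified directly. The following is a minimal sketch; the function names and the toy sequence are assumptions made for illustration and are not taken from the study.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def has_dyad_symmetry(seq: str) -> bool:
    """A site has dyad symmetry when it equals its own reverse complement."""
    return seq == reverse_complement(seq)

def find_sites(seq: str, motif: str = "CACGTG") -> list[int]:
    """Return 0-based start positions of every occurrence of motif in seq."""
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]

# Invented toy sequence, not a real promoter fragment.
toy_promoter = "TTCACGTGAA"

print(has_dyad_symmetry("CACGTG"))
print(find_sites(toy_promoter))
```

Because the motif is its own reverse complement, a scan of one strand is enough to find every occurrence on either strand.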

  19. PDILT, a divergent testis-specific protein disulfide isomerase with a non-classical SXXC motif that engages in disulfide-dependent interactions in the endoplasmic reticulum.

    Science.gov (United States)

    van Lith, Marcel; Hartigan, Nichola; Hatch, Jennifer; Benham, Adam M

    2005-01-14

    Protein disulfide isomerase (PDI) is the archetypal enzyme involved in the formation and reshuffling of disulfide bonds in the endoplasmic reticulum (ER). PDI achieves its redox function through two highly conserved thioredoxin domains, and PDI can also operate as an ER chaperone. The substrate specificities and the exact functions of most other PDI family proteins remain important unsolved questions in biology. Here, we characterize a new and striking member of the PDI family, which we have named protein disulfide isomerase-like protein of the testis (PDILT). PDILT is the first eukaryotic SXXC protein to be characterized in the ER. Our experiments have unveiled a novel, glycosylated PDI-like protein whose tissue-specific expression and unusual motifs have implications for the evolution, catalytic function, and substrate selection of thioredoxin family proteins. We show that PDILT is an ER resident glycoprotein that liaises with partner proteins in disulfide-dependent complexes within the testis. PDILT interacts with the oxidoreductase Ero1alpha, demonstrating that the N-terminal cysteine of the CXXC sequence is not required for binding of PDI family proteins to ER oxidoreductases. The expression of PDILT, in addition to PDI in the testis, suggests that PDILT performs a specialized chaperone function in testicular cells. PDILT is an unusual PDI relative that highlights the adaptability of chaperone and redox function in enzymes of the endoplasmic reticulum.

  20. Sequence and structural analysis of the chitinase insertion domain reveals two conserved motifs involved in chitin-binding.

    Directory of Open Access Journals (Sweden)

    Hai Li

    2010-01-01

    Full Text Available Chitinases are prevalent in life and are found in species including archaea, bacteria, fungi, plants, and animals. They break down chitin, which is the second most abundant carbohydrate in nature after cellulose. Hence, they are important for maintaining a balance between carbon and nitrogen trapped as insoluble chitin in biomass. Chitinases are classified into two families, 18 and 19 glycoside hydrolases. In addition to a catalytic domain, which is a triosephosphate isomerase barrel, many family 18 chitinases contain another module, i.e., chitinase insertion domain. While numerous studies focus on the biological role of the catalytic domain in chitinase activity, the function of the chitinase insertion domain is not completely understood. Bioinformatics offers an important avenue in which to facilitate understanding the role of residues within the chitinase insertion domain in chitinase function. Twenty-seven chitinase insertion domain sequences, which include four experimentally determined structures and span five kingdoms, were aligned and analyzed using a modified sequence entropy parameter. Thirty-two positions with conserved residues were identified. The role of these conserved residues was explored by conducting a structural analysis of a number of holo-enzymes. Hydrogen bonding and van der Waals calculations revealed a distinct subset of four conserved residues constituting two sequence motifs that interact with oligosaccharides. The other conserved residues may be key to the structure, folding, and stability of this domain. Sequence and structural studies of the chitinase insertion domains conducted within the framework of evolution identified four conserved residues which clearly interact with the substrates. Furthermore, evolutionary studies propose a link between the appearance of the chitinase insertion domain and the function of family 18 chitinases in the subfamily A.

  1. In silico characterization of boron transporter (BOR1) protein sequences in Poaceae species

    Directory of Open Access Journals (Sweden)

    Ertuğrul Filiz

    2013-01-01

    Full Text Available Boron (B) is essential for plant growth and development, and its primary function is connected with formation of the cell wall. Moreover, boron toxicity is a shared problem in semiarid and arid regions. In this study, boron transporter protein (BOR1) sequences from some Poaceae species (Hordeum vulgare subsp. vulgare, Zea mays, Brachypodium distachyon, Oryza sativa subsp. japonica, Oryza sativa subsp. indica, Sorghum bicolor, Triticum aestivum) were evaluated by bioinformatics tools. Physicochemical analyses revealed that most BOR1 proteins were basic in character and generally had aliphatic amino acids. Analysis of the domains showed that transmembrane domains were identified consistently and that three motifs were detected with a length of 50 amino acids. Also, the motif SPNPWEPGSYDHWTVAKDMFNVPPAYIFGAFIPATMVAGLYYFDHSVASQ was found most frequently, with 25 repeats. The phylogenetic tree showed divergence into two main clusters. B. distachyon species were clustered separately. Finally, this study contributes to the characterization of new BOR1 proteins in grasses and creates a scientific basis for future in silico analysis.

  2. A systems wide mass spectrometric based linear motif screen to identify dominant in-vivo interacting proteins for the ubiquitin ligase MDM2.

    Science.gov (United States)

    Nicholson, Judith; Scherl, Alex; Way, Luke; Blackburn, Elizabeth A; Walkinshaw, Malcolm D; Ball, Kathryn L; Hupp, Ted R

    2014-06-01

    Linear motifs mediate protein-protein interactions (PPI) that allow expansion of a target protein interactome at a systems level. This study uses a proteomics approach and linear motif sub-stratifications to expand on PPIs of MDM2. MDM2 is a multi-functional protein with over one hundred known binding partners not stratified by hierarchy or function. A new linear motif based on a MDM2 interaction consensus is used to select novel MDM2 interactors based on Nutlin-3 responsiveness in a cell-based proteomics screen. MDM2 binds a subset of peptide motifs corresponding to real proteins with a range of allosteric responses to MDM2 ligands. We validate cyclophilin B as a novel protein with a consensus MDM2 binding motif that is stabilised by Nutlin-3 in vivo, thus identifying one of the few known interactors of MDM2 that is stabilised by Nutlin-3. These data invoke two modes of peptide binding at the MDM2 N-terminus that rely on a consensus core motif to control the equilibrium between MDM2 binding proteins. This approach stratifies MDM2 interacting proteins based on the linear motif feature and provides a new biomarker assay to define clinically relevant Nutlin-3 responsive MDM2 interactors. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. The Arabidopsis GAGA-Binding Factor BASIC PENTACYSTEINE6 Recruits the POLYCOMB-REPRESSIVE COMPLEX1 Component LIKE HETEROCHROMATIN PROTEIN1 to GAGA DNA Motifs.

    Science.gov (United States)

    Hecker, Andreas; Brand, Luise H; Peter, Sébastien; Simoncello, Nathalie; Kilian, Joachim; Harter, Klaus; Gaudin, Valérie; Wanke, Dierk

    2015-07-01

    Polycomb-repressive complexes (PRCs) play key roles in development by repressing a large number of genes involved in various functions. Much, however, remains to be discovered about PRC-silencing mechanisms as well as their targeting to specific genomic regions. Besides other mechanisms, GAGA-binding factors in animals can guide PRC members in a sequence-specific manner to Polycomb-responsive DNA elements. Here, we show that the Arabidopsis (Arabidopsis thaliana) GAGA-motif binding factor protein basic pentacysteine6 (BPC6) interacts with like heterochromatin protein1 (LHP1), a PRC1 component, and associates with vernalization2 (VRN2), a PRC2 component, in vivo. By using a modified DNA-protein interaction enzyme-linked immunosorbent assay, we could show that BPC6 was required and sufficient to recruit LHP1 to GAGA motif-containing DNA probes in vitro. We also found that LHP1 interacts with VRN2 and, therefore, can function as a possible scaffold between BPC6 and VRN2. The lhp1-4 bpc4 bpc6 triple mutant displayed a pleiotropic phenotype, extreme dwarfism and early flowering, which disclosed synergistic functions of LHP1 and group II plant BPC members. Transcriptome analyses supported this synergy and suggested a possible function in the concerted repression of homeotic genes, probably through histone H3 lysine-27 trimethylation. Hence, our findings suggest striking similarities between animal and plant GAGA-binding factors in the recruitment of PRC1 and PRC2 components to Polycomb-responsive DNA element-like GAGA motifs, which must have evolved through convergent evolution. © 2015 American Society of Plant Biologists. All Rights Reserved.

  4. Transcriptional control of the tissue-specific, developmentally regulated osteocalcin gene requires a binding motif for the Msx family of homeodomain proteins.

    Science.gov (United States)

    Hoffmann, H M; Catron, K M; van Wijnen, A J; McCabe, L R; Lian, J B; Stein, G S; Stein, J L

    1994-12-20

    The OC box of the rat osteocalcin promoter (nt -99 to -76) is the principal proximal regulatory element contributing to both tissue-specific and developmental control of osteocalcin gene expression. The central motif of the OC box includes a perfect consensus DNA binding site for certain homeodomain proteins. Homeodomain proteins are transcription factors that direct proper development by regulating specific temporal and spatial patterns of gene expression. We therefore addressed the role of the homeodomain binding motif in the activity of the OC promoter. In this study, by the combined application of mutagenesis and site-specific protein recognition analysis, we examined interactions of ROS 17/2.8 osteosarcoma cell nuclear proteins and purified Msx-1 homeodomain protein with the OC box. We detected a series of related specific protein-DNA interactions, a subset of which were inhibited by antibodies directed against the Msx-1 homeodomain but which also recognize the Msx-2 homeodomain. Our results show that the sequence requirements for binding the Msx-1 or Msx-2 homeodomain closely parallel those necessary for osteocalcin gene promoter activity in vivo. This functional relationship was demonstrated by transient expression in ROS 17/2.8 osteosarcoma cells of a series of osteocalcin promoter (nt -1097 to +24)-reporter gene constructs containing mutations within and flanking the homeodomain binding site of the OC box. Northern blot analysis of several bone-related cell types showed that all of the cells expressed msx-1, whereas msx-2 expression was restricted to cells transcribing osteocalcin. Taken together, our results suggest a role for Msx-1 and -2 or related homeodomain proteins in transcription of the osteocalcin gene.

  5. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    Science.gov (United States)

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  6. A novel disulfide-rich protein motif from avian eggshell membranes.

    Directory of Open Access Journals (Sweden)

    Vamsi K Kodali

    2011-03-01

    Full Text Available Under the shell of a chicken egg are two opposed proteinaceous disulfide-rich membranes. They are fabricated in the avian oviduct using fibers formed from proteins that are extensively coupled by irreversible lysine-derived crosslinks. The intractability of these eggshell membranes (ESM) has slowed their characterization and their protein composition remains uncertain. In this work, reductive alkylation of ESM followed by proteolytic digestion led to the identification of a cysteine rich ESM protein (abbreviated CREMP) that was similar to spore coat protein SP75 from cellular slime molds. Analysis of the cysteine repeats in partial sequences of CREMP reveals runs of remarkably repetitive patterns. Module a contains a C-X(4)-C-X(5)-C-X(8)-C-X(6) pattern (where X represents intervening non-cysteine residues). These inter-cysteine amino acid residues are also strikingly conserved. The evolutionarily-related module b has the same cysteine spacing as a, but has 11 amino acid residues at its C-terminus. Different stretches of CREMP sequences in chicken genomic DNA fragments show diverse repeat patterns: e.g. all a modules; an alternation of a-b modules; or an a-b-b arrangement. Comparable CREMP proteins are found in contigs of the zebra finch (Taeniopygia guttata) and in the oviparous green anole lizard (Anolis carolinensis). In all these cases the long runs of highly conserved modular repeats have evidently led to difficulties in the assembly of full length DNA sequences. Hence the number, and the amino acid lengths, of CREMP proteins are currently unknown. A 118 amino acid fragment (representing an a-b-a-b pattern) from a chicken oviduct EST library expressed in Escherichia coli is a well folded, highly anisotropic, protein with a large chemical shift dispersion in 2D solution NMR spectra. Structure is completely lost on reduction of the 8 disulfide bonds of this protein fragment. Finally, solid state NMR spectra suggest a surprising degree of order in intact

  7. Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

    KAUST Repository

    Sayadi, Ahmed; Briganti, Leonardo; Tramontano, Anna; Via, Allegra

    2011-01-01

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length

  8. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Science.gov (United States)

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  9. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    Directory of Open Access Journals (Sweden)

    Zing Tsung-Yeh Tsai

    2015-08-01

    Full Text Available Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  10. Nucleotide sequence of the coat protein gene of the Skierniewice isolate of plum pox virus (PPV)

    International Nuclear Information System (INIS)

    Wypijewski, K.; Musial, W.; Augustyniak, J.; Malinowski, T.

    1994-01-01

    The coat protein (CP) gene of the Skierniewice isolate of plum pox virus (PPV-S) has been amplified using the reverse transcription - polymerase chain reaction (RT-PCR), cloned and sequenced. The nucleotide sequence of the gene and the deduced amino-acid sequences of PPV-S CP were compared with those of other PPV strains. The nucleotide sequence showed very high homology to most of the published sequences. The motif: Asp-Ala-Gly (DAG), important for the aphid transmissibility, was present in the amino-acid sequence. Our isolate did not react in ELISA with monoclonal antibodies MAb06 supposed to be specific for PPV-D. (author). 32 refs, 1 fig., 2 tabs

  11. Conserved retinoblastoma protein-binding motif in human cytomegalovirus UL97 kinase minimally impacts viral replication but affects susceptibility to maribavir

    Directory of Open Access Journals (Sweden)

    Chou Sunwen

    2009-01-01

    Full Text Available Abstract The UL97 kinase has been shown to phosphorylate and inactivate the retinoblastoma protein (Rb) and has three consensus Rb-binding motifs that might contribute to this activity. Recombinant viruses containing mutations in the Rb-binding motifs generally replicated well in human foreskin fibroblasts with only a slight delay in replication kinetics. Their susceptibility to the specific UL97 kinase inhibitor, maribavir, was also examined. Mutation of the amino terminal motif, which is involved in the inactivation of Rb, also renders the virus hypersensitive to the drug and suggests that the motif may play a role in its mechanism of action.

  12. Phospholipid composition and a polybasic motif determine D6 PROTEIN KINASE polar association with the plasma membrane and tropic responses.

    Science.gov (United States)

    Barbosa, Inês C R; Shikata, Hiromasa; Zourelidou, Melina; Heilmann, Mareike; Heilmann, Ingo; Schwechheimer, Claus

    2016-12-15

    Polar transport of the phytohormone auxin through PIN-FORMED (PIN) auxin efflux carriers is essential for the spatiotemporal control of plant development. The Arabidopsis thaliana serine/threonine kinase D6 PROTEIN KINASE (D6PK) is polarly localized at the plasma membrane of many cells where it colocalizes with PINs and activates PIN-mediated auxin efflux. Here, we show that the association of D6PK with the basal plasma membrane and PINs is dependent on the phospholipid composition of the plasma membrane as well as on the phosphatidylinositol phosphate 5-kinases PIP5K1 and PIP5K2 in epidermis cells of the primary root. We further show that D6PK directly binds polyacidic phospholipids through a polybasic lysine-rich motif in the middle domain of the kinase. The lysine-rich motif is required for proper PIN3 phosphorylation and for auxin transport-dependent tropic growth. Polybasic motifs are also present at a conserved position in other D6PK-related kinases and required for membrane and phospholipid binding. Thus, phospholipid-dependent recruitment to membranes through polybasic motifs might not only be required for D6PK-mediated auxin transport but also other processes regulated by these, as yet, functionally uncharacterized kinases. © 2016. Published by The Company of Biologists Ltd.

  13. Ménage à trois: the complex relationships between mitogen-activated protein kinases, WRKY transcription factors, and VQ-motif-containing proteins.

    Science.gov (United States)

    Weyhe, Martin; Eschen-Lippold, Lennart; Pecher, Pascal; Scheel, Dierk; Lee, Justin

    2014-01-01

    Out of the 34 members of the VQ-motif-containing protein (VQP) family, 10 are phosphorylated by the mitogen-activated protein kinases (MAPKs), MPK3 and MPK6. Most of these MPK3/6-targeted VQPs (MVQs) interacted with specific sub-groups of WRKY transcription factors in a VQ-motif-dependent manner. In some cases, the MAPK appears to phosphorylate either the MVQ or the WRKY, while in other cases, both proteins have been reported to act as MAPK substrates. We propose a network of dynamic interactions between members from the MAPK, MVQ and WRKY families - either as binary or as tripartite interactions. The compositions of the WRKY-MVQ transcriptional protein complexes may change - for instance, through MPK3/6-mediated modulation of protein stability - and therefore control defense gene transcription.

  14. Identification of a phosphorylation-dependent nuclear localization motif in interferon regulatory factor 2 binding protein 2.

    Directory of Open Access Journals (Sweden)

    Allen C T Teng

    Full Text Available Interferon regulatory factor 2 binding protein 2 (IRF2BP2) is a muscle-enriched transcription factor required to activate vascular endothelial growth factor-A (VEGFA) expression in muscle. IRF2BP2 is found in the nucleus of cardiac and skeletal muscle cells. During the process of skeletal muscle differentiation, some IRF2BP2 becomes relocated to the cytoplasm, although the functional significance of this relocation and the mechanisms that control nucleocytoplasmic localization of IRF2BP2 are not yet known. Here, by fusing IRF2BP2 to green fluorescent protein and testing a series of deletion and site-directed mutagenesis constructs, we mapped the nuclear localization signal (NLS) to an evolutionarily conserved sequence (354-ARKRKPSP-361) in IRF2BP2. This sequence corresponds to a classical nuclear localization motif bearing positively charged arginine and lysine residues. Substitution of arginine and lysine with negatively charged aspartic acid residues blocked nuclear localization. However, these residues were not sufficient because nuclear targeting of IRF2BP2 also required phosphorylation of serine 360 (S360). Many large-scale phosphopeptide proteomic studies had reported previously that serine 360 of IRF2BP2 is phosphorylated in numerous human cell types. Alanine substitution at this site abolished IRF2BP2 nuclear localization in C2C12 myoblasts and CV1 cells. In contrast, substituting serine 360 with aspartic acid forced nuclear retention and prevented cytoplasmic redistribution in differentiated C2C12 muscle cells. As for the effects of these mutations on VEGFA promoter activity, the S360A mutation interfered with VEGFA activation, as expected. Surprisingly, the S360D mutation also interfered with VEGFA activation, suggesting that this mutation, while enforcing nuclear entry, may disrupt an essential activation function of IRF2BP2. Nuclear localization of IRF2BP2 depends on phosphorylation near a conserved NLS.
Changes in phosphorylation status

  15. Characterization of the GXXXG motif in the first transmembrane segment of Japanese encephalitis virus precursor membrane (prM) protein

    Directory of Open Access Journals (Sweden)

    Wu Suh-Chin

    2010-05-01

    Full Text Available Abstract The interaction between prM and E proteins in flavivirus-infected cells is a major driving force for the assembly of flavivirus particles. We used site-directed mutagenesis to study the potential role of the transmembrane domains of the prM proteins of Japanese encephalitis virus (JEV) in prM-E heterodimerization as well as subviral particle formation. Alanine insertion scanning mutagenesis within the GXXXG motif in the first transmembrane segment of JEV prM protein affected the prM-E heterodimerization; its specificity was confirmed by replacing the two glycines of the GXXXG motif with alanine, leucine and valine. The GXXXG motif was found to be conserved in the JEV serocomplex viruses but not other flavivirus groups. These mutants with alanine inserted in the two prM transmembrane segments all impaired subviral particle formation in cell cultures. The prM transmembrane domains of JEV may play important roles in prM-E heterodimerization and viral particle assembly.

  16. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    Directory of Open Access Journals (Sweden)

    Yuta Kimura

    2014-02-01

    Full Text Available Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1.

  17. Transduplication resulted in the incorporation of two protein-coding sequences into the Turmoil-1 transposable element of C. elegans

    Directory of Open Access Journals (Sweden)

    Pupko Tal

    2008-10-01

    Full Text Available Abstract Transposable elements may acquire unrelated gene fragments into their sequences in a process called transduplication. Transduplication of protein-coding genes is common in plants, but is unknown in animals. Here, we report that the Turmoil-1 transposable element in C. elegans has incorporated two protein-coding sequences into its inverted terminal repeat (ITR) sequences. The ITRs of Turmoil-1 contain a conserved RNA recognition motif (RRM) that originated from the rsp-2 gene and a fragment from the protein-coding region of the cpg-3 gene. We further report that an open reading frame specific to C. elegans may have been created as a result of a Turmoil-1 insertion. Mutations at the 5' splice site of this open reading frame may have reactivated the transduplicated RRM motif. Reviewers This article was reviewed by Dan Graur and William Martin. For the full reviews, please go to the Reviewers' Reports section.

  18. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural class using a nonlinear prediction method. No deterministic structures are found in these protein sequences, which implies that they behave as random sequences. We also give an explanation for the controversial results obtained in previous investigations.

  19. Adenovirus fibre shaft sequences fold into the native triple beta-spiral fold when N-terminally fused to the bacteriophage T4 fibritin foldon trimerisation motif.

    Science.gov (United States)

    Papanikolopoulou, Katerina; Teixeira, Susana; Belrhali, Hassan; Forsyth, V Trevor; Mitraki, Anna; van Raaij, Mark J

    2004-09-03

    Adenovirus fibres are trimeric proteins that consist of a globular C-terminal domain, a central fibrous shaft and an N-terminal part that attaches to the viral capsid. In the presence of the globular C-terminal domain, which is necessary for correct trimerisation, the shaft segment adopts a triple beta-spiral conformation. We have replaced the head of the fibre by the trimerisation domain of the bacteriophage T4 fibritin, the foldon. Two different fusion constructs were made and crystallised, one with an eight amino acid residue linker and one with a linker of only two residues. X-ray crystallographic studies of both fusion proteins show that residues 319-391 of the adenovirus type 2 fibre shaft fold into a triple beta-spiral fold indistinguishable from the native structure, although this is now resolved at a higher resolution of 1.9 Å. The foldon residues 458-483 also adopt their natural structure. The intervening linkers are not well ordered in the crystal structures. This work shows that the shaft sequences retain their capacity to fold into their native beta-spiral fibrous fold when fused to a foreign C-terminal trimerisation motif. It provides a structural basis to artificially trimerise longer adenovirus shaft segments and segments from other trimeric beta-structured fibre proteins. Such artificial fibrous constructs, amenable to crystallisation and solution studies, can offer tractable model systems for the study of beta-fibrous structure. They can also prove useful for gene therapy and fibre engineering applications.

  20. A ΩXaV motif in the Rift Valley fever virus NSs protein is essential for degrading p62, forming nuclear filaments and virulence.

    Science.gov (United States)

    Cyr, Normand; de la Fuente, Cynthia; Lecoq, Lauriane; Guendel, Irene; Chabot, Philippe R; Kehn-Hall, Kylene; Omichinski, James G

    2015-05-12

    Rift Valley fever virus (RVFV) is a single-stranded RNA virus capable of inducing fatal hemorrhagic fever in humans. A key component of RVFV virulence is its ability to form nuclear filaments through interactions between the viral nonstructural protein NSs and the host general transcription factor TFIIH. Here, we identify an interaction between a ΩXaV motif in NSs and the p62 subunit of TFIIH. This motif in NSs is similar to ΩXaV motifs found in nucleotide excision repair (NER) factors and transcription factors known to interact with p62. Structural and biophysical studies demonstrate that NSs binds to p62 in a similar manner as these other factors. Functional studies in RVFV-infected cells show that the ΩXaV motif is required for both nuclear filament formation and degradation of p62. Consistent with the fact that the RVFV can be distinguished from other Bunyaviridae-family viruses due to its ability to form nuclear filaments in infected cells, the motif is absent in the NSs proteins of other Bunyaviridae-family viruses. Taken together, our studies demonstrate that p62 binding to NSs through the ΩXaV motif is essential for degrading p62, forming nuclear filaments and enhancing RVFV virulence. In addition, these results show how the RVFV incorporates a simple motif into the NSs protein that enables it to functionally mimic host cell proteins that bind the p62 subunit of TFIIH.

  1. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  2. SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Emily Olorin

    2015-08-01

    Full Text Available Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent evolution in non-homologous proteins has proven to be a successful strategy for computational SLiM prediction. Tools of the SLiMSuite package use this strategy, using a statistical model to identify SLiM enrichment based on the evolutionary relationships, amino acid composition and predicted disorder of the input proteins. The quality of input data is critical for successful SLiM prediction. Cytoscape provides a user-friendly, interactive environment to explore interaction networks and select proteins based on common features, such as shared interaction partners. SLiMScape embeds tools of the SLiMSuite package for de novo SLiM discovery (SLiMFinder and QSLiMFinder) and identifying occurrences/enrichment of known SLiMs (SLiMProb) within this interactive framework. SLiMScape makes it easier to (1) generate high quality hypothesis-driven datasets for these tools, and (2) visualise predicted SLiM occurrences within the context of the network. To generate new predictions, users can select nodes from a protein network or provide a set of Uniprot identifiers. SLiMProb also requires additional query motif input. Jobs are then run remotely on the SLiMSuite server (http://rest.slimsuite.unsw.edu.au) for subsequent retrieval and visualisation. SLiMScape can also be used to retrieve and visualise results from jobs run directly on the server. SLiMScape and SLiMSuite are open source and freely available via GitHub under GNU licenses.

  3. Role of NH2-terminal hydrophobic motif in the subcellular localization of ATP-binding cassette protein subfamily D: Common features in eukaryotic organisms

    International Nuclear Information System (INIS)

    Lee, Asaka; Asahina, Kota; Okamoto, Takumi; Kawaguchi, Kosuke; Kostsin, Dzmitry G.; Kashiwayama, Yoshinori; Takanashi, Kojiro; Yazaki, Kazufumi; Imanaka, Tsuneo; Morita, Masashi

    2014-01-01

    Highlights: • ABCD proteins are classified by the presence or absence of an NH2-terminal hydrophobic segment. • The ABCD proteins with the segment are targeted to peroxisomes. • The ABCD proteins without the segment are targeted to the endoplasmic reticulum. • The role of the segment in organelle targeting is conserved in eukaryotic organisms. - Abstract: In mammals, four ATP-binding cassette (ABC) proteins belonging to subfamily D have been identified. ABCD1–3 possess the NH2-terminal hydrophobic region and are targeted to peroxisomes, while ABCD4, lacking the region, is targeted to the endoplasmic reticulum (ER). Based on hydropathy plot analysis, we found that several eukaryotes have ABCD protein homologs lacking the NH2-terminal hydrophobic segment (H0 motif). To investigate whether the role of the NH2-terminal H0 motif in subcellular localization is conserved across species, we expressed ABCD proteins from several species (metazoans, plants and fungi) in fusion with GFP in CHO cells and examined their subcellular localization. ABCD proteins possessing the NH2-terminal H0 motif were localized to peroxisomes, while ABCD proteins lacking this region lost this capacity. In addition, the deletion of the NH2-terminal H0 motif of ABCD proteins resulted in their localization to the ER. These results suggest that the role of the NH2-terminal H0 motif in organelle targeting is widely conserved in living organisms.

  4. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
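
    As an illustration of method 3), the significance of a product of p-values can be computed in closed form when the individual p-values are assumed independent and uniformly distributed under the null hypothesis. The sketch below uses that standard result, P(product <= p) = p * sum over i from 0 to n-1 of (-ln p)^i / i!. The function name `product_pvalue` is hypothetical, and the abstract does not state that MAST computes the statistic in exactly this way.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of n independent p-values.

    Assumes each p-value is uniform on (0, 1] under the null hypothesis.
    Uses P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # Work in log space so that many small p-values do not underflow early.
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod

    # Accumulate the partial sum of x^i / i! for i = 0 .. n-1.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    return math.exp(log_prod) * total


if __name__ == "__main__":
    # Two motif matches, each with p = 0.1 (made-up values for illustration).
    print(product_pvalue([0.1, 0.1]))
```

    With a single motif the function returns the input p-value unchanged; with several motifs the combined value is larger than the raw product, because a small product is easier to reach by chance when more factors are multiplied.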

  5. Sequence-based feature prediction and annotation of proteins

    DEFF Research Database (Denmark)

    Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....

  6. A Tyrosine-Based Trafficking Motif of the Tegument Protein pUL71 Is Crucial for Human Cytomegalovirus Secondary Envelopment.

    Science.gov (United States)

    Dietz, Andrea N; Villinger, Clarissa; Becker, Stefan; Frick, Manfred; von Einem, Jens

    2018-01-01

    The human cytomegalovirus (HCMV) tegument protein pUL71 is required for efficient secondary envelopment and accumulates at the Golgi compartment-derived viral assembly complex (vAC) during infection. Analysis of various C-terminally truncated pUL71 proteins fused to enhanced green fluorescent protein (eGFP) identified amino acids 23 to 34 as important determinants for its Golgi complex localization. Sequence analysis and mutational verification revealed the presence of an N-terminal tyrosine-based trafficking motif (YXXΦ) in pUL71. This led us to hypothesize a requirement of the YXXΦ motif for the function of pUL71 in infection. Mutation of both the tyrosine residue and the entire YXXΦ motif resulted in an altered distribution of mutant pUL71 at the plasma membrane and in the cytoplasm during infection. Both YXXΦ mutant viruses exhibited similarly decreased focal growth and reduced virus yields in supernatants. Ultrastructurally, mutant-virus-infected cells exhibited impaired secondary envelopment manifested by accumulations of capsids undergoing an envelopment process. Additionally, clusters of capsid accumulations surrounding the vAC were observed, similar to the ultrastructural phenotype of a UL71-deficient mutant. The importance of endocytosis and thus the YXXΦ motif for targeting pUL71 to the Golgi complex was further demonstrated when clathrin-mediated endocytosis was inhibited either by coexpression of the C-terminal part of cellular AP180 (AP180-C) or by treatment with methyl-β-cyclodextrin. Both conditions resulted in a plasma membrane accumulation of pUL71. Altogether, these data reveal the presence of a functional N-terminal endocytosis motif that is an important determinant for intracellular localization of pUL71 and that is furthermore required for the function of pUL71 during secondary envelopment of HCMV capsids at the vAC. IMPORTANCE Human cytomegalovirus (HCMV) is the leading cause of birth defects among congenital virus infections and can

  7. Analysis of alkaptonuria (AKU) mutations and polymorphisms reveals that the CCC sequence motif is a mutational hot spot in the homogentisate 1,2 dioxygenase gene (HGO).

    Science.gov (United States)

    Beltrán-Valero de Bernabé, D; Jimenez, F J; Aquaron, R; Rodríguez de Córdoba, S

    1999-01-01

    We recently showed that alkaptonuria (AKU) is caused by loss-of-function mutations in the homogentisate 1,2 dioxygenase gene (HGO). Herein we describe haplotype and mutational analyses of HGO in seven new AKU pedigrees. These analyses identified two novel single-nucleotide polymorphisms (INV4+31A-->G and INV11+18A-->G) and six novel AKU mutations (INV1-1G-->A, W60G, Y62C, A122D, P230T, and D291E), which further illustrates the remarkable allelic heterogeneity found in AKU. Reexamination of all 29 mutations and polymorphisms thus far described in HGO shows that these nucleotide changes are not randomly distributed; the CCC sequence motif and its inverted complement, GGG, are preferentially mutated. These analyses also demonstrated that the nucleotide substitutions in HGO do not involve CpG dinucleotides, which illustrates important differences between HGO and other genes for the occurrence of mutation at specific short-sequence motifs. Because the CCC sequence motifs comprise a significant proportion (34.5%) of all mutated bases that have been observed in HGO, we conclude that the CCC triplet is a mutational hot spot in HGO. PMID:10205262

  8. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    Science.gov (United States)

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant, they were pathogenic with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  9. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms.
These experimental results show that our speedup technique is indeed very
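
    For orientation, the exact (l, d)-motif search problem can be stated as a brute-force sketch: enumerate every string of length l over the alphabet and keep those that occur in every input sequence with at most d mismatches. This is not the speedup technique of the paper, only a baseline written under the usual Hamming-distance definition of PMS. The function names are hypothetical, and the enumeration of all candidates makes the exponential dependence on l explicit.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Exhaustive exact PMS: all l-mers present in every sequence with <= d mismatches."""
    motifs = []
    for letters in product(alphabet, repeat=l):  # len(alphabet)**l candidates
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    print(planted_motif_search(seqs, l=3, d=0))
```

    On the toy input with d = 0, only ACG is shared by all three sequences; raising d enlarges the answer set and, for realistic l, the running time quickly becomes impractical, which is the cost that exact PMS algorithms try to reduce.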

  10. Use of Host-like Peptide Motifs in Viral Proteins Is a Prevalent Strategy in Host-Virus Interactions

    Directory of Open Access Journals (Sweden)

    Tzachi Hagai

    2014-06-01

    Full Text Available Viruses interact extensively with host proteins, but the mechanisms controlling these interactions are not well understood. We present a comprehensive analysis of eukaryotic linear motifs (ELMs) in 2,208 viral genomes and reveal that viruses exploit molecular mimicry of host-like ELMs to possibly assist in host-virus interactions. Using a statistical genomics approach, we identify a large number of potentially functional ELMs and observe that the occurrence of ELMs is often evolutionarily conserved but not uniform across virus families. Some viral proteins contain multiple types of ELMs, in striking similarity to complex regulatory modules in host proteins, suggesting that ELMs may act combinatorially to assist viral replication. Furthermore, a simple evolutionary model suggests that the inherent structural simplicity of ELMs often enables them to tolerate mutations and evolve quickly. Our findings suggest that ELMs may allow fast rewiring of host-virus interactions, which likely assists rapid viral evolution and adaptation to diverse environments.

  11. Autoinhibition and signaling by the switch II motif in the G-protein chaperone of a radical B12 enzyme.

    Science.gov (United States)

    Lofgren, Michael; Koutmos, Markos; Banerjee, Ruma

    2013-10-25

    MeaB is an accessory GTPase protein involved in the assembly, protection, and reactivation of 5'-deoxyadenosyl cobalamin-dependent methylmalonyl-CoA mutase (MCM). Mutations in the human ortholog of MeaB result in methylmalonic aciduria, an inborn error of metabolism. G-proteins typically utilize conserved switch I and II motifs for signaling to effector proteins via conformational changes elicited by nucleotide binding and hydrolysis. Our recent discovery that MeaB utilizes an unusual switch III region for bidirectional signaling with MCM raised questions about the roles of the switch I and II motifs in MeaB. In this study, we addressed the functions of conserved switch II residues by performing alanine-scanning mutagenesis. Our results demonstrate that the GTPase activity of MeaB is autoinhibited by switch II and that this loop is important for coupling nucleotide-sensitive conformational changes in switch III to elicit the multiple chaperone functions of MeaB. Furthermore, we report the structure of MeaB·GDP crystallized in the presence of AlFx(-) to form the putative transition state analog, GDP·AlF4(-). The resulting crystal structure and its comparison with related G-proteins support the conclusion that the catalytic site of MeaB is incomplete in the absence of the GTPase-activating protein MCM and therefore unable to stabilize the transition state analog. Favoring an inactive conformation in the absence of the client MCM protein might represent a strategy for suppressing the intrinsic GTPase activity of MeaB in which the switch II loop plays an important role.

  12. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    Science.gov (United States)

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.

  13. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Full Text Available Abstract Background The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transduction within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C)-terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins at the genome level. Results Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at their C-terminal ends. Moreover, we found specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.
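
    The two consensus patterns quoted in the abstract translate directly into anchored regular expressions, which makes the C-terminal restriction concrete. The sketch below is an assumption-laden illustration: it encodes only the unrefined type-I and type-II patterns as written above (not the refined consensus sequences the authors propose), treats 'x' as any uppercase one-letter residue code, and uses hypothetical names and invented test sequences.

```python
import re

# Patterns taken from the abstract: type-I x-x-S/T-x-I/L/V, type-II x-x-V-x-I/V.
# The trailing "$" restricts matches to the last five residues of the protein.
PDZ_PATTERNS = {
    "type-I": re.compile(r"[A-Z]{2}[ST][A-Z][ILV]$"),
    "type-II": re.compile(r"[A-Z]{2}V[A-Z][IV]$"),
}


def classify_c_terminus(protein):
    """Return the names of the PDZ-binding motif types found at the C-terminus."""
    return [name for name, pattern in PDZ_PATTERNS.items() if pattern.search(protein)]


if __name__ == "__main__":
    # Invented sequences for illustration; they are not real proteins.
    for seq in ("MKLAAETDV", "MKLAAEVDV", "MKLETDVAA"):
        print(seq, classify_c_terminus(seq))
```

    The third sequence contains an internal ETDV stretch but ends in AA, so it is not reported, which mirrors the observation that PDZ domains bind these motifs almost exclusively at the C-terminal end.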

  14. The PDZ-binding motif of Yes-associated protein is required for its co-activation of TEAD-mediated CTGF transcription and oncogenic cell transforming activity

    International Nuclear Information System (INIS)

    Shimomura, Tadanori; Miyamura, Norio; Hata, Shoji; Miura, Ryota; Hirayama, Jun; Nishina, Hiroshi

    2014-01-01

    Highlights: •Loss of the PDZ-binding motif inhibits constitutively active YAP (5SA)-induced oncogenic cell transformation. •The PDZ-binding motif of YAP promotes its nuclear localization in cultured cells and mouse liver. •Loss of the PDZ-binding motif inhibits YAP (5SA)-induced CTGF transcription in cultured cells and mouse liver. -- Abstract: YAP is a transcriptional co-activator that acts downstream of the Hippo signaling pathway and regulates multiple cellular processes, including proliferation. Hippo pathway-dependent phosphorylation of YAP negatively regulates its function. Conversely, attenuation of Hippo-mediated phosphorylation of YAP increases its ability to stimulate proliferation and eventually induces oncogenic transformation. The C-terminus of YAP contains a highly conserved PDZ-binding motif that regulates YAP’s functions in multiple ways. However, to date, the importance of the PDZ-binding motif to the oncogenic cell transforming activity of YAP has not been determined. In this study, we disrupted the PDZ-binding motif in the YAP (5SA) protein, in which the sites normally targeted by Hippo pathway-dependent phosphorylation are mutated. We found that loss of the PDZ-binding motif significantly inhibited the oncogenic transformation of cultured cells induced by YAP (5SA). In addition, the increased nuclear localization of YAP (5SA) and its enhanced activation of TEAD-dependent transcription of the cell proliferation gene CTGF were strongly reduced when the PDZ-binding motif was deleted. Similarly, in mouse liver, deletion of the PDZ-binding motif suppressed nuclear localization of YAP (5SA) and YAP (5SA)-induced CTGF expression. Taken together, our results indicate that the PDZ-binding motif of YAP is critical for YAP-mediated oncogenesis, and that this effect is mediated by YAP’s co-activation of TEAD-mediated CTGF transcription

  15. The PDZ-binding motif of Yes-associated protein is required for its co-activation of TEAD-mediated CTGF transcription and oncogenic cell transforming activity

    Energy Technology Data Exchange (ETDEWEB)

    Shimomura, Tadanori; Miyamura, Norio; Hata, Shoji; Miura, Ryota; Hirayama, Jun, E-mail: hirayama.dbio@mri.tmd.ac.jp; Nishina, Hiroshi, E-mail: nishina.dbio@mri.tmd.ac.jp

    2014-01-17

    Highlights: •Loss of the PDZ-binding motif inhibits constitutively active YAP (5SA)-induced oncogenic cell transformation. •The PDZ-binding motif of YAP promotes its nuclear localization in cultured cells and mouse liver. •Loss of the PDZ-binding motif inhibits YAP (5SA)-induced CTGF transcription in cultured cells and mouse liver. -- Abstract: YAP is a transcriptional co-activator that acts downstream of the Hippo signaling pathway and regulates multiple cellular processes, including proliferation. Hippo pathway-dependent phosphorylation of YAP negatively regulates its function. Conversely, attenuation of Hippo-mediated phosphorylation of YAP increases its ability to stimulate proliferation and eventually induces oncogenic transformation. The C-terminus of YAP contains a highly conserved PDZ-binding motif that regulates YAP’s functions in multiple ways. However, to date, the importance of the PDZ-binding motif to the oncogenic cell transforming activity of YAP has not been determined. In this study, we disrupted the PDZ-binding motif in the YAP (5SA) protein, in which the sites normally targeted by Hippo pathway-dependent phosphorylation are mutated. We found that loss of the PDZ-binding motif significantly inhibited the oncogenic transformation of cultured cells induced by YAP (5SA). In addition, the increased nuclear localization of YAP (5SA) and its enhanced activation of TEAD-dependent transcription of the cell proliferation gene CTGF were strongly reduced when the PDZ-binding motif was deleted. Similarly, in mouse liver, deletion of the PDZ-binding motif suppressed nuclear localization of YAP (5SA) and YAP (5SA)-induced CTGF expression. Taken together, our results indicate that the PDZ-binding motif of YAP is critical for YAP-mediated oncogenesis, and that this effect is mediated by YAP’s co-activation of TEAD-mediated CTGF transcription.

  16. The NS1 polypeptide of the murine parvovirus minute virus of mice binds to DNA sequences containing the motif [ACCA]2-3.

    Science.gov (United States)

    Cotmore, S F; Christensen, J; Nüesch, J P; Tattersall, P

    1995-03-01

    A DNA fragment containing the minute virus of mice 3' replication origin was specifically coprecipitated in immune complexes containing the virally coded NS1, but not the NS2, polypeptide. Antibodies directed against the amino- or carboxy-terminal regions of NS1 precipitated the NS1-origin complexes, but antibodies directed against NS1 amino acids 284 to 459 blocked complex formation. Using affinity-purified histidine-tagged NS1 preparations, we have shown that the specific protein-DNA interaction is of moderate affinity, being stable in 0.1 M salt but rapidly lost at higher salt concentrations. In contrast, generalized (or nonspecific) DNA binding by NS1 could be demonstrated only in low salt. Addition of ATP or gamma S-ATP enhanced specific DNA binding by wild-type NS1 severalfold, but binding was lost under conditions which favored ATP hydrolysis. NS1 molecules with mutations in a critical lysine residue (amino acid 405) in the consensus ATP-binding site bound to the origin, but this binding could not be enhanced by ATP addition. DNase I protection assays carried out with wild-type NS1 in the presence of gamma S-ATP gave footprints which extended over 43 nucleotides on both DNA strands, from the middle of the origin bubble sequence to a position some 14 bp beyond the nick site. The DNA-binding site for NS1 was mapped to a 22-bp fragment from the middle of the 3' replication origin which contains the sequence ACCAACCA. This conforms to a reiterated motif (ACCA)2-3, which occurs, in more or less degenerate form, at many sites throughout the minute virus of mice genome (J. W. Bodner, Virus Genes 2:167-182, 1989). Insertion of a single copy of the sequence (ACCA)3 was shown to be sufficient to confer NS1 binding on an otherwise unrecognized plasmid fragment. The functions of NS1 in the viral life cycle are reevaluated in the light of this result.
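
    A scan for the reiterated motif (ACCA)2-3 described above can be sketched with a bounded repeat in a regular expression. This is an illustration only: it matches exact copies on the strand supplied, whereas the abstract notes that the motif also occurs in degenerate forms, which this sketch does not attempt to model. The function name and input sequences are hypothetical.

```python
import re

# Two or three tandem copies of ACCA, as in "(ACCA)2-3".
ACCA_REPEAT = re.compile(r"(?:ACCA){2,3}")


def find_acca_repeats(dna: str):
    """Return (start, matched_text) for each exact (ACCA)2-3 run."""
    dna = dna.upper()
    return [(m.start(), m.group()) for m in ACCA_REPEAT.finditer(dna)]


if __name__ == "__main__":
    # Toy fragment containing the ACCAACCA sequence named in the abstract.
    print(find_acca_repeats("GGTACCAACCATT"))
```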

  17. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins.

    Directory of Open Access Journals (Sweden)

    Hilal Kazan

    2010-07-01

    Full Text Available Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequences associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.

  18. The SWISS-PROT protein sequence data bank: current status.

    OpenAIRE

    Bairoch, A; Boeckmann, B

    1994-01-01

    SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. The SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Databa...

  19. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
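
    The definition of motif pairing multiplicity given above (the number of motifs paired with a given motif) can be illustrated with a short counting routine. This is a sketch under assumptions: the input is taken to be a plain list of unordered motif-name pairs, duplicate pairs are counted once, and a motif paired with itself counts as its own partner. None of these conventions is stated in the abstract, and the motif names are hypothetical.

```python
from collections import defaultdict


def pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with.

    `pairs` is an iterable of (motif_a, motif_b) tuples; order within a
    pair is ignored and repeated pairs are counted once (assumption).
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(found) for motif, found in partners.items()}


if __name__ == "__main__":
    # Hypothetical motif names, for illustration only.
    toy_pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M1")]
    print(pairing_multiplicity(toy_pairs))
```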

  20. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  1. Biophysical properties of regions flanking the bHLH-Zip motif in the p22 Max protein

    International Nuclear Information System (INIS)

    Pursglove, Sharon E.; Fladvad, Malin; Bellanda, Massimo; Moshref, Ahmad; Henriksson, Marie; Carey, Jannette; Sunnerhagen, Maria

    2004-01-01

    The Max protein is the central dimerization partner in the Myc-Max-Mad network of transcriptional regulators, and a founding structural member of the family of basic-helix-loop-helix (bHLH)-leucine zipper (Zip) proteins. Biologically important regions flanking its bHLH-Zip motif have been disordered or absent in crystal structures. The present study shows that these regions are resistant to proteolysis in both the presence and absence of DNA, and that Max dimers containing both flanking regions have significantly higher helix content as measured by circular dichroism than that predicted from the crystal structures. Nuclear magnetic resonance measurements in the absence of DNA also support the inferred structural order. Deletion of both flanking regions is required to achieve maximal DNA affinity as measured by EMSA. Thus, the previously observed functionalities of these Max regions in DNA binding, phosphorylation, and apoptosis are suggested to be linked to structural properties

  2. The Crc and Hfq proteins of Pseudomonas putida cooperate in catabolite repression and formation of ribonucleic acid complexes with specific target motifs.

    Science.gov (United States)

    Moreno, Renata; Hernández-Arranz, Sofía; La Rosa, Ruggero; Yuste, Luis; Madhushani, Anjana; Shingler, Victoria; Rojo, Fernando

    2015-01-01

    The Crc protein is a global regulator that has a key role in catabolite repression and optimization of metabolism in Pseudomonads. Crc inhibits gene expression post-transcriptionally, preventing translation of mRNAs bearing an AAnAAnAA motif [the catabolite activity (CA) motif] close to the translation start site. Although Crc was initially believed to bind RNA by itself, this idea was recently challenged by results suggesting that a protein co-purifying with Crc, presumably the Hfq protein, could account for the detected RNA-binding activity. Hfq is an abundant protein that has a central role in post-transcriptional gene regulation. Herein, we show that the Pseudomonas putida Hfq protein can recognize the CA motifs of RNAs through its distal face and that Crc facilitates formation of a more stable complex at these targets. Crc was unable to bind RNA in the absence of Hfq. However, pull-down assays showed that Crc and Hfq can form a co-complex with RNA containing a CA motif in vitro. Inactivation of the hfq or the crc gene impaired catabolite repression to a similar extent. We propose that Crc and Hfq cooperate in catabolite repression, probably through forming a stable co-complex with RNAs containing CA motifs to result in inhibition of translation initiation. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
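
    The AAnAAnAA catabolite activity (CA) motif and its position "close to the translation start site" can be sketched as a windowed pattern search. The abstract does not say how close is close enough, so the `window` parameter below, like the function name and the toy mRNA, is an assumption made purely for illustration.

```python
import re

# AAnAAnAA with n = any ribonucleotide. The lookahead lets overlapping
# occurrences be reported as separate hits.
CA_MOTIF = re.compile(r"(?=(AA[ACGU]AA[ACGU]AA))")


def ca_motifs_near_start(mrna: str, start_codon_index: int, window: int = 30):
    """Return (position, motif) for CA motifs that begin within `window`
    nucleotides of the start codon. The window size is hypothetical."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input too
    hits = []
    for m in CA_MOTIF.finditer(mrna):
        if abs(m.start() - start_codon_index) <= window:
            hits.append((m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    toy = "GGCAACAAUAAGGAUGCCC"  # invented sequence; AUG starts at index 13
    print(ca_motifs_near_start(toy, start_codon_index=13))
```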

  3. Next-Generation Sequencing for Binary Protein-Protein Interactions

    Directory of Open Access Journals (Sweden)

    Bernhard Suter

    2015-12-01

    Full Text Available The yeast two-hybrid (Y2H) system exploits host cell genetics in order to display binary protein-protein interactions (PPIs) via defined and selectable phenotypes. Numerous improvements have been made to this method, adapting the screening principle for diverse applications, including drug discovery and the scale-up for proteome-wide interaction screens in human and other organisms. Here we discuss a systematic workflow and analysis scheme for screening data generated by Y2H and related assays that includes high-throughput selection procedures, readout of comprehensive results via next-generation sequencing (NGS), and the interpretation of interaction data via quantitative statistics. The novel assays and tools will serve the broader scientific community to harness the power of NGS technology to address PPI networks in health and disease. We discuss examples of how this next-generation platform can be applied to address specific questions in diverse fields of biology and medicine.

  4. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Full Text Available Computational approaches to the disulphide bonding state and its connectivity pattern prediction are based on various descriptors. One descriptor is the amino acid sequence motifs flanking the cysteine residue motifs. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.
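
    The idea of a flanking motif around each cysteine can be illustrated by extracting a fixed-width window centred on every C in a sequence. This sketch does not use the CMD APIs mentioned above and is not the authors' extraction procedure; the flank width, the padding character used at the sequence ends, and the function name are all assumptions.

```python
def cysteine_flanking_motifs(seq: str, flank: int = 2, pad: str = "-"):
    """Return (1-based position, window) for every cysteine in `seq`.

    Each window holds `flank` residues on either side of the cysteine;
    positions beyond the sequence ends are filled with `pad`. Both the
    flank width and the padding are illustrative choices.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    motifs = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # In `padded`, the cysteine sits at index i + flank, so the
            # window starts at i and spans 2 * flank + 1 characters.
            motifs.append((i + 1, padded[i:i + 2 * flank + 1]))
    return motifs


if __name__ == "__main__":
    print(cysteine_flanking_motifs("ACDEFCG"))  # toy sequence
```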

  5. TCR comodulation of nonengaged TCR takes place by a protein kinase C and CD3 gamma di-leucine-based motif-dependent mechanism

    DEFF Research Database (Denmark)

    Bonefeld, Charlotte Menné; Rasmussen, B. A.; Lauritsen, J P

    2003-01-01

    of comodulation. Like internalization of engaged TCR, comodulation was dependent on protein tyrosine kinase activity. Finally, we found that in contrast to internalization of engaged TCR, comodulation was highly dependent on protein kinase C activity and the CD3 gamma di-leucine-based motif. Based...

  6. Motif III in superfamily 2 "helicases" helps convert the binding energy of ATP into a high-affinity RNA binding site in the yeast DEAD-box protein Ded1.

    Science.gov (United States)

    Banroques, Josette; Doère, Monique; Dreyfus, Marc; Linder, Patrick; Tanner, N Kyle

    2010-03-05

    Motif III in the putative helicases of superfamily 2 is highly conserved in both its sequence and its structural context. It typically consists of the sequence alcohol-alanine-alcohol (S/T-A-S/T). Historically, it was thought to link ATPase activity with a "helicase" strand displacement activity that disrupts RNA or DNA duplexes. DEAD-box proteins constitute the largest family of superfamily 2; they are RNA-dependent ATPases and ATP-dependent RNA binding proteins that, in some cases, are able to disrupt short RNA duplexes. We made mutations of motif III (S-A-T) in the yeast DEAD-box protein Ded1 and analyzed in vivo phenotypes and in vitro properties. Moreover, we made a tertiary model of Ded1 based on the solved structure of Vasa. We used Ded1 because it has relatively high ATPase and RNA binding activities; it is able to displace moderately stable duplexes at a large excess of substrate. We find that the alanine and the threonine in the second and third positions of motif III are more important than the serine, but that mutations of all three residues have strong phenotypes. We purified the wild-type and various mutants expressed in Escherichia coli. We found that motif III mutations affect the RNA-dependent hydrolysis of ATP (k(cat)), but not the affinity for ATP (K(m)). Moreover, mutations alter and reduce the affinity for single-stranded RNA and subsequently reduce the ability to disrupt duplexes. We obtained intragenic suppressors of the S-A-C mutant that compensate for the mutation by enhancing the affinity for ATP and RNA. We conclude that motif III and the binding energy of gamma-PO(4) of ATP are used to coordinate motifs I, II, and VI and the two RecA-like domains to create a high-affinity single-stranded RNA binding site. It also may help activate the beta,gamma-phosphoanhydride bond of ATP. (c) 2009 Elsevier Ltd. All rights reserved.

  7. Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

    Science.gov (United States)

    Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

    2016-09-01

    S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. A glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. Outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
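
    The two performance figures quoted above, accuracy and the Matthews correlation coefficient, are both functions of a binary confusion matrix. The sketch below shows the standard formulas only; the abstract does not report the underlying counts, so any numbers passed to these functions are hypothetical, and the F-score and G-score calculations themselves are not reconstructed here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(accuracy(29, 29, 21, 21), mcc(29, 29, 21, 21))
```

    MCC ranges from -1 to 1 and equals 0 for a classifier that does no better than chance.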

  8. Sequence and conformational preferences at termini of α-helices in membrane proteins: role of the helix environment.

    Science.gov (United States)

    Shelar, Ashish; Bansal, Manju

    2014-12-01

    α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.

  9. Predicting allergenicity of proteins using Physical–Chemical Property (PCP) motifs

    Science.gov (United States)

    Motivation: Quantitative guidelines to distinguish allergenic proteins from related, but non-allergenic ones are urgently needed for regulatory agencies, biotech companies and physicians. Cataloguing the SDAP database has indicated that allergenic proteins populate a relatively small number of prote...

  10. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias

    DEFF Research Database (Denmark)

    Kjær, Jonas; Belsham, Graham J.

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long) which induces a non-proteolytic, co-translational, "cleavage" at its own C-terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19 where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15 and N16 within the 2A sequence of infectious FMDVs but no variants at residues P17, G18 or P19 have been identified. In this study, using highly degenerate primers, we analysed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after 2, 3 or 4 passages. However...

  11. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  12. Simple sequence proteins in prokaryotic proteomes

    Directory of Open Access Journals (Sweden)

    Ramachandran Srinivasan

    2006-06-01

    Full Text Available Abstract Background The structural and functional features associated with Simple Sequence Proteins (SSPs) are non-globularity, disease states, signaling and post-translational modification. SSPs are also an important source of genetic and possibly phenotypic variation. Analysis of 249 prokaryotic proteomes offers a new opportunity to examine the genomic properties of SSPs. Results SSPs are a minority but they grow with proteome size. This relationship is exhibited across species varying in genomic GC, mutational bias, life style, and pathogenicity. Their proportion in each proteome is strongly influenced by genomic base compositional bias. In most species simple duplications are favoured, but in a few cases such as Mycobacteria, large families of duplications occur. Amino acid preference in SSPs exhibits a trend towards low cost of biosynthesis. In SSPs and in non-SSPs, Alanine, Glycine, Leucine, and Valine are abundant in species widely varying in genomic GC whereas Isoleucine and Lysine are rich only in organisms with low genomic GC. Arginine is abundant in SSPs of two species and in the non-SSPs of Xanthomonas oryzae. Asparagine is abundant only in SSPs of low GC species. Aspartic acid is abundant only in the non-SSPs of Halobacterium sp NRC1. The abundance of Serine in SSPs of 62 species extends over a broader range compared to that of non-SSPs. Threonine (T) is abundant only in SSPs of a couple of species. SSPs exhibit preferential association with Cell surface, Cell membrane and Transport functions and a negative association with Metabolism. Mesophiles and Thermophiles display similar ranges in the content of SSPs. Conclusion Although SSPs are a minority, the genomic forces of base compositional bias and duplications influence their growth and pattern in each species. The preferences and abundance of amino acids are governed by low biosynthetic cost, evolutionary age and base composition of codons. Abundance of charged amino acids Arginine

  13. Fox-2 Splicing Factor Binds to a Conserved Intron Motif to Promote Inclusion of Protein 4.1R Alternative Exon 16

    Energy Technology Data Exchange (ETDEWEB)

    Ponthier, Julie L.; Schluepen, Christina; Chen, Weiguo; Lersch, Robert A.; Gee, Sherry L.; Hou, Victor C.; Lo, Annie J.; Short, Sarah A.; Chasis, Joel A.; Winkelmann, John C.; Conboy, John G.

    2006-03-01

    Activation of protein 4.1R exon 16 (E16) inclusion during erythropoiesis represents a physiologically important splicing switch that increases 4.1R affinity for spectrin and actin. Previous studies showed that negative regulation of E16 splicing is mediated by the binding of hnRNP A/B proteins to silencer elements in the exon and that downregulation of hnRNP A/B proteins in erythroblasts leads to activation of E16 inclusion. This paper demonstrates that positive regulation of E16 splicing can be mediated by Fox-2 or Fox-1, two closely related splicing factors that possess identical RNA recognition motifs. SELEX experiments with human Fox-1 revealed highly selective binding to the hexamer UGCAUG. Both Fox-1 and Fox-2 were able to bind the conserved UGCAUG elements in the proximal intron downstream of E16, and both could activate E16 splicing in HeLa cell co-transfection assays in a UGCAUG-dependent manner. Conversely, knockdown of Fox-2 expression, achieved with two different siRNA sequences resulted in decreased E16 splicing. Moreover, immunoblot experiments demonstrate mouse erythroblasts express Fox-2, but not Fox-1. These findings suggest that Fox-2 is a physiological activator of E16 splicing in differentiating erythroid cells in vivo. Recent experiments show that UGCAUG is present in the proximal intron sequence of many tissue-specific alternative exons, and we propose that the Fox family of splicing enhancers plays an important role in alternative splicing switches during differentiation in metazoan organisms.
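
    Locating UGCAUG elements in the proximal part of an intron amounts to a simple substring search over a leading window of the sequence. In the sketch below, the cut-off that defines "proximal" (`proximal_len`) is an assumption, since the abstract gives no distance, and the function name and toy intron are hypothetical.

```python
def ugcaug_sites(intron_rna: str, proximal_len: int = 300):
    """Return 0-based start positions of UGCAUG within the first
    `proximal_len` nucleotides of an intron (hypothetical cut-off)."""
    region = intron_rna.upper().replace("T", "U")[:proximal_len]
    motif = "UGCAUG"
    sites = []
    i = region.find(motif)
    while i != -1:
        sites.append(i)
        # Advance by one so that overlapping copies are also found.
        i = region.find(motif, i + 1)
    return sites


if __name__ == "__main__":
    toy_intron = "GUAAGUUGCAUGCCCUGCAUGAA"  # invented sequence
    print(ugcaug_sites(toy_intron))
```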

  14. An Inhibitory Motif on the 5’UTR of Several Rotavirus Genome Segments Affects Protein Expression and Reverse Genetics Strategies

    Science.gov (United States)

    Papa, Guido; Eichwald, Catherine; Burrone, Oscar R.

    2016-01-01

    Rotavirus genome consists of eleven segments of dsRNA, each encoding one single protein. Viral mRNAs contain an open reading frame (ORF) flanked by relatively short untranslated regions (UTRs), whose role in the viral cycle remains elusive. Here we investigated the role of 5’UTRs in T7 polymerase-driven cDNAs expression in uninfected cells. The 5’UTRs of eight genome segments (gs3, gs5-6, gs7-11) of the simian SA11 strain showed a strong inhibitory effect on the expression of viral proteins. Decreased protein expression was due to both compromised transcription and translation and was independent of the ORF and the 3’UTR sequences. Analysis of several mutants of the 21-nucleotide long 5’UTR of gs 11 defined an inhibitory motif (IM) represented by its primary sequence rather than its secondary structure. IM was mapped to the 5’ terminal 6-nucleotide long pyrimidine-rich tract 5’-GGY(U/A)UY-3’. The 5’ terminal position within the mRNA was shown to be essentially required, as inhibitory activity was lost when IM was moved to an internal position. We identified two mutations (insertion of a G upstream the 5’UTR and the U to A mutation of the fifth nucleotide of IM) that render IM non-functional and increase the transcription and translation rate to levels that could considerably improve the efficiency of virus helper-free reverse genetics strategies. PMID:27846320
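    The positional requirement described in this abstract (the motif 5'-GGY(U/A)UY-3' is inhibitory only at the very 5' terminus) can be illustrated with a small pattern check. This is a minimal sketch, not the authors' analysis: it assumes Y is read as the IUPAC pyrimidine code (C or U), the function names are hypothetical, and the test sequences are toy strings rather than real rotavirus 5'UTRs.

```python
import re

# Y is the IUPAC code for a pyrimidine (C or U in RNA); "(U/A)" is read as U or A.
IM_PATTERN = re.compile(r"GG[CU][UA]U[CU]")


def normalize(seq):
    """Upper-case the sequence and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def has_terminal_im(mrna):
    """Return True if the motif occupies the first six nucleotides."""
    return IM_PATTERN.match(normalize(mrna)) is not None


def internal_im_positions(mrna):
    """Return 0-based starts of non-overlapping motif copies away from the 5' end."""
    return [m.start() for m in IM_PATTERN.finditer(normalize(mrna)) if m.start() > 0]


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to exercise the pattern.
    print(has_terminal_im("GGCUUUUAAAG"))   # motif at the 5' end
    print(has_terminal_im("GGGCUUUUAAAG"))  # an extra G upstream shifts the motif
    print(has_terminal_im("GGCUAUUAAAG"))   # U-to-A change at motif position 5
```

    The check only reports whether a sequence matches the stated pattern at the stated position; it says nothing about expression levels, which the abstract measured experimentally.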

  15. Antibody classes & subclasses induced by mucosal immunization of mice with Streptococcus pyogenes M6 protein & oligodeoxynucleotides containing CpG motifs.

    Science.gov (United States)

    Teloni, R; von Hunolstein, C; Mariotti, S; Donati, S; Orefici, G; Nisini, R

    2004-05-01

    Type-specific antibodies against M protein are critical for human protection as they enhance phagocytosis and are protective. An ideal vaccine for the protection against Streptococcus pyogenes would warrant mucosal immunity, but mucosally administered M-protein has been shown to be poorly immunogenic in animals. We used a recombinant M type 6 protein to immunize mice in the presence of synthetic oligodeoxynucleotides containing CpG motifs (immunostimulatory sequences: ISS) or cholera toxin (CT) to explore its possible usage in a mucosal vaccine. Mice were immunized by intranasal (in) or intradermal (id) administration with four doses at weekly intervals of M6-protein (10 microg/mouse) with or without adjuvant (ISS, 10 microg/mouse or CT, 0.5 microg/mouse). M6 specific antibodies were measured by enzyme linked immunosorbent assay using class and subclass specific monoclonal antibodies. The use of ISS induced an impressive anti M-protein serum IgG response but when id administered was not detectable in the absence of adjuvant. When used in, M-protein in the presence of both ISS and CT induced anti M-protein IgA in the bronchoalveolar lavage, as well as specific IgG in the serum. IgG were able to react with serotype M6 strains of S. pyogenes. The level of antibodies obtained by immunizing mice in with M-protein and CT was higher in comparison to M-protein and ISS. The analysis of anti-M protein specific IgG subclasses showed high levels of IgG1, IgG2a and IgG2b, and low levels of IgG3 when ISS were used as adjuvant. Thus, in the presence of ISS, the ratio IgG2a/IgG1 and (IgG2a+IgG3)/IgG1 >1 indicated a type 1-like response obtained in both mucosally and systemically vaccinated mice. Our study offers a reproducible model of anti-M protein vaccination that could be applied to test new antigenic formulations to induce an anti-group A Streptococcus (GAS) vaccination suitable for protection against the different diseases caused by this bacterium.

  16. Non-canonical binding interactions of the RNA recognition motif (RRM) domains of P34 protein modulate binding within the 5S ribonucleoprotein particle (5S RNP).

    Science.gov (United States)

    Kamina, Anyango D; Williams, Noreen

    2017-01-01

    RNA binding proteins are involved in many aspects of RNA metabolism. In Trypanosoma brucei, our laboratory has identified two trypanosome-specific RNA binding proteins P34 and P37 that are involved in the maturation of the 60S subunit during ribosome biogenesis. These proteins are part of the T. brucei 5S ribonucleoprotein particle (5S RNP) and P34 binds to 5S ribosomal RNA (rRNA) and ribosomal protein L5 through its N-terminus and its RNA recognition motif (RRM) domains. We generated truncated P34 proteins to determine these domains' interactions with 5S rRNA and L5. Our analyses demonstrate that RRM1 of P34 mediates the majority of binding with 5S rRNA and the N-terminus together with RRM1 contribute the most to binding with L5. We determined that the consensus ribonucleoprotein (RNP) 1 and 2 sequences, characteristic of canonical RRM domains, are not fully conserved in the RRM domains of P34. However, the aromatic amino acids previously described to mediate base stacking interactions with their RNA target are conserved in both of the RRM domains of P34. Surprisingly, mutation of these aromatic residues did not disrupt but instead enhanced 5S rRNA binding. However, we identified four arginine residues located in RRM1 of P34 that strongly impact L5 binding. These mutational analyses of P34 suggest that the binding site for 5S rRNA and L5 are near each other and specific residues within P34 regulate the formation of the 5S RNP. These studies show the unique way that the domains of P34 mediate binding with the T. brucei 5S RNP.

  17. Measles Virus: Identification in the M Protein Primary Sequence of a Potential Molecular Marker for Subacute Sclerosing Panencephalitis

    Directory of Open Access Journals (Sweden)

    Hasan Kweder

    2015-01-01

    Full Text Available Subacute Sclerosing Panencephalitis (SSPE), a rare lethal disease of children and young adults due to persistence of measles virus (MeV) in the brain, is caused by wild type (wt) MeV. Why MeV vaccine strains never cause SSPE is completely unknown. Hypothesizing that this phenotypic difference could potentially be represented by a molecular marker, we compared glycoprotein and matrix (M) genes from SSPE cases with those from the Moraten vaccine strain, searching for differential structural motifs. We observed that all known SSPE viruses have residues P64, E89, and A209 (PEA) in their M proteins whereas the equivalent residues for vaccine strains are either S64, K89, and T209 (SKT) as in Moraten or PKT. Through the construction of MeV recombinants, we have obtained evidence that the wt MeV-M protein PEA motif, in particular A209, is linked to increased viral spread. Importantly, for the 10 wt genotypes (of 23) that have had their M proteins sequenced, 9 have the PEA motif, the exception being B3, which has PET. Interestingly, cases of SSPE caused by genotype B3 have yet to be reported. In conclusion, our results strongly suggest that the PEA motif is a molecular marker for wt MeV at risk to cause SSPE.
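    The three-residue marker described in this abstract (positions 64, 89 and 209 of the M protein) amounts to a positional lookup. The sketch below is illustrative only: it assumes the input is an ungapped M protein sequence numbered from residue 1, the labels are informal summaries of the abstract rather than a validated classifier, and `toy_sequence` builds placeholder sequences, not real measles virus proteins.

```python
# 1-based positions named in the abstract (P64/E89/A209 versus S64/K89/T209).
MARKER_POSITIONS = (64, 89, 209)

# Informal labels taken from the abstract's wording; purely descriptive.
MOTIF_LABELS = {
    "PEA": "wild type motif found in SSPE viruses",
    "SKT": "vaccine-like (as in Moraten)",
    "PKT": "vaccine-like",
    "PET": "as reported for genotype B3",
}


def m_protein_motif(seq):
    """Return the residues at positions 64, 89 and 209 as a three-letter string."""
    if len(seq) < max(MARKER_POSITIONS):
        raise ValueError("sequence is too short to contain position 209")
    return "".join(seq[pos - 1].upper() for pos in MARKER_POSITIONS)


def describe(seq):
    """Return (motif, label); unknown combinations are labelled 'unclassified'."""
    motif = m_protein_motif(seq)
    return motif, MOTIF_LABELS.get(motif, "unclassified")


def toy_sequence(motif, length=220):
    """Build a hypothetical poly-glycine sequence carrying the given marker residues."""
    residues = ["G"] * length
    for pos, aa in zip(MARKER_POSITIONS, motif):
        residues[pos - 1] = aa
    return "".join(residues)


if __name__ == "__main__":
    for name in ("PEA", "SKT", "PET"):
        print(describe(toy_sequence(name)))
```

    With real data, the sequence would first need to be aligned to a reference so that positions 64, 89 and 209 refer to the same residues as in the abstract.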

  18. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

    Science.gov (United States)

    Richard, François D; Kajava, Andrey V

    2014-06-01

    The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
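    The idea of comparing the short-string content of adjacent sequence segments can be shown with a toy score. This is emphatically not the TRDistiller algorithm, whose details are not given in the abstract: the window length, the k-mer size and the use of Jaccard similarity are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def kmers(segment, k):
    """Return the set of all length-k substrings of a segment."""
    return {segment[i:i + k] for i in range(len(segment) - k + 1)}


def adjacent_window_similarity(seq, window=10, k=2):
    """Best Jaccard similarity of k-mer sets between any two adjacent windows.

    A value near 1.0 means some window shares nearly all of its short strings
    with the window that directly follows it, as expected for tandem repeats.
    """
    best = 0.0
    for start in range(len(seq) - 2 * window + 1):
        left = kmers(seq[start:start + window], k)
        right = kmers(seq[start + window:start + 2 * window], k)
        union = left | right
        if union:
            best = max(best, len(left & right) / len(union))
    return best


def passes_filter(seq, threshold=0.5, window=10, k=2):
    """Keep a sequence for slower repeat detection if its score reaches the threshold."""
    return adjacent_window_similarity(seq, window, k) >= threshold


if __name__ == "__main__":
    print(adjacent_window_similarity("ACDEFGHIKL" * 2))       # perfect tandem repeat
    print(adjacent_window_similarity("ACDEFGHIKLMNPQRSTVWY"))  # no repeated 2-mers
```

    A cheap pre-filter of this kind only makes sense if the threshold is tuned on sequences with known repeat content, which is the role played by the 22.5% and 99.2% figures reported above for the authors' own filter.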

  19. Two Novel Motifs of Watermelon Silver Mottle Virus NSs Protein Are Responsible for RNA Silencing Suppression and Pathogenicity.

    Science.gov (United States)

    Huang, Chung-Hao; Hsiao, Weng-Rong; Huang, Ching-Wen; Chen, Kuan-Chun; Lin, Shih-Shun; Chen, Tsung-Chi; Raja, Joseph A J; Wu, Hui-Wen; Yeh, Shyi-Dong

    2015-01-01

    The NSs protein of Watermelon silver mottle virus (WSMoV) is the RNA silencing suppressor and pathogenicity determinant. In this study, serial deletion and point-mutation mutagenesis of conserved regions (CR) of NSs protein were performed, and the silencing suppression function was analyzed through agroinfiltration in Nicotiana benthamiana plants. We found two amino acid (aa) residues, H113 and Y398, are novel functional residues for RNA silencing suppression. Our further analyses demonstrated that H113 at the common epitope (CE) ((109)KFTMHNQ(117)), which is highly conserved in Asia type tospoviruses, and the benzene ring of Y398 at the C-terminal β-sheet motif ((397)IYFL(400)) affect NSs mRNA stability and protein stability, respectively, and are thus critical for NSs RNA silencing suppression. Additionally, protein expression of other six deleted (ΔCR1-ΔCR6) and five point-mutated (Y15A, Y27A, G180A, R181A and R212A) mutants were hampered and their silencing suppression ability was abolished. The accumulation of the mutant mRNAs and proteins, except Y398A, could be rescued or enhanced by co-infiltration with potyviral suppressor HC-Pro. When assayed with the attenuated Zucchini yellow mosaic virus vector in squash plants, the recombinants carrying individual seven point-mutated NSs proteins displayed symptoms much milder than the recombinant carrying the wild type NSs protein, suggesting that these aa residues also affect viral pathogenicity by suppressing the host silencing mechanism.

  20. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  1. MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

    Directory of Open Access Journals (Sweden)

    Kuczmarski Thomas A

    2006-10-01

    Full Text Available Abstract Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein

  2. 14-3-3 checkpoint regulatory proteins interact specifically with DNA repair protein human exonuclease 1 (hEXO1) via a semi-conserved motif

    DEFF Research Database (Denmark)

    Andersen, Sofie Dabros; Keijzers, Guido; Rampakakis, Emmanouil

    2012-01-01

    Human exonuclease 1 (hEXO1) acts directly in diverse DNA processing events, including replication, mismatch repair (MMR), and double strand break repair (DSBR), and it was also recently described to function as damage sensor and apoptosis inducer following DNA damage. In contrast, 14-3-3 proteins...... are specifically induced by replication inhibition leading to protein ubiquitination and degradation. We demonstrate direct and robust interaction between hEXO1 and six of the seven 14-3-3 isoforms in vitro, suggestive of a novel protein interaction network between DNA repair and cell cycle control. Binding...... and most likely a second unidentified binding motif. 14-3-3 associations do not appear to directly influence hEXO1 in vitro nuclease activity or in vitro DNA replication initiation. Moreover, specific phosphorylation variants, including hEXO1 S746A, are efficiently imported to the nucleus; to associate...

  3. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site

    International Nuclear Information System (INIS)

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-01-01

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al. ). In the present study, the pH-induced formation of i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K_in, and the cooperativity parameters, ω_c and ω_a, were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  4. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  5. Proteomic profiling of human keratinocytes undergoing UVB-induced alternative differentiation reveals TRIpartite Motif Protein 29 as a survival factor.

    Directory of Open Access Journals (Sweden)

    Véronique Bertrand-Vallery

    Full Text Available BACKGROUND: Repeated exposures to UVB of human keratinocytes lacking functional p16(INK-4a) and able to differentiate induce an alternative state of differentiation rather than stress-induced premature senescence. METHODOLOGY/PRINCIPAL FINDINGS: A 2D-DIGE proteomic profiling of this alternative state of differentiation was performed herein at various times after the exposures to UVB. Sixty-nine differentially abundant protein species were identified by mass spectrometry, many of which are involved in keratinocyte differentiation and survival. Among these protein species was TRIpartite Motif Protein 29 (TRIM29). Increased abundance of TRIM29 following UVB exposures was validated by Western blot using specific antibody and was also further analysed by immunochemistry and by RT-PCR. TRIM29 was found very abundant in keratinocytes and reconstructed epidermis. Knocking down the expression of TRIM29 by short-hairpin RNA interference decreased the viability of keratinocytes after UVB exposure. The abundance of involucrin mRNA, a marker of late differentiation, increased concomitantly. In TRIM29-knocked down reconstructed epidermis, the presence of pyknotic cells revealed cell injury. Increased abundance of TRIM29 was also observed upon exposure to DNA damaging agents and PKC activation. The UVB-induced increase of TRIM29 abundance was dependent on a PKC signaling pathway, likely PKCdelta. CONCLUSIONS/SIGNIFICANCE: These findings suggest that TRIM29 allows keratinocytes to enter a protective alternative differentiation process rather than die massively after stress.

  6. Design and evaluation of antimalarial peptides derived from prediction of short linear motifs in proteins related to erythrocyte invasion.

    Directory of Open Access Journals (Sweden)

    Alessandra Bianchin

    Full Text Available The purpose of this study was to investigate the blood stage of the malaria causing parasite, Plasmodium falciparum, to predict potential protein interactions between the parasite merozoite and the host erythrocyte and design peptides that could interrupt these predicted interactions. We screened the P. falciparum and human proteomes for computationally predicted short linear motifs (SLiMs) in cytoplasmic portions of transmembrane proteins that could play roles in the invasion of the erythrocyte by the merozoite, an essential step in malarial pathogenesis. We tested thirteen peptides predicted to contain SLiMs, twelve of them palmitoylated to enhance membrane targeting, and found three that blocked parasite growth in culture by inhibiting the initiation of new infections in erythrocytes. Scrambled peptides for two of the most promising peptides suggested that their activity may be reflective of amino acid properties, in particular, positive charge. However, one peptide showed effects which were stronger than those of scrambled peptides. This was derived from human red blood cell glycophorin-B. We concluded that proteome-wide computational screening of the intracellular regions of both host and pathogen adhesion proteins provides potential lead peptides for the development of anti-malarial compounds.

  7. Partial sequence determination of metabolically labeled radioactive proteins and peptides

    International Nuclear Information System (INIS)

    Anderson, C.W.

    1982-01-01

    The author has used the sequence analysis of radioactive proteins and peptides to approach several problems during the past few years. They, in collaboration with others, have mapped precisely several adenovirus proteins with respect to the nucleotide sequence of the adenovirus genome; identified hitherto missed proteins encoded by bacteriophage MS2 and by simian virus 40; analyzed the aminoterminal maturation of several virus proteins; determined the cleavage sites for processing of the poliovirus polyprotein; and analyzed the mechanism of frameshifting by excess normal tRNAs during cell-free protein synthesis. This chapter is designed to aid those without prior experience at protein sequence determinations. It is based primarily on the experience gained in the studies cited above, which made use of the Beckman 890 series automated protein sequencers

  8. Complete cDNA sequence coding for human docking protein

    Energy Technology Data Exchange (ETDEWEB)

    Hortsch, M; Labeit, S; Meyer, D I

    1988-01-11

    Docking protein (DP, or SRP receptor) is a rough endoplasmic reticulum (ER)-associated protein essential for the targeting and translocation of nascent polypeptides across this membrane. It specifically interacts with a cytoplasmic ribonucleoprotein complex, the signal recognition particle (SRP). The nucleotide sequence of cDNA encoding the entire human DP and its deduced amino acid sequence are given.

  9. Structural and sequence analysis of imelysin-like proteins implicated in bacterial iron uptake.

    Directory of Open Access Journals (Sweden)

    Qingping Xu

    Full Text Available Imelysin-like proteins define a superfamily of bacterial proteins that are likely involved in iron uptake. Members of this superfamily were previously thought to be peptidases and were included in the MEROPS family M75. We determined the first crystal structures of two remotely related, imelysin-like proteins. The Psychrobacter arcticus structure was determined at 2.15 Å resolution and contains the canonical imelysin fold, while higher resolution structures from the gut bacteria Bacteroides ovatus, in two crystal forms (at 1.25 Å and 1.44 Å resolution), have a circularly permuted topology. Both structures are highly similar to each other despite low sequence similarity and circular permutation. The all-helical structure can be divided into two similar four-helix bundle domains. The overall structure and the GxHxxE motif region differ from known HxxE metallopeptidases, suggesting that imelysin-like proteins are not peptidases. A putative functional site is located at the domain interface. We have now organized the known homologous proteins into a superfamily, which can be separated into four families. These families share a similar functional site, but each has family-specific structural and sequence features. These results indicate that imelysin-like proteins have evolved from a common ancestor, and likely have a conserved function.
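    A motif written as GxHxxE, where x conventionally stands for any residue, translates directly into a pattern search. The following is a minimal sketch with a hypothetical function name and a made-up test peptide; it uses a lookahead so that overlapping occurrences are also reported, and it returns 1-based positions as is usual for protein sequences.

```python
import re

# "x" in GxHxxE is taken to mean any single residue, hence "." in the regex.
# The lookahead (?=...) lets the scan report overlapping matches as well.
GXHXXE = re.compile(r"(?=(G.H..E))")


def find_gxhxxe(protein):
    """Return (1-based start, matched residues) for every GxHxxE occurrence."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in GXHXXE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the pattern.
    print(find_gxhxxe("MKGAHLLEQ"))
```

    A sequence match is only a starting point: the abstract's argument that these proteins are not peptidases rests on structural differences around the motif, which a pattern search cannot capture.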

  10. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    Science.gov (United States)

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria, responsible for extreme acid sensitivity especially in Escherichia coli and Shigella flexneri. Here we found molecular evidence supporting the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC coded acid sensitive antiporter showed many conserved residue patterns at regular intervals at the N-terminal region. It was observed that as the alignment approaches the C-terminal, the number of conserved residues decreases, indicating that the N-terminal region of this protein has a more active role than the carboxyl terminal. The motif, FHLVFFLLLGG, is well conserved within the entire gadC coded protein at the amino terminal. The motif is also partially conserved among other antiporters (which are not coded by gadC) but are involved in the acid sensitive/resistance mechanism. Phylogenetic cluster analysis supports the relationship of Escherichia coli and Shigella flexneri. The gadC coded proteins converge as a clade and diverge from other antiporters belonging to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792

  11. AlignMe—a membrane protein sequence alignment web server

    Science.gov (United States)

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
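    The first step of the HP mode described above, turning a multiple sequence alignment into a family-averaged hydrophobicity profile, can be sketched as a per-column average. This is an illustration under stated assumptions, not AlignMe's implementation: the Kyte-Doolittle scale is used here as one common hydrophobicity scale (the server's own choice of scale is not stated in the abstract), gaps are simply skipped, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values; the choice of this scale is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_average_hydropathy(msa):
    """Average hydropathy per alignment column, ignoring gaps and unknown symbols.

    `msa` is a list of equal-length aligned sequences. Columns with no scorable
    residue (for example all gaps) yield None.
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        values = [KYTE_DOOLITTLE[aa.upper()] for aa in column
                  if aa.upper() in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else None)
    return profile


if __name__ == "__main__":
    # Hypothetical three-sequence alignment with gaps in the last column.
    print(column_average_hydropathy(["AL-", "IL-", "VLK"]))
```

    Aligning two such profiles against each other, the second step of HP mode, requires a dynamic-programming alignment with gap penalties and is not shown here.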

  12. MIPS: a database for protein sequences and complete genomes.

    Science.gov (United States)

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  13. Dynamics of domain coverage of the protein sequence universe

    Science.gov (United States)

    2012-01-01

    Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  14. Dynamics of domain coverage of the protein sequence universe

    Directory of Open Access Journals (Sweden)

    Rekapalli Bhanu

    2012-11-01

    Full Text Available Abstract Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

  15. Role of N-glycosylation sites and CXC motifs in trafficking of Medicago truncatula Nod Factor Perception protein to the plasma membrane.

    NARCIS (Netherlands)

    Lefebvre, B.; Klaus-Heisen, D.; Pietraszewska-Bogiel, A.; Hervé, M.; Camut, S.; Auriac, M.C.; Gasciolli, V.; Nurisso, A.; Gadella, T.W.; Cullimore, J.

    2012-01-01

    The lysin motif receptor like kinase, NFP, is a key protein in the legume Medicago truncatula for the perception of lipochitooligosaccharidic Nod Factors, which are secreted bacterial signals essential for establishing the nitrogen-fixing legume-rhizobia symbiosis. Predicted structural and genetic

  16. Divergent protein motifs direct elongation factor P-mediated translational regulation in Salmonella enterica and Escherichia coli

    DEFF Research Database (Denmark)

    Hersch, Steven J; Wang, Mengchi; Zou, S Betty

    2013-01-01

    number of proteins are affected by the loss of EF-P, and it has recently been determined that EF-P plays a critical role in rescuing ribosomes stalled at PPP and PPG peptide sequences. Here we present an unbiased in vivo investigation of the specific targets of EF-P by employing stable isotope labeling...

  17. [Tripartite motif-containing protein 34 (TRIM34) colocalized with micronuclei chromosome and hampers its movement to equatorial plate during the metaphase stage of mitosis].

    Science.gov (United States)

    Sun, Dakang; An, Xinye; Ji, Bing; Cheng, Yanli; Gao, Honglian; Tian, Mingming

    2016-06-01

    Objective To examine whether tripartite motif-containing protein 34 (TRIM34) is colocalized with micronuclei and investigate the influence on the movement of micronuclei chromosome in mitosis. Methods The eukaryotic expression vector TRIM34-pEGFP-N3 was constructed, identified and then transfected into HEK293T cells. With 4',6-diamidino-2-phenylindole 2HCl (DAPI) staining, the colocalization between TRIM34 and micronuclei was observed under a fluorescence microscope. Moreover, MitoTracker® Deep Red was used to identify the colocalization between the complex of TRIM34-micronuclei and mitochondria under a confocal microscope. Finally, the effect of TRIM34 on the movement of micronuclei chromosome in mitosis was examined. Results DNA sequencing confirmed that the vector TRIM34-pEGFP-N3 was constructed successfully. A fluorescence microscope revealed that TRIM34 could be colocalized with micronuclei in HEK293T cells transfected with TRIM34-pEGFP-N3. In the same manner, a confocal microscope distinctly showed that TRIM34 was colocalized with micronuclei similarly in appearance. However, there was no distinct colocalization relationship between the complex of TRIM34-micronuclei and mitochondria. Interestingly, the micronuclei chromosome conjugated with TRIM34 was hardly transferred to the equatorial plate during the metaphase stage of mitosis. Conclusion TRIM34 is colocalized with micronuclei chromosome and hampers its movement to the equatorial plate in mitosis.

  18. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  19. Structural and Functional Analysis of VQ Motif-Containing Proteins in Arabidopsis as Interacting Proteins of WRKY Transcription Factors

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-01-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors. PMID:22535423
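    The VQ motif quoted in this abstract, FxxxVQxLTG, can be read as a simple pattern in which x stands for any residue. The following is a minimal sketch of how such a pattern might be scanned for in a protein sequence, assuming that reading of the notation; the function name `find_vq_motifs` and the toy sequence are hypothetical illustrations and are not taken from the paper.

```python
import re

# Assumption: "x" in FxxxVQxLTG means "any amino acid", which maps to "." in a regex.
# The pattern is wrapped in a lookahead so that overlapping hits are also reported.
VQ_MOTIF = re.compile(r"(?=(F.{3}VQ.LTG))")


def find_vq_motifs(sequence):
    """Return (0-based start, matched text) for every VQ motif occurrence."""
    return [(m.start(), m.group(1)) for m in VQ_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real Arabidopsis protein.
toy_sequence = "MAAFKPLVQELTGSS"
print(find_vq_motifs(toy_sequence))  # [(3, 'FKPLVQELTG')]
```

    A plain regular expression is enough here because the motif has fixed length and no variable gaps; a profile or HMM would be the usual choice for more divergent family members.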

  20. Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-10-01

    Full Text Available Abstract Background With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore predicting protein-protein interactions from the sequences is of great demand. The reason lies in the fact that identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for those organisms barely characterized. Although a few methods have been proposed, the converse problem, if the features used extract sufficient and unbiased information from protein sequences, is almost untouched. Results In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods exploit sufficiently the information of protein sequences and reduce artificial bias and computational cost. Thus, it significantly outperforms the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation and reaches ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae respectively. Our findings here hold important implication for other sequence-based prediction tasks because representation of biological sequence is always the first step in computational biology. Conclusions By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.

  1. The B7-1 cytoplasmic tail enhances intracellular transport and mammalian cell surface display of chimeric proteins in the absence of a linear ER export motif.

    Directory of Open Access Journals (Sweden)

    Yi-Chieh Lin

    Full Text Available Membrane-tethered proteins (mammalian surface display) are increasingly being used for novel therapeutic and biotechnology applications. Maximizing surface expression of chimeric proteins on mammalian cells is important for these applications. We show that the cytoplasmic domain from the B7-1 antigen, a commonly used element for mammalian surface display, can enhance the intracellular transport and surface display of chimeric proteins in a Sar1- and Rab1-dependent fashion. However, mutational, alanine scanning and deletion analysis demonstrate the absence of linear ER export motifs in the B7 cytoplasmic domain. Rather, efficient intracellular transport correlated with the presence of predicted secondary structure in the cytoplasmic tail. Examination of the cytoplasmic domains of 984 human and 782 mouse type I transmembrane proteins revealed that many previously identified ER export motifs are rarely found in the cytoplasmic tail of type I transmembrane proteins. Our results suggest that efficient intracellular transport of B7 chimeric proteins is associated with the structure rather than with the presence of a linear ER export motif in the cytoplasmic tail, and indicate that short (less than ~10-20 amino acids) and unstructured cytoplasmic tails should be avoided to express high levels of chimeric proteins on mammalian cells.

  2. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences have hidden repetitions in fact. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.

  3. Insights into the Activity and Substrate Binding of Xylella fastidiosa Polygalacturonase by Modification of a Unique QMK Amino Acid Motif Using Protein Chimeras.

    Science.gov (United States)

    Warren, Jeremy G; Lincoln, James E; Kirkpatrick, Bruce C

    2015-01-01

    Polygalacturonases (EC 3.2.1.15) catalyze the random hydrolysis of 1, 4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. Xylella fastidiosa possesses a single polygalacturonase gene, pglA (PD1485), and X. fastidiosa mutants deficient in the production of polygalacturonase are non-pathogenic and show a compromised ability to systemically infect grapevines. These results suggested that grapevines expressing sufficient amounts of an inhibitor of X. fastidiosa polygalacturonase might be protected from disease. Previous work in our laboratory and others have tried without success to produce soluble active X. fastidiosa polygalacturonase for use in inhibition assays. In this study, we created two enzymatically active X. fastidiosa / A. vitis polygalacturonase chimeras, AX1A and AX2A to explore the functionality of X. fastidiosa polygalacturonase in vitro. The AX1A chimera was constructed to specifically test if recombinant chimeric protein, produced in Escherichia coli, is soluble and if the X. fastidiosa polygalacturonase catalytic amino acids are able to hydrolyze polygalacturonic acid. The AX2A chimera was constructed to evaluate the ability of a unique QMK motif of X. fastidiosa polygalacturonase, most polygalacturonases have a R(I/L)K motif, to bind to and allow the hydrolysis of polygalacturonic acid. Furthermore, the AX2A chimera was also used to explore what effect modification of the QMK motif of X. fastidiosa polygalacturonase to a conserved RIK motif has on enzymatic activity. These experiments showed that both the AX1A and AX2A polygalacturonase chimeras were soluble and able to hydrolyze the polygalacturonic acid substrate. Additionally, the modification of the QMK motif to the conserved RIK motif eliminated hydrolytic activity, suggesting that the QMK motif is important for the activity of X. fastidiosa polygalacturonase. This result suggests X. fastidiosa polygalacturonase may preferentially hydrolyze a different pectic substrate or

  4. Identification and characterization of two linear epitope motifs in hepatitis E virus ORF2 protein.

    Directory of Open Access Journals (Sweden)

    Heng Wang

    Full Text Available Hepatitis E virus (HEV) is responsible for hepatitis E, which represents a global public health problem. HEV genotypes 3 and 4 are reported to be zoonotic, and animals are monitored for HEV infection in the interests of public hygiene and food safety. The development of novel diagnostic methods and vaccines for HEV in humans is thus an important topic of research. Open reading frame 2 (ORF2) of HEV includes both linear and conformational epitopes and is regarded as the primary candidate for vaccines and diagnostic tests. We investigated the precise location of the HEV epitopes in the ORF2 protein. We prepared four monoclonal antibodies (mAbs) against genotype 4 ORF2 protein and identified two linear epitopes, G438IVIPHD444 and Y457DNQH461, corresponding to two of these mAbs using phage display biopanning technology. Both these epitopes were speculated to be universal to genotypes 1, 2, 3, 4, and avian HEVs. We also used two 12-mer fragments of ORF2 protein including these two epitopes to develop a peptide-based enzyme-linked immunosorbent assay (ELISA) to detect HEV in serum. This assay demonstrated good specificity but low sensitivity compared with the commercial method, indicating that these two epitopes could serve as potential candidate targets for diagnosis. Overall, these results further our understanding of the epitope distribution of HEV ORF2, and provide important information for the development of peptide-based immunodiagnostic tests to detect HEV in serum.

  5. The structure of Plasmodium vivax phosphatidylethanolamine-binding protein suggests a functional motif containing a left-handed helix

    International Nuclear Information System (INIS)

    Arakaki, Tracy; Neely, Helen; Boni, Erica; Mueller, Natasha; Buckner, Frederick S.; Van Voorhis, Wesley C.; Lauricella, Angela; DeTitta, George; Luft, Joseph; Hol, Wim G. J.; Merritt, Ethan A.

    2007-01-01

    The crystal structure of a phosphatidylethanolamine-binding protein from P. vivax, a homolog of Raf-kinase inhibitor protein (RKIP), has been solved to a resolution of 1.3 Å. The inferred interaction surface near the anion-binding site is found to include a distinctive left-handed α-helix. The structure of a putative Raf kinase inhibitor protein (RKIP) homolog from the eukaryotic parasite Plasmodium vivax has been studied to a resolution of 1.3 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protozoan protein is topologically similar to previously studied members of the phosphatidylethanolamine-binding protein (PEBP) sequence family, but exhibits a distinctive left-handed α-helical region at one side of the canonical phospholipid-binding site. Re-examination of previously determined PEBP structures suggests that the P. vivax protein and yeast carboxypeptidase Y inhibitor may represent a structurally distinct subfamily of the diverse PEBP-sequence family

  6. An algorithm to find all palindromic sequences in proteins

    Indian Academy of Sciences (India)

    2013-01-20

    Jan 20, 2013 ... 1976; Karrer and Gall 1976; Vogt and Braun 1976) and (iii) in the formation of hairpin loops in the newly transcribed RNA. Palindromic sequences are observed in various classes of proteins like histones (Cheng et al. 1989), prion proteins (Sulkowski 1992; Kazim 1993),. DNA-binding proteins (Suzuki 1992; ...

  7. Discriminating Microbial Species Using Protein Sequence Properties and Machine Learning

    NARCIS (Netherlands)

    Shahib, Ali Al-; Gilbert, David; Breitling, Rainer

    2007-01-01

    Much work has been done to identify species-specific proteins in sequenced genomes and hence to determine their function. We assumed that such proteins have specific physico-chemical properties that will discriminate them from proteins in other species. In this paper, we examine the validity of this

  8. Molecular cloning and expression of a transformation-sensitive human protein containing the TPR motif and sharing identity to the stress-inducible yeast protein STI1

    DEFF Research Database (Denmark)

    Honoré, B; Leffers, H; Madsen, Peder

    1992-01-01

    in families of fungal proteins required for mitosis and RNA synthesis. In particular, the protein has 42% amino acid sequence identity to STI1, a stress-inducible mediator of the heat shock response in Saccharomyces cerevisiae. Northern blot analysis indicated that the 3521 mRNA is up-regulated in several...

  9. Improving pairwise comparison of protein sequences with domain co-occurrence

    Science.gov (United States)

    Gascuel, Olivier

    2018-01-01

    Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is then to integrate additional contextual information to the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% of the number of significant BLAST hits and an increase of 25% of the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties show that they are close to that of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498

  10. Human ribosomal protein L37 has motifs predicting serine/threonine phosphorylation and a zinc-finger domain.

    Science.gov (United States)

    Barnard, G F; Staniunas, R J; Puder, M; Steele, G D; Chen, L B

    1994-08-02

    Ribosomal protein L37 mRNA is overexpressed in colon cancer. The nucleotide sequences of human L37 from several tumor and normal, colon and liver cDNA sources were determined to be identical. L37 mRNA was approximately 375 nucleotides long encoding 97 amino acids with M(r) = 11,070, pI = 12.6, multiple potential serine/threonine phosphorylation sites and a zinc-finger domain. The human sequence is compared to other species.

  11. Identification of the bioactive and consensus peptide motif from Momordica charantia insulin receptor-binding protein.

    Science.gov (United States)

    Lo, Hsin-Yi; Li, Chia-Cheng; Ho, Tin-Yun; Hsiang, Chien-Yun

    2016-08-01

    Many food bioactive peptides with diverse functions have been discovered by studying plant proteins. We have previously identified a 68-residue insulin receptor (IR)-binding protein (mcIRBP) from Momordica charantia that exhibits hypoglycemic effects in mice via interaction with IR. By in vitro digestion, we found that mcIRBP-19, spanning residues 50-68 of mcIRBP, enhanced the binding of insulin to IR, stimulated the phosphorylation of PDK1 and Akt, induced the expression of glucose transporter 4, and stimulated both the uptake of glucose in cells and the clearance of glucose in diabetic mice. Furthermore, mcIRBP-19 homologs were present in various plants and shared similar β-hairpin structures and IR kinase-activating abilities to mcIRBP-19. In conclusion, our findings suggested that mcIRBP-19 is a blood glucose-lowering bioactive peptide that exhibits IR-binding potentials. Moreover, we newly identified novel IR-binding bioactive peptides in various plants which belonged to different taxonomic families. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Full Text Available Abstract Background Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. Conclusion We have developed a novel and useful approach for

  13. Large-scale identification of odorant-binding proteins and chemosensory proteins from expressed sequence tags in insects

    Science.gov (United States)

    2009-01-01

    Background Insect odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play an important role in chemical communication of insects. Gene discovery of these proteins is a time-consuming task. In recent years, expressed sequence tags (ESTs) of many insect species have accumulated, thus providing a useful resource for gene discovery. Results We have developed a computational pipeline to identify OBP and CSP genes from insect ESTs. In total, 752,841 insect ESTs were examined from 54 species covering eight Orders of Insecta. From these ESTs, 142 OBPs and 177 CSPs were identified, of which 117 OBPs and 129 CSPs are new. The complete open reading frames (ORFs) of 88 OBPs and 123 CSPs were obtained by electronic elongation. We randomly chose 26 OBPs from eight species of insects, and 21 CSPs from four species for RT-PCR validation. Twenty-two OBPs and 16 CSPs were confirmed by RT-PCR, proving the efficiency and reliability of the algorithm. Together with all family members obtained from the NCBI (OBPs) or the UniProtKB (CSPs), 850 OBPs and 237 CSPs were analyzed for their structural characteristics and evolutionary relationship. Conclusions A large number of new OBPs and CSPs were found, providing the basis for deeper understanding of these proteins. In addition, the conserved motif and evolutionary analysis provide some new insights into the evolution of insect OBPs and CSPs. Motif patterns fine-tune the functions of OBPs and CSPs, leading to minor differences in the binding of sex pheromones or plant volatiles in different insect Orders. PMID:20034407

  14. The relationship of protein conservation and sequence length

    Directory of Open Access Journals (Sweden)

    Panchenko Anna R

    2002-11-01

    Full Text Available Abstract Background In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. Results We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. Conclusions There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.

  15. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    Science.gov (United States)

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    Science.gov (United States)

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. © 2015 Foulk et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stresses, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).
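    The GXGGRRKK linker described in this abstract uses X as a wildcard for any amino acid. As a rough sketch under that assumption, such motif notation could be translated into a regular expression and searched for as follows; the helper names `motif_to_regex` and `find_motif` and the toy sequence are hypothetical and do not come from the paper, whose actual K-segment definition is not given in the abstract.

```python
import re


def motif_to_regex(motif, wildcard="X"):
    """Translate motif notation (wildcard = any residue) into a compiled regex."""
    parts = ("." if c == wildcard else re.escape(c) for c in motif.upper())
    return re.compile("".join(parts))


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for non-overlapping motif hits."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence: a serine run followed by a linker-like stretch.
toy_sequence = "SSSSSSEDGAGGRRKKGLKEK"
print(find_motif(toy_sequence, "GXGGRRKK"))  # [(8, 'GAGGRRKK')]
```

    Note that X is treated as a wildcard here, so the sketch cannot represent the rare "unknown residue" X that sometimes appears in sequence records; a different wildcard character can be passed to `motif_to_regex` if that distinction matters.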

  18. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    Science.gov (United States)

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable-gap representation, in addition to analysis of results using energy minimization methods. RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  19. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA.

    Science.gov (United States)

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-02-02

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.

  20. The LXCXE Retinoblastoma Protein-Binding Motif of FOG-2 Regulates Adipogenesis.

    Science.gov (United States)

    Goupille, Olivier; Penglong, Tipparat; Kadri, Zahra; Granger-Locatelli, Marine; Denis, Raphaël; Luquet, Serge; Badoual, Cécile; Fucharoen, Suthat; Maouche-Chrétien, Leila; Leboulch, Philippe; Chrétien, Stany

    2017-12-19

    GATA transcription factors and their FOG cofactors play a key role in tissue-specific development and differentiation, from worms to humans. Mammals have six GATA and two FOG factors. We recently demonstrated that interactions between retinoblastoma protein (pRb) and GATA-1 are crucial for erythroid proliferation and differentiation. We show here that the LXCXE pRb-binding site of FOG-2 is involved in adipogenesis. Unlike GATA-1, which inhibits cell division, FOG-2 promotes proliferation. Mice with a knockin of a Fog2 gene bearing a mutated LXCXE pRb-binding site are resistant to obesity and display higher rates of white-to-brown fat conversion. Thus, each component of the GATA/FOG complex (GATA-1 and FOG-2) is involved in pRb/E2F regulation, but these molecules have markedly different roles in the control of tissue homeostasis. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  1. Inverse statistical physics of protein sequences: a key issues review.

    Science.gov (United States)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  2. Cations form sequence selective motifs within DNA grooves via a combination of cation-pi and ion-dipole/hydrogen bond interactions.

    Science.gov (United States)

    Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

    2013-01-01

    The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves.

  3. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2018-03-11

    Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.

  4. The SWISS-PROT protein sequence data bank

    OpenAIRE

    Bairoch, Amos; Boeckmann, Brigitte

    1992-01-01

    SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library

  5. Aligning protein sequence and analysing substitution pattern using ...

    Indian Academy of Sciences (India)

    Prakash

    Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological ..... the amino acids according to their substitution behaviour ...... which may cause great change (e.g. prolonging the helix) in.

  6. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for finding motifs, as ChIP-Seq experiments narrow down the motif search to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  7. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  8. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Full Text Available Abstract Background The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly use protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  9. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  10. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    Full Text Available The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  11. A phosphorylation-motif for tuneable helix stabilisation in intrinsically disordered proteins - Lessons from the sodium proton exchanger 1 (NHE1)

    DEFF Research Database (Denmark)

    Hendus-Altenburger, Ruth; Lambrughi, Matteo; Terkelsen, Thilde Bagger

    2017-01-01

    ). Using NMR spectroscopy, we found that two out of those six phosphorylation sites had a stabilizing effect on transient helices. One of these was further investigated by circular dichroism and NMR spectroscopy as well as by molecular dynamic simulations, which confirmed the stabilizing effect......-spread role in phosphorylation-mediated regulation of intrinsically disordered proteins. The identification of such motifs is important for understanding the molecular mechanism of cellular signalling, and is crucial for the development of predictors for the structural effect of phosphorylation; a tool......Intrinsically disordered proteins (IDPs) are involved in many pivotal cellular processes including phosphorylation and signalling. The structural and functional effects of phosphorylation of IDPs remain poorly understood and difficult to predict. Thus, a need exists to identify motifs that confer...

  12. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Full Text Available Abstract Background We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families. Conclusions Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
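
    To make the planted motif problem concrete, the sketch below enumerates every candidate string of length k and keeps those that occur, with at most d mismatches, in every input sequence. This is a naive baseline written for illustration only, not the algorithm proposed in the paper, and the function names and toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some window of `sequence` is within d mismatches of `candidate`."""
    k = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + k]) <= d
        for i in range(len(sequence) - k + 1)
    )


def planted_motifs(sequences, k, d, alphabet="ACGT"):
    """Exhaustively list all k-mers present (up to d mismatches) in every sequence."""
    hits = []
    for letters in product(alphabet, repeat=k):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            hits.append(candidate)
    return hits


# Hypothetical toy input: ACGT is planted with at most one mismatch per sequence.
toy = ["TTACGTTT", "GGACCTGG", "CAAGGTCA"]
motifs = planted_motifs(toy, k=4, d=1)
```

    The candidate space of this baseline grows as the alphabet size raised to the power k, so moving from the 4-letter DNA alphabet to the 20-letter protein alphabet quickly makes exhaustive enumeration impractical.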

  13. A sequence in subdomain 2 of DBL1α of Plasmodium falciparum erythrocyte membrane protein 1 induces strain transcending antibodies.

    Directory of Open Access Journals (Sweden)

    Karin Blomqvist

    Full Text Available Immunity to severe malaria is the first level of immunity acquired to Plasmodium falciparum. Antibodies to the variant antigen PfEMP1 (P. falciparum erythrocyte membrane protein 1) present at the surface of the parasitized red blood cell (pRBC) confer protection by blocking microvascular sequestration. Here we have generated antibodies to peptide sequences of subdomain 2 of PfEMP1-DBL1α previously identified to be associated with severe or mild malaria. A set of sera generated to the amino acid sequence KLQTLTLHQVREYWWALNRKEVWKA, containing the motif ALNRKE, stained the live pRBC. 50% of parasites tested (7/14) were positive both in flow cytometry and immunofluorescence assays with live pRBCs including both laboratory strains and in vitro adapted clinical isolates. Antibodies that reacted selectively with the sequence REYWWALNRKEVWKA in a 15-mer peptide array of DBL1α-domains were also found to react with the pRBC surface. By utilizing a peptide array to map the binding properties of the elicited anti-DBL1α antibodies, the amino acids WxxNRx were found essential for antibody binding. Complementary experiments using 135 degenerate RDSM peptide sequences obtained from 93 Ugandan patient-isolates showed that antibody binding occurred when the amino acids WxLNRKE/D were present in the peptide. The data suggest that the ALNRKE sequence motif, associated with severe malaria, induces strain-transcending antibodies that react with the pRBC surface.

  14. AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

    Science.gov (United States)

    Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit

    2012-08-01

    We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of the machine learning algorithm and the data fusion of different training object representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7% as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motif types it is able to improve the prediction performance by almost 32%, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on average boosts the AUC score up to around 89%, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools.

  15. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
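
    Two of the measures named above, the n-gram distribution and a Zipf's law coefficient, can be illustrated with the standard library alone. The sketch below deliberately does not use the Quantiprot API, whose function names are not given here; `ngram_counts` and `zipf_slope` are hypothetical helpers, and estimating the coefficient as a least-squares slope on a log-log rank/frequency plot is an assumption about how such a coefficient is commonly obtained.

```python
import math
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping n-gram in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to the classic Zipf's law.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy peptide; real use would pass full protein sequences.
bigrams = ngram_counts("MKKLLKKLLA", 2)
```

    Calling `zipf_slope(bigrams)` then gives a single number per sequence, which is the kind of quantitative feature that could feed an alignment-free comparison.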

  16. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    DEFF Research Database (Denmark)

    Foulk, M. S.; Urban, J. M.; Casella, Cinzia

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (lambda-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent...... strands intact. We used genomics and biochemical approaches to determine if lambda-exo digests all parental DNA sequences equally. We report that lambda-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, lambda-exo digestion of nonreplicating genomic DNA (LexoG0) enriches...... GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent lambda-exo biases in NSseq and validated this approach at the rDNA locus. The lambda-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s...

  17. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  18. Taxonomic colouring of phylogenetic trees of protein sequences

    Directory of Open Access Journals (Sweden)

    Andrade-Navarro Miguel A

    2006-02-01

    Full Text Available Abstract Background Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated to those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common. Results We have developed PhyloView, a web based tool for colouring phylogenetic trees upon arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours to the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree. Conclusion PhyloView can be used at http://www.ogic.ca/projects/phyloview/. A tutorial, the software with documentation, and GPL licensed source code, can be accessed at the same web address.

  19. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  20. The YPLGVG sequence of the Nipah virus matrix protein is required for budding

    Directory of Open Access Journals (Sweden)

    Yan Lianying

    2008-11-01

    Full Text Available Abstract Background Nipah virus (NiV) is a recently emerged paramyxovirus capable of causing fatal disease in a broad range of mammalian hosts, including humans. Together with Hendra virus (HeV), they comprise the genus Henipavirus in the family Paramyxoviridae. Recombinant expression systems have played a crucial role in studying the cell biology of these Biosafety Level-4 restricted viruses. Henipavirus assembly and budding occur at the plasma membrane, although the details of this process remain poorly understood. Multivesicular body (MVB) proteins have been found to play a role in the budding of several enveloped viruses, including some paramyxoviruses, and the recruitment of MVB proteins by viral proteins possessing late budding domains (L-domains) has become an important concept in the viral budding process. Previously we developed a system for producing NiV virus-like particles (VLPs) and demonstrated that the matrix (M) protein possessed an intrinsic budding ability and played a major role in assembly. Here, we have used this system to further explore the budding process by analyzing elements within the M protein that are critical for particle release. Results Using rationally targeted site-directed mutagenesis we show that a NiV M sequence YPLGVG is required for M budding and that mutation or deletion of the sequence abrogates budding ability. Replacement of the native and overlapping Ebola VP40 L-domains with the NiV sequence failed to rescue VP40 budding; however, it did induce the cellular morphology of extensive filamentous projection consistent with wild-type VP40-expressing cells. Cells expressing wild-type NiV M also displayed this morphology, which was dependent on the YPLGVG sequence, and deletion of the sequence also resulted in nuclear localization of M. Dominant-negative VPS4 proteins had no effect on NiV M budding, suggesting that unlike other viruses such as Ebola, NiV M accomplishes budding independent of MVB cellular proteins.

  1. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Simply comparing the motif count p-values in each sequence is not sufficient to decide whether this motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly well suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
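
    The exact binomial comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that the two motif counts are independent Poisson variables whose means are proportional to hypothetical expected counts (`expected1`, `expected2`, supplied by the user from some background model), so that, conditional on the total, the first count is binomial. The correction for overlapping motifs mentioned in the abstract is not modeled here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(n1, n2, expected1, expected2):
    """Two-sided exact test that a motif is equally exceptional in two sequences.

    n1, n2               observed motif counts in sequence 1 and sequence 2
    expected1, expected2 expected counts under a background model (assumed inputs)

    Under the null hypothesis, n1 given n1 + n2 follows
    Binomial(n1 + n2, expected1 / (expected1 + expected2)).
    """
    n = n1 + n2
    p = expected1 / (expected1 + expected2)
    observed = binom_pmf(n1, n, p)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    tolerance = 1e-12
    total = sum(
        prob
        for prob in (binom_pmf(k, n, p) for k in range(n + 1))
        if prob <= observed * (1 + tolerance)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: 10 occurrences versus 0, with equal expectations.
    print(exact_binomial_test(10, 0, 5.0, 5.0))
```

    Enumerating every outcome is only practical for small totals, which matches the abstract's remark that the exact test suits small counts.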

  2. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    Science.gov (United States)

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structures of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as crystal quality, diffraction techniques, and X-ray sources. In this paper, we find that the protein sequence could also be one of these factors. We extracted information on the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  3. Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers

    NARCIS (Netherlands)

    Yousef, Malik; Nigatu, Dawit; Levy, Dalit; Allmer, Jens; Henkel, Werner

    2017-01-01

    Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host

  4. Can Natural Proteins Designed with ‘Inverted’ Peptide Sequences Adopt Native-Like Protein Folds?

    Science.gov (United States)

    Sridhar, Settu; Guruprasad, Kunchur

    2014-01-01

    We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to ‘swap’ certain short peptide sequences in naturally occurring proteins with their corresponding ‘inverted’ peptides and generate ‘artificial’ proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5–12 and 18 amino acid residues. Our analysis illustrates with examples that such ‘artificial’ proteins may be generated by identifying peptides with ‘similar structural environment’ and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides. PMID:25210740
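
    One way to picture the search for inverted peptide pairs is the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: an "inverted pair" is taken to mean a peptide of fixed length that occurs somewhere in the dataset together with its exact reverse, palindromic peptides are skipped, and the two example sequences are invented.

```python
def inverted_peptide_pairs(sequences, length):
    """Return sorted (peptide, reversed_peptide) pairs present in the dataset.

    sequences  iterable of protein sequences (one-letter codes)
    length     peptide length to examine
    """
    # Collect every peptide of the requested length.
    peptides = set()
    for seq in sequences:
        for start in range(len(seq) - length + 1):
            peptides.add(seq[start:start + length])

    pairs = set()
    for peptide in peptides:
        reverse = peptide[::-1]
        # "peptide < reverse" reports each pair once and skips palindromes.
        if reverse in peptides and peptide < reverse:
            pairs.add((peptide, reverse))
    return sorted(pairs)


if __name__ == "__main__":
    # Invented sequences; the second contains reversed stretches of the first.
    example = ["MKTAYIAKQR", "GGRQKAIYATPP"]
    for pair in inverted_peptide_pairs(example, 5):
        print(pair)
```

    The structural-environment comparison and the modeling steps described in the abstract are outside the scope of this sketch.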

  5. Correlated mutations in protein sequences: Phylogenetic and structural effects

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.S. [Los Alamos National Lab., NM (United States). Theoretical Div.]|[Santa Fe Inst., NM (United States); Giraud, B.G. [C.E.N. Saclay, Gif/Yvette (France). Service Physique Theorique; Liu, L.C. [Los Alamos National Lab., NM (United States). Theoretical Div.; Stormo, G.D. [Univ. of Colorado, Boulder, CO (United States). Dept. of Molecular, Cellular and Developmental Biology

    1998-12-01

    Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links, but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure. In this paper the authors identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason involves the bias introduced in calculation of covariation measures due to the fact that biological sequences are generally related by a non-trivial phylogenetic tree. The authors present a null-model approach to solve this problem. The second reason involves linked chains of covariation which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. They present a maximum entropy solution to this classic problem of causation versus correlation. The methodologies are validated in simulation.
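
    For orientation, the naive covariation measure that this abstract cautions against can be written in a few lines. The sketch below computes plain mutual information between two alignment columns; it is an assumption that this is a fair stand-in for the "covariation measures" mentioned, and it deliberately omits both the phylogenetic null model and the maximum entropy treatment that the authors propose.

```python
from collections import Counter
from math import log2


def mutual_information(column_i, column_j):
    """Mutual information (in bits) between two aligned columns.

    column_i, column_j  equal-length strings, one residue per aligned sequence
    """
    n = len(column_i)
    counts_i = Counter(column_i)
    counts_j = Counter(column_j)
    counts_ij = Counter(zip(column_i, column_j))
    mi = 0.0
    for (a, b), joint in counts_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (joint / n) * log2(joint * n / (counts_i[a] * counts_j[b]))
    return mi


if __name__ == "__main__":
    # Toy columns from a four-sequence alignment.
    print(mutual_information("AAGG", "CCTT"))  # perfectly covarying columns
    print(mutual_information("AAGG", "CTCT"))  # independent columns
```

    A high score from this function alone says nothing about spatial proximity, which is the point the abstract makes: shared ancestry and chains of indirect coupling can both inflate it.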

  6. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.

  7. Single-molecule protein sequencing through fingerprinting: computational assessment

    Science.gov (United States)

    Yao, Yao; Docter, Margreet; van Ginkel, Jetty; de Ridder, Dick; Joo, Chirlmin

    2015-10-01

    Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.
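
    The fingerprinting idea can be illustrated with a toy calculation. In the sketch below each protein is reduced to the ordered string of two labeled residue types, and a protein counts as identifiable if no other protein shares its fingerprint. The abstract does not say which two amino acids are measured, so the choice of C and K is an assumption made for illustration, as are the protein names and sequences; measurement errors are ignored.

```python
from collections import Counter


def fingerprint(sequence, labeled=("C", "K")):
    """Keep only the labeled residue types, preserving their order."""
    return "".join(residue for residue in sequence if residue in labeled)


def uniquely_identified(proteins, labeled=("C", "K")):
    """Return names of proteins whose fingerprint is unique in the collection.

    proteins  dict mapping a protein name to its sequence
    """
    prints = {name: fingerprint(seq, labeled) for name, seq in proteins.items()}
    frequency = Counter(prints.values())
    return sorted(name for name, fp in prints.items() if frequency[fp] == 1)


if __name__ == "__main__":
    # Invented sequences: p1 and p2 collapse to the same fingerprint.
    toy = {"p1": "MCAKKC", "p2": "CMKAKC", "p3": "MKACC"}
    print(uniquely_identified(toy))
```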

  8. Single-molecule protein sequencing through fingerprinting: computational assessment

    International Nuclear Information System (INIS)

    Yao, Yao; Docter, Margreet; Van Ginkel, Jetty; Joo, Chirlmin; De Ridder, Dick

    2015-01-01

    Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences. (paper)

  9. Deep sequencing methods for protein engineering and design.

    Science.gov (United States)

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances.

  10. Sequence analysis reveals how G protein-coupled receptors transduce the signal to the G protein.

    NARCIS (Netherlands)

    Oliveira, L.; Paiva, P.B.; Paiva, A.C.; Vriend, G.

    2003-01-01

    Sequence entropy-variability plots based on alignments of very large numbers of sequences can indicate the location in proteins of the main active site and modulator sites. In the previous article in this issue, we applied this observation to a series of well-studied proteins and concluded that it

  11. A novel RNA-recognition-motif protein is required for premeiotic G1/S-phase transition in rice (Oryza sativa L.).

    Directory of Open Access Journals (Sweden)

    Ken-Ichi Nonomura

    2011-01-01

    Full Text Available The molecular mechanism for meiotic entry remains largely elusive in flowering plants. Only Arabidopsis SWI1/DYAD and maize AM1, both of which are coiled-coil proteins, are known to be required for the initiation of plant meiosis. The mechanism underlying the synchrony of male meiosis, characteristic of flowering plants, has also been unclear in the plant kingdom. In other eukaryotes, RNA-recognition-motif (RRM) proteins are known to play essential roles in germ-cell development and meiosis progression. Rice MEL2 protein discovered in this study shows partial similarity with human proline-rich RRM protein, deleted in Azoospermia-Associated Protein1 (DAZAP1), though MEL2 also possesses ankyrin repeats and a RING finger motif. Expression analyses of several cell-cycle markers revealed that, in mel2 mutant anthers, most germ cells failed to enter premeiotic S-phase and meiosis, and a part escaped from the defect and underwent meiosis with a significant delay or continued mitotic cycles. Immunofluorescent detection revealed that T7 peptide-tagged MEL2 localized at the cytoplasmic perinuclear region of germ cells during premeiotic interphase in transgenic rice plants. This study is the first report of a plant RRM protein that is required for regulating the premeiotic G1/S-phase transition of male and female germ cells and also for establishing synchrony of male meiosis. This study will contribute to elucidation of similarities and diversities in reproductive systems between plants and other species.

  12. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins is known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should be helpful to enhance the prediction of SIMs and their binding properties in different organisms to facilitate the reconstruction of the SUMO interactome.

  13. ATP-binding motifs play key roles in Krp1p, kinesin-related protein 1, function for bi-polar growth control in fission yeast

    International Nuclear Information System (INIS)

    Rhee, Dong Keun; Cho, Bon A; Kim, Hyong Bai

    2005-01-01

    Kinesin is a microtubule-based motor protein with various functions related to cell growth and division. It has been reported that Krp1p, kinesin-related protein 1, which belongs to the kinesin heavy chain superfamily, localizes on microtubules and may play an important role in cytokinesis. However, the function of Krp1p has not been fully elucidated. In this study, we overexpressed an intact form and three different mutant forms of Krp1p in fission yeast, constructed by site-directed mutagenesis in two ATP-binding motifs or by truncation of the leucine zipper-like motif (LZiP). We observed hyper-extended microtubules and an aberrant nuclear shape in Krp1p-overexpressing fission yeast. As a functional consequence, a point mutation of ATP-binding domain 1 (G89E) in Krp1p reversed the effect of Krp1p overexpression in fission yeast, whereas the specific mutation in ATP-binding domain 2 (G238E) resulted in altered cell polarity. Additionally, truncation of the leucine zipper-like domain (LZiP) at the C-terminus of Krp1p resulted in normal nuclear division. Taken together, we suggest that Krp1p is involved in regulation of cell-polarized growth through ATP-binding motifs in fission yeast.

  14. Getting from A to B-exploring the activation motifs of the class B adhesion G protein-coupled receptor subfamily G member 4/GPR112

    DEFF Research Database (Denmark)

    Cornelia Peeters, Miriam; Mos, Iris; Lenselink, Eelke B

    2016-01-01

    The adhesion G protein-coupled receptors (ADGRs/class B2 G protein-coupled receptors) constitute an ancient family of G protein-coupled receptors that have recently been demonstrated to play important roles in cellular and developmental processes. Here, we describe a first insight...... into the structure-function relationship of ADGRs using the family member ADGR subfamily G member 4 (ADGRG4)/GPR112 as a model receptor. In a bioinformatics approach, we compared conserved, functional elements of the well-characterized class A and class B1 secretin-like G protein-coupled receptors with the ADGRs. We...... identified several potential equivalent motifs and subjected those to mutational analysis. The importance of the mutated residues was evaluated by examining their effect on the high constitutive activity of the N-terminally truncated ADGRG4/GPR112 in a 1-receptor-1-G protein Saccharomyces cerevisiae...

  15. Structure and Sequence Search on Aptamer-Protein Docking

    Science.gov (United States)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  16. G protein-coupled estrogen receptor 1 (GPER1)/GPR30 increases ERK1/2 activity through PDZ motif-dependent and -independent mechanisms.

    Science.gov (United States)

    Gonzalez de Valdivia, Ernesto; Broselid, Stefan; Kahn, Robin; Olde, Björn; Leeb-Lundberg, L M Fredrik

    2017-06-16

    G protein-coupled receptor 30 (GPR30), also called G protein-coupled estrogen receptor 1 (GPER1), is thought to play important roles in breast cancer and cardiometabolic regulation, but many questions remain about ligand activation, effector coupling, and subcellular localization. We showed recently that GPR30 interacts through the C-terminal type I PDZ motif with SAP97 and protein kinase A (PKA)-anchoring protein (AKAP) 5, which anchor the receptor in the plasma membrane and mediate an apparently constitutive decrease in cAMP production independently of Gi/o. Here, we show that GPR30 also constitutively increases ERK1/2 activity. Removing the receptor PDZ motif or knocking down specifically AKAP5 inhibited the increase, showing that this increase also requires the PDZ interaction. However, the increase was inhibited by pertussis toxin as well as by wortmannin but not by AG1478, indicating that Gi/o and phosphoinositide 3-kinase (PI3K) mediate the increase independently of epidermal growth factor receptor transactivation. FK506 and okadaic acid also inhibited the increase, implying that a protein phosphatase is involved. The proposed GPR30 agonist G-1 also increased ERK1/2 activity, but this increase was only observed at a level of receptor expression below that required for the constitutive increase. Furthermore, deleting the PDZ motif did not inhibit the G-1-stimulated increase. Based on these results, we propose that GPR30 increases ERK1/2 activity via two Gi/o-mediated mechanisms, a PDZ-dependent, apparently constitutive mechanism and a PDZ-independent G-1-stimulated mechanism.

  17. Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

    Directory of Open Access Journals (Sweden)

    Jinjian Jiang

    2017-07-01

    Full Text Available Hotspot residues are important in the determination of protein-protein interactions, and they always perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use the information of solved protein structures. However, the number of solved protein structures is far smaller than the number of sequences. Moreover, almost all of the predictors identify hotspots from the interfaces of protein complexes, seldom from whole protein sequences. Therefore, methods that determine hotspots from whole protein sequences using sequence information alone are needed. To address the issue of hotspot prediction from the whole sequences of proteins, we proposed an ensemble system with random projections using statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Then, the random projection technique was adopted to project the encoding instances into a reduced space. Next, several better random projections were obtained by training an IBk classifier on the training dataset, and these were applied to the test dataset. The ensemble of random projection classifiers is thereby obtained. Experimental results showed that although the performance of our method is not good enough for real applications of hotspots, it is very promising in the determination of hotspot residues from whole sequences.

  18. BACE1 protein endocytosis and trafficking are differentially regulated by ubiquitination at lysine 501 and the Di-leucine motif in the carboxyl terminus.

    Science.gov (United States)

    Kang, Eugene L; Biscaro, Barbara; Piazza, Fabrizio; Tesco, Giuseppina

    2012-12-14

    β-Site amyloid precursor protein-cleaving enzyme (BACE1) is a membrane-tethered member of the aspartyl proteases that has been identified as β-secretase. BACE1 is targeted through the secretory pathway to the plasma membrane and then is internalized to endosomes. Sorting of membrane proteins to the endosomes and lysosomes is regulated by the interaction of signals present in their carboxyl-terminal fragment with specific trafficking molecules. The BACE1 carboxyl-terminal fragment contains a di-leucine sorting signal (495DDISLL500) and a ubiquitination site at Lys-501. Here, we report that lack of ubiquitination at Lys-501 (BACE1K501R) does not affect the rate of endocytosis but produces BACE1 stabilization and accumulation of BACE1 in early and late endosomes/lysosomes as well as at the cell membrane. In contrast, the disruption of the di-leucine motif (BACE1LLAA) greatly impairs BACE1 endocytosis and produces a delayed retrograde transport of BACE1 to the trans-Golgi network (TGN) and a delayed delivery of BACE1 to the lysosomes, thus decreasing its degradation. Moreover, the combination of the lack of ubiquitination at Lys-501 and the disruption of the di-leucine motif (BACE1LLAA/KR) produces additive effects on BACE1 stabilization and defective internalization. Finally, BACE1LLAA/KR accumulates in the TGN, while its levels are decreased in EEA1-positive compartments indicating that both ubiquitination at Lys-501 and the di-leucine motif are necessary for the trafficking of BACE1 from the TGN to early endosomes. Our studies have elucidated a differential role for the di-leucine motif and ubiquitination at Lys-501 in BACE1 endocytosis, trafficking, and degradation and suggest the involvement of multiple adaptor molecules.

  19. BACE1 Protein Endocytosis and Trafficking Are Differentially Regulated by Ubiquitination at Lysine 501 and the Di-leucine Motif in the Carboxyl Terminus*

    Science.gov (United States)

    Kang, Eugene L.; Biscaro, Barbara; Piazza, Fabrizio; Tesco, Giuseppina

    2012-01-01

    β-Site amyloid precursor protein-cleaving enzyme (BACE1) is a membrane-tethered member of the aspartyl proteases that has been identified as β-secretase. BACE1 is targeted through the secretory pathway to the plasma membrane and then is internalized to endosomes. Sorting of membrane proteins to the endosomes and lysosomes is regulated by the interaction of signals present in their carboxyl-terminal fragment with specific trafficking molecules. The BACE1 carboxyl-terminal fragment contains a di-leucine sorting signal (495DDISLL500) and a ubiquitination site at Lys-501. Here, we report that lack of ubiquitination at Lys-501 (BACE1K501R) does not affect the rate of endocytosis but produces BACE1 stabilization and accumulation of BACE1 in early and late endosomes/lysosomes as well as at the cell membrane. In contrast, the disruption of the di-leucine motif (BACE1LLAA) greatly impairs BACE1 endocytosis and produces a delayed retrograde transport of BACE1 to the trans-Golgi network (TGN) and a delayed delivery of BACE1 to the lysosomes, thus decreasing its degradation. Moreover, the combination of the lack of ubiquitination at Lys-501 and the disruption of the di-leucine motif (BACE1LLAA/KR) produces additive effects on BACE1 stabilization and defective internalization. Finally, BACE1LLAA/KR accumulates in the TGN, while its levels are decreased in EEA1-positive compartments indicating that both ubiquitination at Lys-501 and the di-leucine motif are necessary for the trafficking of BACE1 from the TGN to early endosomes. Our studies have elucidated a differential role for the di-leucine motif and ubiquitination at Lys-501 in BACE1 endocytosis, trafficking, and degradation and suggest the involvement of multiple adaptor molecules. PMID:23109336

  20. Regulatory motifs for CREB-binding protein and Nfe2l2 transcription factors in the upstream enhancer of the mitochondrial uncoupling protein 1 gene.

    Science.gov (United States)

    Rim, Jong S; Kozak, Leslie P

    2002-09-13

    Thermogenesis against cold exposure in mammals occurs in brown adipose tissue (BAT) through mitochondrial uncoupling protein (UCP1). Expression of the Ucp1 gene is unique in brown adipocytes and is regulated tightly. The 5'-flanking region of the mouse Ucp1 gene contains cis-acting elements including PPRE, TRE, and four half-site cAMP-responsive elements (CRE) with BAT-specific enhancer elements. In the course of analyzing how these half-site CREs are involved in Ucp1 expression, we found that a DNA regulatory element for NF-E2 overlaps CRE2. Electrophoretic mobility shift assay and competition assays with the CRE2 element indicates that nuclear proteins from BAT, inguinal fat, and retroperitoneal fat tissue interact with the CRE2 motif (CGTCA) in a specific manner. A supershift assay using an antibody against the CRE-binding protein (CREB) shows specific affinity to the complex from CRE2 and nuclear extract of BAT. Additionally, Western blot analysis for phospho-CREB/ATF1 shows an increase in phosphorylation of CREB/ATF1 in HIB-1B cells after norepinephrine treatment. Transient transfection assay using luciferase reporter constructs also indicates that the two half-site CREs are involved in transcriptional regulation of Ucp1 in response to norepinephrine and cAMP. We also show that a second DNA regulatory element for NF-E2 is located upstream of the CRE2 region. This element, which is found in a similar location in the 5'-flanking region of the human and rodent Ucp1 genes, shows specific binding to rat and human NF-E2 by electrophoretic mobility shift assay with nuclear extracts from brown fat. Co-transfections with an Nfe2l2 expression vector and a luciferase reporter construct of the Ucp1 enhancer region provide additional evidence that Nfe2l2 is involved in the regulation of Ucp1 by cAMP-mediated signaling.

  1. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions.

  2. Specific interaction of the nonstructural protein NS1 of minute virus of mice (MVM) with [ACCA](2) motifs in the centre of the right-end MVM DNA palindrome induces hairpin-primed viral DNA replication.

    Science.gov (United States)

    Willwand, Kurt; Moroianu, Adela; Hörlein, Rita; Stremmel, Wolfgang; Rommelaere, Jean

    2002-07-01

    The linear single-stranded DNA genome of minute virus of mice (MVM) is replicated via a double-stranded replicative form (RF) intermediate DNA. Amplification of viral RF DNA requires the structural transition of the right-end palindrome from a linear duplex into a double-hairpin structure, which serves for the repriming of unidirectional DNA synthesis. This conformational transition was found previously to be induced by the MVM nonstructural protein NS1. Elimination of the cognate NS1-binding sites, [ACCA](2), from the central region of the right-end palindrome next to the axis of symmetry was shown to markedly reduce the efficiency of hairpin-primed DNA replication, as measured in a reconstituted in vitro replication system. Thus, [ACCA](2) sequence motifs are essential as NS1-binding elements in the context of the structural transition of the right-end MVM palindrome.

  3. Osteocalcin protein sequences of Neanderthals and modern primates.

    Science.gov (United States)

    Nielsen-Marsh, Christina M; Richards, Michael P; Hauschka, Peter V; Thomas-Oates, Jane E; Trinkaus, Erik; Pettitt, Paul B; Karavanic, Ivor; Poinar, Hendrik; Collins, Matthew J

    2005-03-22

    We report here protein sequences of fossil hominids, from two Neanderthals dating to approximately 75,000 years old from Shanidar Cave in Iraq. These sequences, the oldest reported fossil primate protein sequences, are of bone osteocalcin, which was extracted and sequenced by using MALDI-TOF/TOF mass spectrometry. Through a combination of direct sequencing and peptide mass mapping, we determined that Neanderthals have an osteocalcin amino acid sequence that is identical to that of modern humans. We also report complete osteocalcin sequences for chimpanzee (Pan troglodytes) and gorilla (Gorilla gorilla gorilla) and a partial sequence for orangutan (Pongo pygmaeus), all of which are previously unreported. We found that the osteocalcin sequences of Neanderthals, modern human, chimpanzee, and orangutan are unusual among mammals in that the ninth amino acid is proline (Pro-9), whereas most species have hydroxyproline (Hyp-9). Posttranslational hydroxylation of Pro-9 in osteocalcin by prolyl-4-hydroxylase requires adequate concentrations of vitamin C (l-ascorbic acid), molecular O(2), Fe(2+), and 2-oxoglutarate, and also depends on enzyme recognition of the target proline substrate consensus sequence Leu-Gly-Ala-Pro-9-Ala-Pro-Tyr occurring in most mammals. In five species with Pro-9-Val-10, hydroxylation is blocked, whereas in gorilla there is a mixture of Pro-9 and Hyp-9. We suggest that the absence of hydroxylation of Pro-9 in Pan, Pongo, and Homo may reflect response to a selective pressure related to a decline in vitamin C in the diet during omnivorous dietary adaptation, either independently or through the common ancestor of these species.

  4. Experimental Rugged Fitness Landscape in Protein Sequence Space

    OpenAIRE

    HAYASHI, Yuuki; 相田, 拓洋; TOYOTA, Hitoshi; 伏見, 譲; URABE, Itaru; YOMO, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phag...

  5. EST2Prot: Mapping EST sequences to proteins

    Directory of Open Access Journals (Sweden)

    Lin David M

    2006-03-01

    Full Text Available Abstract Background EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. Results We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. Conclusion EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at http://biozon.org/tools/est/.

  6. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8

    Energy Technology Data Exchange (ETDEWEB)

    Shih, Chun-Ho; Chiang, Tin-Bin [Chang Gung University of Science and Technology, Guishan Dist., Taoyuan City, Taiwan (China); Wang, Wen-Jeng, E-mail: wjwang@mail.cgust.edu.tw [Chang Gung University of Science and Technology, Guishan Dist., Taoyuan City, Taiwan (China); Department of Neurological Surgery, Chang Gung Memorial Hospital, Guishan Dist., Taoyuan City, Taiwan (China)

    2017-03-15

    Convulxin (CVX), a C-type lectin-like protein (CLP), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the mode of action of multimeric CVX toward human coronary artery smooth muscle cells (HCASMCs) was further explored. The N-terminus of the β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a motif that is conserved in sequence comparisons with other CLPs. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. With specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by a series of functional analyses. CVX exhibited multiple inhibitory activities toward HCASMC proliferation, adhesion and invasion in a dose- and integrin αvβ3-dependent fashion. The WAD-octapeptide, although less potent, could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated that both intact CVX and the synthetic peptide can specifically interact with integrin-αv on HCASMCs, and CVX was shown to down-regulate the gene expression of CXC-chemokines such as growth-related oncogene and interleukin-8. According to a nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via interactions of the WAD-motif in CVX-β with integrin-αv on HCASMCs, and it is also a gene suppressor able to diminish the expression of two CXC-chemokines in an NF-κB-independent manner. More extensive investigations are needed and might create a new avenue for the development of a novel angiostatic agent. - Highlights: • The tetrameric convulxin (CVX) with WAD-motif

  7. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8

    International Nuclear Information System (INIS)

    Shih, Chun-Ho; Chiang, Tin-Bin; Wang, Wen-Jeng

    2017-01-01

    Convulxin (CVX), a C-type lectin-like protein (CLP), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the mode of action of multimeric CVX toward human coronary artery smooth muscle cells (HCASMCs) was further explored. The N-terminus of the β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a motif that is conserved in sequence comparisons with other CLPs. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. With specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by a series of functional analyses. CVX exhibited multiple inhibitory activities toward HCASMC proliferation, adhesion and invasion in a dose- and integrin αvβ3-dependent fashion. The WAD-octapeptide, although less potent, could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated that both intact CVX and the synthetic peptide can specifically interact with integrin-αv on HCASMCs, and CVX was shown to down-regulate the gene expression of CXC-chemokines such as growth-related oncogene and interleukin-8. According to a nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via interactions of the WAD-motif in CVX-β with integrin-αv on HCASMCs, and it is also a gene suppressor able to diminish the expression of two CXC-chemokines in an NF-κB-independent manner. More extensive investigations are needed and might create a new avenue for the development of a novel angiostatic agent. - Highlights: • The tetrameric convulxin (CVX) with WAD-motif

  8. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Abstract Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  9. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    Science.gov (United States)

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It would be useful to associate RNA-binding properties with gene product sequences at an early stage. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. If the protein is associated with a family of known structure, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential

  10. Purification and functional motifs of the recombinant ATPase of orf virus.

    Science.gov (United States)

    Lin, Fong-Yuan; Chan, Kun-Wei; Wang, Chi-Young; Wong, Min-Liang; Hsu, Wei-Li

    2011-10-01

    Our previous study showed that the recombinant ATPase encoded by the A32L gene of orf virus displayed ATP hydrolysis activity as predicted from its amino acid sequence. This viral ATPase contains four known functional motifs (motifs I-IV) and a novel AYDG motif; they are essential for the ATP hydrolysis reaction by binding ATP and magnesium ions. Motifs I and II correspond to the Walker A and B motifs of the typical ATPase, respectively. To examine the biochemical roles of these five conserved motifs, recombinant ATPases of five deletion mutants derived from the Taiping strain were expressed and purified. Their ATPase functions were assayed and compared with those of two wild type strains, Taiping and Nantou, isolated in Taiwan. Our results showed that deletions at motifs I-III or IV exhibited lower activity than that of the wild type. Interestingly, deletion of the AYDG motif decreased the ATPase activity more significantly than deletions of motifs I-IV. Divalent ions such as magnesium and calcium were essential for ATPase activity. Moreover, our recombinant proteins of orf virus also demonstrated GTPase activity, though weaker than the original ATPase activity. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  12. Representation of protein-sequence information by amino acid subalphabets

    DEFF Research Database (Denmark)

    Andersen, C.A.F.; Brunak, Søren

    2004-01-01

    -sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, it is also a secondary goal to discover how the current selection of amino acids...

  13. Cloning and Sequencing of Protein Kinase cDNA from Harbor Seal (Phoca vitulina) Lymphocytes

    Directory of Open Access Journals (Sweden)

    Jennifer C. C. Neale

    2004-01-01

    Full Text Available Protein kinases (PKs) play critical roles in signal transduction and activation of lymphocytes. The identification of PK genes provides a tool for understanding mechanisms of immunotoxic xenobiotics. As part of a larger study investigating persistent organic pollutants in the harbor seal and their possible immunomodulatory actions, we sequenced harbor seal cDNA fragments encoding PKs. The procedure, using degenerate primers based on conserved motifs of human protein tyrosine kinases (PTKs), successfully amplified nine phocid PK gene fragments with high homology to human and rodent orthologs. We identified eight PTKs and one dual (serine/threonine and tyrosine) kinase. Among these were several PKs important in early signaling events through the B- and T-cell receptors (FYN, LYN, ITK and SYK) and a MAP kinase involved in downstream signal transduction. V-FGR, RET and DDR2 were also expressed. Sequential activation of protein kinases ultimately induces gene transcription leading to the proliferation and differentiation of lymphocytes critical to adaptive immunity. PKs are potential targets of bioactive xenobiotics, including persistent organic pollutants of the marine environment; characterization of these molecules in the harbor seal provides a foundation for further research illuminating mechanisms of action of contaminants speculated to contribute to large-scale die-offs of marine mammals via immunosuppression.

  14. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
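
    The null-distribution step described in this abstract (randomly drawing peptides from the library and comparing their scores with those of the selected peptides) can be sketched as follows. This is a minimal illustration under stated assumptions, not GuiTope's implementation: the scoring function is a toy ungapped identity count, and the names `score` and `empirical_p_value` are hypothetical.

```python
import random


def score(peptide, protein):
    """Toy similarity score (an assumption, not GuiTope's scoring scheme):
    the largest number of identical residues over all ungapped offsets."""
    k = len(peptide)
    best = 0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches)
    return best


def empirical_p_value(selected, library, protein, n_draws=1000, seed=0):
    """Compare the summed score of the selected peptides with the summed
    scores of equally sized random draws from the library."""
    rng = random.Random(seed)
    observed = sum(score(p, protein) for p in selected)
    exceed = 0
    for _ in range(n_draws):
        draw = rng.sample(library, len(selected))
        if sum(score(p, protein) for p in draw) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (exceed + 1) / (n_draws + 1)
```

    A call such as `empirical_p_value(selected, library, protein)` returns the observed score and the fraction of random draws that score at least as well; a small fraction suggests the selected peptides resemble the protein more than the library background does.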

  15. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Full Text Available Abstract Background Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  16. Induction of cell death by tospoviral protein NSs and the motif critical for cell death does not control RNA silencing suppression activity.

    Science.gov (United States)

    Singh, Ajeet; Permar, Vipin; Jain, R K; Goswami, Suneha; Kumar, Ranjeet Ranjan; Canto, Tomas; Palukaitis, Peter; Praveen, Shelly

    2017-08-01

    Groundnut bud necrosis virus induces necrotic symptoms in different hosts. Previous studies showed reactive oxygen species-mediated programmed cell death (PCD) resulted in necrotic symptoms. Transgenic expression of viral protein NSs mimics viral symptoms. Here, we showed a role for NSs in influencing oxidative burst in the cell, by analyzing H(2)O(2) accumulation, activities of antioxidant enzymes and expression levels of vacuolar processing enzymes, H(2)O(2)-responsive microRNA 319a.2 plus its possible target metacaspase-8. The role of NSs in PCD was shown using two NSs mutants: one in the Trp/GH3 motif (a homologue of pro-apoptotic domain) (NSs(S189R)) and the other in a non-Trp/GH3 motif (NSs(L172R)). Tobacco rattle virus (TRV) expressing NSs(S189R) enhanced the PCD response, but not TRV-NSs(L172R), while RNA silencing suppression activity was lost in TRV-NSs(L172R), but not in TRV-NSs(S189R). Therefore, we propose dual roles of NSs in RNA silencing suppression and induction of cell death, controlled by different motifs. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. The HMMER Web Server for Protein Sequence Similarity Search.

    Science.gov (United States)

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. Copyright © 2017 John Wiley & Sons, Inc.

  18. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Full Text Available Abstract Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue types. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  19. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
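
    Over- or under-representation of a motif, as discussed in this abstract, is commonly expressed as an observed-to-expected ratio. The sketch below is an illustration only and is not the authors' method: it assumes the -10 consensus hexamer TATAAT as the motif, scans a single strand, and uses a simple independent-base background; the function names are hypothetical.

```python
from collections import Counter


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count(seq, motif):
    """Expected occurrences if bases were independent and drawn with the
    sequence's own base composition (a deliberately simple background)."""
    n = len(seq)
    freqs = Counter(seq)
    prob = 1.0
    for base in motif:
        prob *= freqs[base] / n
    return prob * (n - len(motif) + 1)


def representation_ratio(seq, motif="TATAAT"):
    """Observed/expected ratio: above 1 suggests over-representation,
    below 1 under-representation, relative to the assumed background."""
    expected = expected_count(seq, motif)
    return count_motif(seq, motif) / expected if expected > 0 else float("nan")
```

    Comparing this ratio between regulatory, coding and nonfunctional regions of one genome would give a rough picture of the contrast the abstract describes, although a realistic analysis would need a richer background model and both strands.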

  20. Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

    Science.gov (United States)

    Guo, D; Maiss, E; Adam, G; Casper, R

    1995-05-01

    The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AlMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic for ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.

  1. Disruption of Fyn SH3 domain interaction with a proline-rich motif in liver kinase B1 results in activation of AMP-activated protein kinase.

    Directory of Open Access Journals (Sweden)

    Eijiro Yamada

    Full Text Available Fyn-deficient mice display increased AMP-activated Protein Kinase (AMPK) activity as a result of Fyn-dependent regulation of Liver Kinase B1 (LKB1) in skeletal muscle. Mutation of Fyn-specific tyrosine sites in LKB1 results in LKB1 export into the cytoplasm and increased AMPK activation site phosphorylation. This study characterizes the structural elements responsible for the physical interaction between Fyn and LKB1. Effects of point mutations in the Fyn SH2/SH3 domains and in the LKB1 proline-rich motif on 1) Fyn and LKB1 binding, 2) LKB1 subcellular localization and 3) AMPK phosphorylation were investigated in C2C12 muscle cells. Additionally, novel LKB1 proline-rich motif mimicking cell permeable peptides were generated to disrupt Fyn/LKB1 binding and investigate the consequences on AMPK activity in both C2C12 cells and mouse skeletal muscle. Mutation of either Fyn SH3 domain or the proline-rich motif of LKB1 resulted in the disruption of Fyn/LKB1 binding, re-localization of 70% of LKB1 signal in the cytoplasm and a 2-fold increase in AMPK phosphorylation. In vivo disruption of the Fyn/LKB1 interaction using LKB1 proline-rich motif mimicking cell permeable peptides recapitulated Fyn pharmacological inhibition. We have pinpointed the structural elements within Fyn and LKB1 that are responsible for their binding, demonstrating the functionality of this interaction in regulating AMPK activity.

  2. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Full Text Available Abstract Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs implicating that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  3. Protein sequencing via nanopore based devices: a nanofluidics perspective

    Science.gov (United States)

    Chinappi, Mauro; Cecconi, Fabio

    2018-05-01

    Proteins perform a huge number of central functions in living organisms, thus all the new techniques allowing their precise, fast and accurate characterization at single-molecule level certainly represent a burst in proteomics with important biomedical impact. In this review, we describe the recent progress in the development of nanopore based devices for protein sequencing. We start with a critical analysis of the main technical requirements for nanopore protein sequencing, summarizing some ideas and methodologies that have recently appeared in the literature. In the last sections, we focus on the physical modelling of the transport phenomena occurring in nanopore based devices. The multiscale nature of the problem is discussed and, in this respect, some of the main possible computational approaches are illustrated.

  4. Sequence heterogeneity accelerates protein search for targets on DNA

    International Nuclear Information System (INIS)

    Shvets, Alexey A.; Kolomeisky, Anatoly B.

    2015-01-01

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  5. Sequence heterogeneity accelerates protein search for targets on DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shvets, Alexey A.; Kolomeisky, Anatoly B., E-mail: tolya@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-12-28

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  6. Determining and comparing protein function in Bacterial genome sequences

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla

    of this class have very little homology to other known genomes making functional annotation based on sequence similarity very difficult. Inspired in part by this analysis, an approach for comparative functional annotation was created based public sequenced genomes, CMGfunc. Functionally related groups......In November 2013, there was around 21.000 different prokaryotic genomes sequenced and publicly available, and the number is growing daily with another 20.000 or more genomes expected to be sequenced and deposited by the end of 2014. An important part of the analysis of this data is the functional...... annotation of genes – the descriptions assigned to genes that describe the likely function of the encoded proteins. This process is limited by several factors, including the definition of a function which can be more or less specific as well as how many genes can actually be assigned a function based...

  7. Relationships between residue Voronoi volume and sequence conservation in proteins.

    Science.gov (United States)

    Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung

    2018-02-01

    Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. RSV (range, 0-1) is defined as (Voronoi volume-van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.
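
    The RSV definition quoted in the abstract above can be written as a one-line calculation. The following is a minimal sketch, not the authors' code: the function name and the example volumes are hypothetical, and in practice the Voronoi and van der Waals volumes would come from a structure-analysis tool.

```python
def relative_space_of_voronoi_volume(voronoi_volume: float, vdw_volume: float) -> float:
    """RSV = (Voronoi volume - van der Waals volume) / Voronoi volume.

    Both volumes must be in the same units (e.g. cubic angstroms).
    The result lies in the range 0-1 stated in the abstract.
    """
    if voronoi_volume <= 0:
        raise ValueError("Voronoi volume must be positive")
    if not 0 <= vdw_volume <= voronoi_volume:
        raise ValueError("van der Waals volume must lie between 0 and the Voronoi volume")
    return (voronoi_volume - vdw_volume) / voronoi_volume


# Hypothetical residue: 200 cubic angstroms of Voronoi volume,
# 150 of which are occupied by the residue's van der Waals volume.
rsv = relative_space_of_voronoi_volume(200.0, 150.0)
print(rsv)  # 0.25
```

    A value near 0 corresponds to a tightly packed residue, and a value near 1 to a residue with a lot of free space around it.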

  8. Karyological characterization and identification of four repetitive element groups (the 18S – 28S rRNA gene, telomeric sequences, microsatellite repeat motifs, Rex retroelements) of the Asian swamp eel (Monopterus albus)

    Science.gov (United States)

    Suntronpong, Aorarat; Thapana, Watcharaporn; Twilprawat, Panupon; Prakhongcheep, Ornjira; Somyong, Suthasinee; Muangmai, Narongrit; Surin Peyachoknagul; Srikulnath, Kornsorn

    2017-01-01

    Abstract Among teleost fishes, the Asian swamp eel (Monopterus albus Zuiew, 1793) possesses the lowest chromosome number, 2n = 24. To characterize the chromosome constitution and investigate the genome organization of repetitive sequences in M. albus, karyotyping and chromosome mapping were performed with the 18S – 28S rRNA gene, telomeric repeats, microsatellite repeat motifs, and Rex retroelements. The 18S – 28S rRNA genes were mapped to the pericentromeric region of chromosome 4 at the same position as large propidium iodide and C-positive bands, suggesting that the molecular structure of the pericentromeric regions of chromosome 4 has evolved in a concerted manner with amplification of the 18S – 28S rRNA genes. (TTAGGG)n sequences were found at the telomeric ends of all chromosomes. Eight of 19 microsatellite repeat motifs were dispersedly mapped on different chromosomes, suggesting the independent amplification of microsatellite repeat motifs in M. albus. Monopterus albus Rex1 (MALRex1) was observed at interstitial sites of all chromosomes and in the pericentromeric regions of most chromosomes whereas MALRex3 was scattered and localized to all chromosomes and MALRex6 to several chromosomes. This suggests that these retroelements were independently amplified or lost in M. albus. Among MALRexs (MALRex1, MALRex3, and MALRex6), MALRex6 showed higher interspecific sequence divergences from other teleost species in comparison. This suggests that the divergence of Rex6 sequences of M. albus might have occurred a relatively long time ago. PMID:29093797

  9. Ultra-fast evaluation of protein energies directly from sequence.

    Directory of Open Access Journals (Sweden)

    Gevorg Grigoryan

    2006-06-01

    Full Text Available The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10^7 compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1-4.7 kcal/mol, R^2 = 0.7-1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets (a coiled coil, a zinc finger, and a WW domain) as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages
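
    To illustrate why evaluating a sequence-based energy expression is fast, the sketch below sums a constant, single-residue terms and residue-pair terms by table lookup. It is a simplified illustration only: the function name, the dictionary layout and all coefficient values are hypothetical, the abstract does not give the actual functional form or fitted coefficients, and higher-order terms are omitted.

```python
from itertools import combinations


def ce_energy(sequence, constant, single_terms, pair_terms):
    """Evaluate a truncated expansion: constant + single-site + pair terms.

    single_terms maps (position, residue) to an energy contribution.
    pair_terms maps (pos_i, pos_j, residue_i, residue_j) with pos_i < pos_j
    to an energy contribution. Missing entries contribute zero.
    """
    energy = constant
    for i, residue in enumerate(sequence):
        energy += single_terms.get((i, residue), 0.0)
    for i, j in combinations(range(len(sequence)), 2):
        energy += pair_terms.get((i, j, sequence[i], sequence[j]), 0.0)
    return energy


# Toy coefficients (hypothetical, in kcal/mol) for a three-position design.
constant = -10.0
single_terms = {(0, "L"): -1.5, (1, "K"): 0.5, (2, "E"): 0.25}
pair_terms = {(1, 2, "K", "E"): -2.0}

print(ce_energy("LKE", constant, single_terms, pair_terms))  # -12.75
```

    Once the coefficients have been derived, scoring a sequence needs only dictionary lookups and additions, with no atomic coordinates involved.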

  10. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes finding them challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines as the query set size increases. As a result, only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis has to be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.
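
    As a small illustration of what a degenerate motif in the 15-letter IUPAC code means, the sketch below scans a DNA string for matches to such a motif. It covers only the matching step on the CPU; it is not the Argo_CUDA discovery algorithm, and the function name and example sequence are hypothetical.

```python
# The 15-letter IUPAC nucleotide code: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def find_motif(motif, sequence):
    """Return 0-based start positions where the degenerate motif matches."""
    k = len(motif)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # Every base in the window must belong to the set named by the motif symbol.
        if all(base in IUPAC[code] for code, base in zip(motif, window)):
            hits.append(start)
    return hits


# "S" matches C or G, so both TGACTCA and TGAGTCA are hits.
print(find_motif("TGASTCA", "CCTGACTCAGGTGAGTCATT"))  # [2, 11]
```

    An exhaustive discovery method has to evaluate a very large number of candidate motifs of this kind against every sequence, which is the workload the abstract describes moving to the GPU.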

  11. The conserved basic residues and the charged amino acid residues at the α-helix of the zinc finger motif regulate the nuclear transport activity of triple C2H2 zinc finger proteins

    Science.gov (United States)

    Lin, Chih-Ying

    2018-01-01

    Zinc finger (ZF) motifs on proteins are frequently recognized as a structure for DNA binding. Accumulated reports indicate that ZF motifs contain a nuclear localization signal (NLS) to facilitate the transport of ZF proteins into the nucleus. We investigated the critical factors that facilitate the nuclear transport of triple C2H2 ZF proteins. Three conserved basic residues (hot spots) were identified among the ZF sequences of triple C2H2 ZF proteins that reportedly have NLS function. Additional basic residues can be found on the α-helix of the ZFs. Using the ZF domain (ZFD) of Egr-1 as a template, various mutants were constructed and expressed in cells. The nuclear transport activity of various mutants was estimated by analyzing the proportion of protein localized in the nucleus. Mutation at any hot spot of the Egr-1 ZFs reduced the nuclear transport activity. Changes of the basic residues at the α-helical region of the second ZF (ZF2) of the Egr-1 ZFD abolished the NLS activity. However, this activity can be restored by substituting the acidic residues at the homologous positions of ZF1 or ZF3 with basic residues. The restored activity dropped again when the hot spots at ZF1 or the basic residues in the α-helix of ZF3 were mutated. The variations in nuclear transport activity are linked directly to the binding activity of the ZF proteins with importins. This study was extended to other triple C2H2 ZF proteins. SP1 and KLF families, similar to Egr-1, have charged amino acid residues at the second (α2) and the third (α3) positions of the α-helix. Replacing the amino acids at α2 and α3 with acidic residues reduced the NLS activity of the SP1 and KLF6 ZFD. The reduced activity can be restored by substituting the α3 with histidine at any SP1 and KLF6 ZFD. The results show again the interchangeable role of ZFs and charged residues in the α-helix in regulating the NLS activity of triple C2H2 ZF proteins. PMID:29381770

  12. Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space.

    Science.gov (United States)

    Stegemann, Björn; Klebe, Gerhard

    2012-02-01

    Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process. Copyright © 2011 Wiley Periodicals, Inc.

  13. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    Directory of Open Access Journals (Sweden)

    Colin A Smith

    Full Text Available Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  14. SAAS: Short Amino Acid Sequence - A Promising Protein Secondary Structure Prediction Method of Single Sequence

    Directory of Open Access Journals (Sweden)

    Zhou Yuan Wu

    2013-07-01

    Full Text Available In statistical methods for predicting protein secondary structure, many researchers focus on single amino acid frequencies in α-helices, β-sheets, and so on, or on the influence of neighboring amino acids on whether a given amino acid forms a secondary structure. This paper instead treats a short sequence of amino acids (3, 4, 5 or 6 amino acids) as a single unit and computes statistics on the probability that the short sequence forms a secondary structure. Also, many researchers select sequences of low homology as the statistical database, whereas this paper uses the whole PDB database. We propose a strategy to predict protein secondary structure using a simple statistical method. Numerical computation shows that treating short amino acid sequences as units makes it easy to see the tendency of a short sequence to form a secondary structure, and that a large statistical database (the whole PDB database) works well without considering homology. The Q3 accuracy of the proposed simple statistical method is ca. 74%, whereas the accuracy of other statistical methods is less than 70%.

  15. Protein sequences bound to mineral surfaces persist into deep time

    DEFF Research Database (Denmark)

    Demarchi, Beatrice; Hall, Shaun; Roncal-Herrero, Teresa

    2016-01-01

    of Laetoli (3.8 Ma) and Olduvai Gorge (1.3 Ma) in Tanzania. By tracking protein diagenesis back in time we find consistent patterns of preservation, demonstrating authenticity of the surviving sequences. Molecular dynamics simulations of struthiocalcin-1 and -2, the dominant proteins within the eggshell......, reveal that distinct domains bind to the mineral surface. It is the domain with the strongest calculated binding energy to the calcite surface that is selectively preserved. Thermal age calculations demonstrate that the Laetoli and Olduvai peptides are 50 times older than any previously authenticated...

  16. The position of the Gly-xxx-Gly motif in transmembrane segments modulates dimer affinity.

    Science.gov (United States)

    Johnson, Rachel M; Rath, Arianna; Deber, Charles M

    2006-12-01

    Although the intrinsic low solubility of membrane proteins presents challenges to their high-resolution structure determination, insight into the amino acid sequence features and forces that stabilize their folds has been provided through study of sequence-dependent helix-helix interactions between single transmembrane (TM) helices. While the stability of helix-helix partnerships mediated by the Gly-xxx-Gly (GG4) motif is known to be generally modulated by distal interfacial residues, it has not been established whether the position of this motif, with respect to the ends of a given TM segment, affects dimer affinity. Here we examine the relationship between motif position and affinity in the homodimers of 2 single-spanning membrane protein TM sequences: glycophorin A (GpA) and bacteriophage M13 coat protein (MCP). Using the TOXCAT assay for dimer affinity on a series of GpA and MCP TM segments that have been modified with either 4 Leu residues at each end or with 8 Leu residues at the N-terminal end, we show that in each protein, centrally located GG4 motifs are capable of stronger helix-helix interactions than those proximal to TM helix ends, even when surrounding interfacial residues are maintained. The relative importance of GG4 motifs in stabilizing helix-helix interactions therefore must be considered not only in its specific residue context but also in terms of the location of the interactive surface relative to the N and C termini of alpha-helical TM segments.
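
    The notion of motif position relative to the ends of a TM segment can be made concrete with a short script. This is a sketch under stated assumptions: the function name is hypothetical, the Leu-flanked segments are made-up examples rather than the GpA or MCP constructs from the study, and "position" is measured here as the offset of the motif's middle residue from the segment midpoint.

```python
import re

# Lookahead so that overlapping Gly-xxx-Gly occurrences are all reported.
GG4 = re.compile(r"(?=G[A-Z]{3}G)")


def gg4_positions(tm_segment):
    """Return (start, offset_from_center) for each Gly-xxx-Gly motif.

    start is the 0-based index of the first Gly. The offset is the distance
    (in residues) of the motif's middle residue from the segment midpoint;
    0 means centrally located, negative means closer to the N-terminal end.
    """
    segment_center = (len(tm_segment) - 1) / 2
    results = []
    for match in GG4.finditer(tm_segment):
        start = match.start()
        motif_center = start + 2  # middle of the five-residue G-x-x-x-G span
        results.append((start, motif_center - segment_center))
    return results


# Hypothetical 17-residue segments with the same motif at different positions.
central = "L" * 6 + "GAALG" + "L" * 6
n_terminal = "L" + "GAALG" + "L" * 11

print(gg4_positions(central))     # [(6, 0.0)]
print(gg4_positions(n_terminal))  # [(1, -5.0)]
```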

  17. A mutation in the glutamate-rich region of RNA-binding motif protein 20 causes dilated cardiomyopathy through missplicing of titin and impaired Frank-Starling mechanism

    DEFF Research Database (Denmark)

    Beqqali, Abdelaziz; Bollen, I. A. E.; Rasmussen, T. B.

    2016-01-01

    Mutations in the RS-domain of RNA-binding motif protein 20 (RBM20) have recently been identified to segregate with aggressive forms of familial dilated cardiomyopathy (DCM). Loss of RBM20 in rats results in missplicing of the sarcomeric gene titin (TTN). The functional and physiological consequences of RBM20 mutations outside the mutational hotspot of RBM20 have not been explored to date. In this study, we investigated the pathomechanism of DCM caused by a novel RBM20 mutation in human cardiomyocytes. We identified a family with DCM carrying a mutation (RBM20(E913K/+)) in a glutamate...... to the early onset, and malignant course of DCM caused by RBM20 mutations. Altogether, our results demonstrate that heterozygous loss of RBM20 suffices to profoundly impair myocyte biomechanics by its disturbance of TTN splicing....

  18. The group B streptococcal alpha C protein binds alpha1beta1-integrin through a novel KTD motif that promotes internalization of GBS within human epithelial cells.

    Science.gov (United States)

    Bolduc, Gilles R; Madoff, Lawrence C

    2007-12-01

    Group B Streptococcus (GBS) is the leading cause of bacterial pneumonia, sepsis and meningitis among neonates and a cause of morbidity among pregnant women and immunocompromised adults. GBS epithelial cell invasion is associated with expression of alpha C protein (ACP). Loss of ACP expression results in a decrease in GBS internalization and translocation across human cervical epithelial cells (ME180). Soluble ACP and its 170 amino acid N-terminal region (NtACP), but not the repeat protein RR', bind to ME180 cells and reduce internalization of wild-type GBS to levels obtained with an ACP-deficient isogenic mutant. In the current study, ACP colocalized with alpha(1)beta(1)-integrin, resulting in integrin clustering as determined by laser scanning confocal microscopy. NtACP contains two structural domains, D1 and D2. D1 is structurally similar to fibronectin's integrin-binding region (FnIII10). D1's (KT)D146 motif is structurally similar to the FnIII10 (RG)D1495 integrin-binding motif, suggesting that ACP binds alpha(1)beta(1)-integrin via the D1 domain. The (KT)D146A mutation within soluble NtACP reduced its ability to bind alpha(1)beta(1)-integrin and inhibit GBS internalization within ME180 cells. Thus ACP binding to human epithelial cell integrins appears to contribute to GBS internalization within epithelial cells.

  19. Computational identification of MoRFs in protein sequences.

    Science.gov (United States)

    Malhis, Nawar; Gsponer, Jörg

    2015-06-01

    Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is the binding of molecular recognition features (MoRFs) to globular protein domains in a process known as a disorder-to-order transition. Predicting the location of MoRFs in protein sequences with high accuracy remains an important computational challenge. In this study, we introduce MoRFCHiBi, a new computational approach for fast and accurate prediction of MoRFs in protein sequences. MoRFCHiBi combines the outcomes of two support vector machine (SVM) models that take advantage of two different kernels with high noise tolerance. The first, SVMS, is designed to extract maximal information from the general contrast in amino acid compositions between MoRFs, their surrounding regions (Flanks), and the remainders of the sequences. The second, SVMT, is used to identify similarities between regions in a query sequence and MoRFs of the training set. We evaluated the performance of our predictor by comparing its results with those of two currently available MoRF predictors, MoRFpred and ANCHOR. Using three test sets that have previously been collected and used to evaluate MoRFpred and ANCHOR, we demonstrate that MoRFCHiBi outperforms the other predictors with respect to different evaluation metrics. In addition, MoRFCHiBi is downloadable and fast, which makes it useful as a component in other computational prediction tools. http://www.chibi.ubc.ca/morf/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  20. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    Directory of Open Access Journals (Sweden)

    Apurva Barve

    2013-01-01

    Full Text Available Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  1. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  2. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    Science.gov (United States)

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
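
    A dipeptide frequency/bias tabulation of the kind described above can be sketched in a few lines. This is not the AAPAIR.TAB tool: the function name and toy sequences are hypothetical, and "bias" is assumed here to mean the ratio of the observed dipeptide frequency to the frequency expected from the single-residue composition, which the abstract does not spell out.

```python
from collections import Counter


def dipeptide_stats(sequences):
    """Return {dipeptide: (count, frequency, bias)} over a set of sequences.

    frequency = count / total number of ordered dipeptides
    bias      = frequency / (freq of first residue * freq of second residue)
    (the bias definition is an assumption made for this sketch)
    """
    residue_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        # Ordered, overlapping pairs: positions (0,1), (1,2), ...
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))

    total_residues = sum(residue_counts.values())
    total_pairs = sum(pair_counts.values())
    stats = {}
    for pair, count in pair_counts.items():
        frequency = count / total_pairs
        expected = (residue_counts[pair[0]] / total_residues) * (
            residue_counts[pair[1]] / total_residues
        )
        stats[pair] = (count, frequency, frequency / expected)
    return stats


# Toy input; real use would take a motif-specific sequence data set.
stats = dipeptide_stats(["CGYC", "NGYC"])
for pair, (count, frequency, bias) in sorted(stats.items()):
    print(pair, count, round(frequency, 3), round(bias, 2))
```

    Sorting the resulting table by count or by bias gives the ranked list of "ordered dipeptides" that the analysis inspects.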

  3. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    Science.gov (United States)

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
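
    Of the seven feature sets listed above, amino acid composition is the simplest to show. The sketch below computes it as the fraction of each of the 20 standard residues in a sequence. The function name `amino_acid_composition` and the example sequence are hypothetical, and the other six feature sets, the PCA step, and the classifiers are not reproduced here.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-entry dict mapping each standard residue to its fraction.

    Non-standard symbols are ignored, which is an assumption of this sketch.
    """
    counts = Counter(aa for aa in sequence.upper() if aa in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in STANDARD_AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


# Invented example sequence.
composition = amino_acid_composition("MKKLLA")
feature_vector = [composition[aa] for aa in STANDARD_AMINO_ACIDS]
```

    The resulting 20-dimensional vector is the kind of fixed-length input that can be concatenated with other features before dimensionality reduction.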

  4. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    Science.gov (United States)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  5. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. As recent studies show, using only the absolute spectrum in informational spectrum analysis of protein sequences has proven insufficient. Therefore, CISAPS is developed to consider and provide results in three forms: absolute, real, and imaginary spectrum. Features biologically related to the analysis of influenza A subtypes, presented here as a case study, can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are likely to be related to the protein feature that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
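
    The three spectrum forms mentioned above can be pictured with a plain discrete Fourier transform of a numerically encoded sequence. The sketch below is only that: a generic DFT with optional zero-padding, assuming each residue is first mapped to a number by an amino acid index. The index values in `TOY_INDEX` are invented and are not one of the 611 indices used by CISAPS, and the function name `spectrum_forms` is hypothetical.

```python
import cmath

# Invented index values for illustration; NOT a real amino acid index.
TOY_INDEX = {"A": 0.1, "C": 0.2, "G": 0.3, "K": 0.4}


def spectrum_forms(sequence, index, padded_length=None):
    """Return the absolute, real, and imaginary parts of the DFT of a sequence."""
    signal = [index[aa] for aa in sequence]
    n = max(len(signal), padded_length or 0)
    signal = signal + [0.0] * (n - len(signal))  # zero-padding to length n

    coefficients = []
    for k in range(n):
        # Direct O(n^2) DFT; fine for a short illustrative sequence.
        coefficients.append(
            sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        )
    return {
        "absolute": [abs(c) for c in coefficients],
        "real": [c.real for c in coefficients],
        "imaginary": [c.imag for c in coefficients],
    }


spec = spectrum_forms("ACGK", TOY_INDEX)
padded = spectrum_forms("ACGK", TOY_INDEX, padded_length=8)
```

    The absolute spectrum discards the sign and phase information that the real and imaginary parts retain, which is one way to read the motivation given in the abstract.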

  6. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
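
    The reported F-measures can be checked against the reported precision and recall, assuming the abstract uses the balanced F-measure (the harmonic mean of precision and recall). The function name `f_measure` is a hypothetical helper for this check.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
sequence_only = f_measure(0.84, 0.26)
with_structure = f_measure(0.76, 0.38)
print(round(sequence_only, 2), round(with_structure, 2))
```

    Under that assumption the two values round to 0.40 and 0.51, matching the reported 40% and 51%.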

  7. Extensive Mutagenesis of the Conserved Box E Motif in Duck Hepatitis B Virus P Protein Reveals Multiple Functions in Replication and a Common Structure with the Primer Grip in HIV-1 Reverse Transcriptase

    OpenAIRE

    Wang, Yong-Xiang; Luo, Cheng; Zhao, Dan; Beck, Jürgen; Nassal, Michael

    2012-01-01

    Hepadnaviruses, including the pathogenic hepatitis B virus (HBV), replicate their small DNA genomes through protein-primed reverse transcription, mediated by the terminal protein (TP) domain in their P proteins and an RNA stem-loop, ϵ, on the pregenomic RNA (pgRNA). No direct structural data are available for P proteins, but their reverse transcriptase (RT) domains contain motifs that are conserved in all RTs (box A to box G), implying a similar architecture; however, experimental support for...

  8. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.

    Science.gov (United States)

    Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin

    2011-08-21

    Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, many difficulties remain because of insufficient protein structural and functional information. It is highly desirable to develop methods for predicting PPIs based only on amino acid sequences. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with the redundancy of sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower but more condensed space, taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is that its compressed signal can be reconstructed from far fewer measurements than what is usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional protein discrete models and has great potential to be extended to many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society

  10. Secretory TAT-peptide-mediated protein transduction of LIF receptor α-chain distal cytoplasmic motifs into human myeloid HL-60 cells

    Directory of Open Access Journals (Sweden)

    Q. Sun

    2012-10-01

    The distal cytoplasmic motifs of leukemia inhibitory factor receptor α-chain (LIFRα-CT3) can independently induce intracellular myeloid differentiation in acute myeloid leukemia (AML) cells by gene transfection; however, there are significant limitations in the potential clinical use of these motifs due to liposome-derived genetic modifications. To produce a potentially therapeutic LIFRα-CT3 with cell-permeable activity, we constructed a eukaryotic expression pcDNA3.0-TAT-CT3-cMyc plasmid with a signal peptide (ss) inserted into the N-terminal that codes for an ss-TAT-CT3-cMyc fusion protein. The stable transfection of Chinese hamster ovary (CHO) cells via this vector and subsequent selection by Geneticin resulted in cell lines that express and secrete TAT-CT3-cMyc. The spent medium of pcDNA3.0-TAT-CT3-cMyc-transfected CHO cells could be purified using a cMyc-epitope-tag agarose affinity chromatography column and could be detected via SDS-PAGE, with antibodies against cMyc-tag. The direct administration of TAT-CT3-cMyc to HL-60 cell culture media caused the enrichment of CT3-cMyc in the cytoplasm and nucleus within 30 min and led to a significant reduction of viable cells (P < 0.05) 8 h after exposure. The advantages of using this mammalian expression system include the ease of generating TAT fusion proteins that are adequately transcripted and the potential for a sustained production of such proteins in vitro for future AML therapy.

  11. Secretory TAT-peptide-mediated protein transduction of LIF receptor α-chain distal cytoplasmic motifs into human myeloid HL-60 cells

    International Nuclear Information System (INIS)

    Sun, Q.; Xiong, J.; Lu, J.; Xu, S.; Li, Y.; Zhong, X.P.; Gao, G.K.; Liu, H.Q.

    2012-01-01

    The distal cytoplasmic motifs of leukemia inhibitory factor receptor α-chain (LIFRα-CT3) can independently induce intracellular myeloid differentiation in acute myeloid leukemia (AML) cells by gene transfection; however, there are significant limitations in the potential clinical use of these motifs due to liposome-derived genetic modifications. To produce a potentially therapeutic LIFRα-CT3 with cell-permeable activity, we constructed a eukaryotic expression pcDNA3.0-TAT-CT3-cMyc plasmid with a signal peptide (ss) inserted into the N-terminal that codes for an ss-TAT-CT3-cMyc fusion protein. The stable transfection of Chinese hamster ovary (CHO) cells via this vector and subsequent selection by Geneticin resulted in cell lines that express and secrete TAT-CT3-cMyc. The spent medium of pcDNA3.0-TAT-CT3-cMyc-transfected CHO cells could be purified using a cMyc-epitope-tag agarose affinity chromatography column and could be detected via SDS-PAGE, with antibodies against cMyc-tag. The direct administration of TAT-CT3-cMyc to HL-60 cell culture media caused the enrichment of CT3-cMyc in the cytoplasm and nucleus within 30 min and led to a significant reduction of viable cells (P < 0.05) 8 h after exposure. The advantages of using this mammalian expression system include the ease of generating TAT fusion proteins that are adequately transcripted and the potential for a sustained production of such proteins in vitro for future AML therapy

  12. Secretory TAT-peptide-mediated protein transduction of LIF receptor α-chain distal cytoplasmic motifs into human myeloid HL-60 cells

    Energy Technology Data Exchange (ETDEWEB)

    Sun, Q. [Department of Hyperbaric Medicine, No. 401 Hospital of PLA, Qingdao (China); Department of Histology and Embryology, Faculty of Basic Medical Sciences, Second Military Medical University, Shanghai (China)]; Xiong, J. [Department of Histology and Embryology, Faculty of Basic Medical Sciences, Second Military Medical University, Shanghai (China)]; Lu, J. [Office of Medical Education, Training Department, Second Military Medical University, Shanghai (China)]; Xu, S. [Department of Histology and Embryology, Faculty of Basic Medical Sciences, Second Military Medical University, Shanghai (China)]; Li, Y. [State Food and Drug Administration of China, Huangdao Branch, Qingdao (China)]; Zhong, X.P.; Gao, G.K. [Department of Hyperbaric Medicine, No. 401 Hospital of PLA, Qingdao (China)]; Liu, H.Q. [Department of Histology and Embryology, Faculty of Basic Medical Sciences, Second Military Medical University, Shanghai (China)]

    2012-06-22

    The distal cytoplasmic motifs of leukemia inhibitory factor receptor α-chain (LIFRα-CT3) can independently induce intracellular myeloid differentiation in acute myeloid leukemia (AML) cells by gene transfection; however, there are significant limitations in the potential clinical use of these motifs due to liposome-derived genetic modifications. To produce a potentially therapeutic LIFRα-CT3 with cell-permeable activity, we constructed a eukaryotic expression pcDNA3.0-TAT-CT3-cMyc plasmid with a signal peptide (ss) inserted into the N-terminal that codes for an ss-TAT-CT3-cMyc fusion protein. The stable transfection of Chinese hamster ovary (CHO) cells via this vector and subsequent selection by Geneticin resulted in cell lines that express and secrete TAT-CT3-cMyc. The spent medium of pcDNA3.0-TAT-CT3-cMyc-transfected CHO cells could be purified using a cMyc-epitope-tag agarose affinity chromatography column and could be detected via SDS-PAGE, with antibodies against cMyc-tag. The direct administration of TAT-CT3-cMyc to HL-60 cell culture media caused the enrichment of CT3-cMyc in the cytoplasm and nucleus within 30 min and led to a significant reduction of viable cells (P < 0.05) 8 h after exposure. The advantages of using this mammalian expression system include the ease of generating TAT fusion proteins that are adequately transcripted and the potential for a sustained production of such proteins in vitro for future AML therapy.

  13. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    Science.gov (United States)

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs

    OpenAIRE

    Chang, Tzu-Hao; Huang, Hsi-Yuan; Hsu, Justin Bo-Kai; Weng, Shun-Long; Horng, Jorng-Tzong; Huang, Hsien-Da

    2013-01-01

    Background Functional RNA molecules participate in numerous biological processes, ranging from gene regulation to protein synthesis. Analysis of functional RNA motifs and elements in RNA sequences can obtain useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used in the identification of regulatory motifs, and this work extends it by incorporating more comprehensive and updated data sources and analytical approaches into a new platform. Methods ...

  15. Crystal structure of the G3BP2 NTF2-like domain in complex with a canonical FGDF motif peptide

    DEFF Research Database (Denmark)

    Kristensen, Ole

    2015-01-01

    -terminal domains of the G3BP1 and Rasputin proteins. Recently, a subset of G3BP interacting proteins was recognized to share a common sequence motif, FGDF. The most studied binding partners, USP10 and viral nsP3, interfere with essential G3BP functions related to assembly of cellular stress granules. Reported...

  16. Nucleation phenomena in protein folding: the modulating role of protein sequence

    International Nuclear Information System (INIS)

    Travasso, Rui D M; Faísca, Patrícia F N; Gama, Margarida M Telo da

    2007-01-01

    For the vast majority of naturally occurring, small, single-domain proteins, folding is often described as a two-state process that lacks detectable intermediates. This observation has often been rationalized on the basis of a nucleation mechanism for protein folding whose basic premise is the idea that, after completion of a specific set of contacts forming the so-called folding nucleus, the native state is achieved promptly. Here we propose a methodology to identify folding nuclei in small lattice polymers and apply it to the study of protein molecules with a chain length of N = 48. To investigate the extent to which protein topology is a robust determinant of the nucleation mechanism, we compare the nucleation scenario of a native-centric model with that of a sequence-specific model sharing the same native fold. To evaluate the impact of the sequence's finer details in the nucleation mechanism, we consider the folding of two non-homologous sequences. We conclude that, in a sequence-specific model, the folding nucleus is, to some extent, formed by the most stable contacts in the protein and that the less stable linkages in the folding nucleus are solely determined by the fold's topology. We have also found that, independently of the protein sequence, the folding nucleus performs the same 'topological' function. This unifying feature of the nucleation mechanism results from the residues forming the folding nucleus being distributed along the protein chain in a similar and well-defined manner that is determined by the fold's topological features

  17. Identification and analysis of Eimeria nieschulzi gametocyte genes reveal splicing events of gam genes and conserved motifs in the wall-forming proteins within the genus Eimeria (Coccidia, Apicomplexa)

    Directory of Open Access Journals (Sweden)

    Wiedmer Stefanie

    2017-01-01

    The genus Eimeria (Apicomplexa, Coccidia) provides a wide range of different species with different hosts to study common and variable features within the genus and its species. A common characteristic of all known Eimeria species is the oocyst, the infectious stage where its life cycle starts and ends. In our study, we utilized Eimeria nieschulzi as a model organism. This rat-specific parasite has complex oocyst morphology and can be transfected and even cultivated in vitro up to the oocyst stage. We wanted to elucidate how the known oocyst wall-forming proteins are preserved in this rodent Eimeria species compared to other Eimeria. In newly obtained genomics data, we were able to identify different gametocyte genes that are orthologous to already known gam genes involved in the oocyst wall formation of avian Eimeria species. These genes appeared putatively as single exon genes, but cDNA analysis showed alternative splicing events in the transcripts. The analysis of the translated sequence revealed different conserved motifs but also dissimilar regions in GAM proteins, as well as polymorphic regions. The occurrence of an underrepresented gam56 gene version suggests the existence of a second distinct E. nieschulzi genotype within the E. nieschulzi Landers isolate that we maintain.

  18. Identification and analysis of Eimeria nieschulzi gametocyte genes reveal splicing events of gam genes and conserved motifs in the wall-forming proteins within the genus Eimeria (Coccidia, Apicomplexa)

    Science.gov (United States)

    Wiedmer, Stefanie; Erdbeer, Alexander; Volke, Beate; Randel, Stephanie; Kapplusch, Franz; Hanig, Sacha; Kurth, Michael

    2017-01-01

    The genus Eimeria (Apicomplexa, Coccidia) provides a wide range of different species with different hosts to study common and variable features within the genus and its species. A common characteristic of all known Eimeria species is the oocyst, the infectious stage where its life cycle starts and ends. In our study, we utilized Eimeria nieschulzi as a model organism. This rat-specific parasite has complex oocyst morphology and can be transfected and even cultivated in vitro up to the oocyst stage. We wanted to elucidate how the known oocyst wall-forming proteins are preserved in this rodent Eimeria species compared to other Eimeria. In newly obtained genomics data, we were able to identify different gametocyte genes that are orthologous to already known gam genes involved in the oocyst wall formation of avian Eimeria species. These genes appeared putatively as single exon genes, but cDNA analysis showed alternative splicing events in the transcripts. The analysis of the translated sequence revealed different conserved motifs but also dissimilar regions in GAM proteins, as well as polymorphic regions. The occurrence of an underrepresented gam56 gene version suggests the existence of a second distinct E. nieschulzi genotype within the E. nieschulzi Landers isolate that we maintain. PMID:29210668

  19. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network

  20. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    Science.gov (United States)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  1. Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences | Center for Cancer Research

    Science.gov (United States)

    A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo.
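
    One way to picture the signed letter heights is with an information-theoretic weight per base and position, where positive values sit above the line and negative values below it. The sketch below assumes the weight 2 + log2 f(b, l) in bits, with f(b, l) the frequency of base b at position l among aligned sites. That formula, the pseudocount, the function names, and the example sites are all assumptions for illustration; the abstract does not state them, and any small-sample correction is omitted.

```python
import math
from collections import Counter

BASES = "ACGT"


def individual_information_matrix(sites, pseudocount=0.5):
    """Per-position weights in bits: 2 + log2(frequency) for each base.

    The pseudocount (an assumption of this sketch) avoids log2(0) for
    bases never observed at a position.
    """
    length = len(sites[0])
    total = len(sites) + len(BASES) * pseudocount
    matrix = []
    for position in range(length):
        counts = Counter(site[position] for site in sites)
        matrix.append(
            {b: 2 + math.log2((counts[b] + pseudocount) / total) for b in BASES}
        )
    return matrix


def walker_heights(sequence, matrix):
    """Signed height of each letter when the matrix is placed on `sequence`."""
    return [matrix[i][base] for i, base in enumerate(sequence)]


# Invented aligned binding sites, for illustration only.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
matrix = individual_information_matrix(sites)
favorable = walker_heights("TATAAT", matrix)
mismatch = walker_heights("GATAAT", matrix)
```

    In this toy case every letter of the consensus-like sequence gets a positive height, while the first letter of the mismatched sequence gets a negative one and would be drawn upside-down below the line.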

  2. Two Tetrahymena G-DNA-binding proteins, TGP1 and TGP3, share novel motifs and may play a role in micronuclear division

    OpenAIRE

    Lu, Quan; Henderson, Eric

    2000-01-01

    G-DNA is a four-stranded DNA structure with diverse putative biological roles. We have previously purified and cloned a novel G-DNA-binding protein TGP1 from the ciliate Tetrahymena thermophila. Here we report the molecular cloning of TGP3, an additional G-DNA-binding protein from the same organism. The TGP3 cDNA encodes a 365 amino acid protein that is homologous to TGP1 (34% identity and 44% similarity). The proteins share a sequence pattern that contains two novel repetitive and homologous...

  3. EBNA-2 of herpesvirus papio diverges significantly from the type A and type B EBNA-2 proteins of Epstein-Barr virus but retains an efficient transactivation domain with a conserved hydrophobic motif.

    Science.gov (United States)

    Ling, P D; Ryon, J J; Hayward, S D

    1993-01-01

    EBNA-2 contributes to the establishment of Epstein-Barr virus (EBV) latency in B cells and to the resultant alterations in B-cell growth pattern by up-regulating expression from specific viral and cellular promoters. We have taken a comparative approach toward characterizing functional domains within EBNA-2. To this end, we have cloned and sequenced the EBNA-2 gene from the closely related baboon virus herpesvirus papio (HVP). All human EBV isolates have either a type A or type B EBNA-2 gene. However, the HVP EBNA-2 gene falls into neither the type A category nor the type B category, suggesting that the separation into these two subtypes may have been a recent evolutionary event. Comparison of the predicted amino acid sequences indicates 37% amino acid identity with EBV type A EBNA-2 and 35% amino acid identity with type B EBNA-2. To define the domains of EBNA-2 required for transcriptional activation, the DNA binding domain of the GAL4 protein was fused to overlapping segments of EBV EBNA-2. This approach identified a 40-amino-acid (40-aa) EBNA-2 activation domain located between aa 437 and 477. Transactivation ability was completely lost when the amino-terminal boundary of this domain was moved to aa 441, indicating that the motif at aa 437 to 440, Pro-Ile-Leu-Phe, contains residues critical for function. The aa 437 boundary identified in these experiments coincides precisely with a block of conserved sequences in HVP EBNA-2, and the comparable carboxy-terminal region of HVP EBNA-2 also functioned as a strong transcriptional activation domain when fused to the Gal4(1-147) protein. The EBV and HVP EBNA-2 activation domains share a mixed proline-rich, negatively charged character with a striking conservation of positionally equivalent hydrophobic residues. The importance of the individual amino acids making up the Pro-Ile-Leu-Phe motif was examined by mutagenesis. 
    Any alteration of these residues was found to reduce transactivation efficiency, with changes at the...

  4. Experimental Rugged Fitness Landscape in Protein Sequence Space

    Science.gov (United States)

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12–130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10^4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18–24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region. PMID:17183728
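
    The n-k landscape model referred to above can be sketched in a few lines. In this toy version each of n binary sites contributes a random value that depends on its own state and on the states of k other sites, and an adaptive walk accepts single-site changes that raise fitness until none does. The binary alphabet, the random neighbor choice, the lazily filled contribution tables, and the names `make_nk_landscape` and `adaptive_walk` are assumptions of this sketch; it does not reproduce the authors' theory or their 20-letter amino acid setting.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a random NK landscape on binary genotypes."""
    rng = random.Random(seed)
    # Each site depends on itself plus k other randomly chosen sites.
    neighbors = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]  # contribution tables, filled on demand

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = (genotype[i],) + tuple(genotype[j] for j in neighbors[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n

    return fitness


def adaptive_walk(fitness, genotype):
    """Accept single-site flips that increase fitness until a local peak is reached."""
    genotype = list(genotype)
    improved = True
    while improved:
        improved = False
        for i in range(len(genotype)):
            mutant = genotype[:]
            mutant[i] = 1 - mutant[i]
            if fitness(mutant) > fitness(genotype):
                genotype = mutant
                improved = True
    return genotype


fitness = make_nk_landscape(n=8, k=2, seed=1)
start = [0] * 8
peak = adaptive_walk(fitness, start)
```

    Larger k couples more sites together, which tends to produce more local peaks and shorter adaptive walks, the ruggedness that the abstract describes for the upper part of the landscape.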

  5. Experimental rugged fitness landscape in protein sequence space.

    Science.gov (United States)

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-12-20

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10⁴-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.

  6. Experimental rugged fitness landscape in protein sequence space.

    Directory of Open Access Journals (Sweden)

    Yuuki Hayashi

    Full Text Available The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10⁴-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.
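
    The n-k landscape model named in the three records above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' analysis: it assumes a binary alphabet, uniformly random site contributions, and a greedy (steepest-ascent) walk in place of the random substitution and selection used in the experiment. The names `make_nk_landscape` and `adaptive_walk` are hypothetical.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a binary NK landscape.

    Each of the n sites contributes a random value in [0, 1) that depends on
    its own state and on the states of k other randomly chosen sites.
    Contributions are drawn lazily and then cached, so the landscape is fixed
    once a given site configuration has been seen.
    """
    rng = random.Random(seed)
    neighbors = [
        [i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)
    ]
    tables = [{} for _ in range(n)]

    def fitness(seq):
        total = 0.0
        for i in range(n):
            key = tuple(seq[j] for j in neighbors[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n  # mean site contribution, always in [0, 1)

    return fitness


def adaptive_walk(fitness, start, max_steps=1000):
    """Greedy walk: repeatedly take the best single-site change until none helps."""
    seq = list(start)
    current = fitness(seq)
    for step in range(max_steps):
        best_seq, best_fit = None, current
        for i in range(len(seq)):
            cand = seq[:]
            cand[i] = 1 - cand[i]  # flip one site
            f = fitness(cand)
            if f > best_fit:
                best_seq, best_fit = cand, f
        if best_seq is None:  # local peak reached
            return seq, current, step
        seq, current = best_seq, best_fit
    return seq, current, max_steps


if __name__ == "__main__":
    for k in (0, 2, 8):
        f = make_nk_landscape(n=16, k=k, seed=1)
        _, peak, steps = adaptive_walk(f, [0] * 16)
        print(f"k={k}: local peak fitness {peak:.3f} after {steps} steps")
```

    With k = 0 the landscape is additive and has a single peak, so every walk ends at the same fitness; larger k makes the surface more rugged and walks tend to stop at different local peaks.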

  7. The SBASE protein domain library, release 8.0: a collection of annotated protein sequence segments.

    Science.gov (United States)

    Murvai, J; Vlahovicek, K; Barta, E; Pongor, S

    2001-01-01

    SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.

  8. Formation of large viroplasms and virulence of Cauliflower mosaic virus in turnip plants depend on the N-terminal EKI sequence of viral protein TAV.

    Directory of Open Access Journals (Sweden)

    Angèle Geldreich

    Full Text Available Cauliflower mosaic virus (CaMV) TAV protein (TransActivator/Viroplasmin) plays a pivotal role during the infection cycle since it activates translation reinitiation of viral polycistronic RNAs and suppresses RNA silencing. It is also the major component of cytoplasmic electron-dense inclusion bodies (EDIBs) called viroplasms that are particularly evident in cells infected by the virulent CaMV Cabb B-JI isolate. These EDIBs are considered as virion factories, vehicles for CaMV intracellular movement and reservoirs for CaMV transmission by aphids. In this study, focused on different TAV mutants in vivo, we demonstrate that three physically separated domains collectively participate in the formation of large EDIBs: the N-terminal EKI motif, a sequence of the MAV domain involved in translation reinitiation and a C-terminal region encompassing the zinc finger. Surprisingly, EKI mutant TAVm3, corresponding to a substitution of the EKI motif at amino acids 11-13 by three alanines (AAA), which completely abolished the formation of large viroplasms, was not lethal for CaMV but highly reduced its virulence without affecting the rate of systemic infection. Expression of TAVm3 in a viral context led to formation of small irregularly shaped inclusion bodies, mild symptoms and low levels of viral DNA and particles accumulation, despite the production of significant amounts of mature capsid proteins. Unexpectedly, for CaMV-TAVm3 the formation of viral P2-containing electron-light inclusion body (ELIB), which is essential for CaMV aphid transmission, was also altered, thus suggesting an indirect role of the EKI tripeptide in CaMV plant-to-plant propagation. This important functional contribution of the EKI motif in CaMV biology can explain the strict conservation of this motif in the TAV sequences of all CaMV isolates.

  9. Arabidopsis ASYMMETRIC LEAVES2 protein required for leaf morphogenesis consistently forms speckles during mitosis of tobacco BY-2 cells via signals in its specific sequence.

    Science.gov (United States)

    Luo, Lilan; Ando, Sayuri; Sasabe, Michiko; Machida, Chiyoko; Kurihara, Daisuke; Higashiyama, Tetsuya; Machida, Yasunori

    2012-09-01

    Leaf primordia with high division and developmental competencies are generated around the periphery of stem cells at the shoot apex. Arabidopsis ASYMMETRIC-LEAVES2 (AS2) protein plays a key role in the regulation of many genes responsible for flat symmetric leaf formation. The AS2 gene, expressed in leaf primordia, encodes a plant-specific nuclear protein containing an AS2/LOB domain with cysteine repeats (C-motif). AS2 proteins are present in speckles in and around the nucleoli, and in the nucleoplasm of some leaf epidermal cells. We used the tobacco cultured cell line BY-2 expressing the AS2-fused yellow fluorescent protein to examine subnuclear localization of AS2 in dividing cells. AS2 mainly localized to speckles (designated AS2 bodies) in cells undergoing mitosis and distributed in a pairwise manner during the separation of sets of daughter chromosomes. Few interphase cells contained AS2 bodies. Deletion analyses showed that a short stretch of the AS2 amino-terminal sequence and the C-motif play negative and positive roles, respectively, in localizing AS2 to the bodies. These results suggest that AS2 bodies function to properly distribute AS2 to daughter cells during cell division in leaf primordia; and this process is controlled at least partially by signals encoded by the AS2 sequence itself.

  10. Sequence-specific capture of protein-DNA complexes for mass spectrometric protein identification.

    Directory of Open Access Journals (Sweden)

    Cheng-Hsien Wu

    Full Text Available The regulation of gene transcription is fundamental to the existence of complex multicellular organisms such as humans. Although it is widely recognized that much of gene regulation is controlled by gene-specific protein-DNA interactions, there presently exists little in the way of tools to identify proteins that interact with the genome at locations of interest. We have developed a novel strategy to address this problem, which we refer to as GENECAPP, for Global ExoNuclease-based Enrichment of Chromatin-Associated Proteins for Proteomics. In this approach, formaldehyde cross-linking is employed to covalently link DNA to its associated proteins; subsequent fragmentation of the DNA, followed by exonuclease digestion, produces a single-stranded region of the DNA that enables sequence-specific hybridization capture of the protein-DNA complex on a solid support. Mass spectrometric (MS) analysis of the captured proteins is then used for their identification and/or quantification. We show here the development and optimization of GENECAPP for an in vitro model system, comprised of the murine insulin-like growth factor-binding protein 1 (IGFBP1) promoter region and FoxO1, a member of the forkhead rhabdomyosarcoma (FoxO) subfamily of transcription factors, which binds specifically to the IGFBP1 promoter. This novel strategy provides a powerful tool for studies of protein-DNA and protein-protein interactions.

  11. Identification of helix capping and β-turn motifs from NMR chemical shifts

    Energy Technology Data Exchange (ETDEWEB)

    Shen Yang; Bax, Ad, E-mail: bax@nih.gov [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)

    2012-03-15

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and ¹³Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7-0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.

  12. Identification of helix capping and β-turn motifs from NMR chemical shifts

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2012-01-01

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and ¹³Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.
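
    The Matthews correlation coefficient quoted in the two records above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula only; the counts in the usage example are invented for illustration and are not results from MICS.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; 1 is perfect prediction, 0 is no better than
    chance, -1 is total disagreement. By convention 0.0 is returned when any
    marginal total is zero and the coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for one motif class (not taken from the paper).
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

    Unlike raw accuracy, the coefficient stays informative when motif positions are rare compared with non-motif positions, which is the usual situation for per-residue motif prediction.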

  13. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow discriminating successfully a) TrEn from random controls, proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  14. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  15. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
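
    The contrast between a centroid estimator and the maximum a posteriori (MAP) estimator, mentioned in the two records above, can be seen on a toy posterior. This is a minimal sketch under strong assumptions: configurations are unconstrained binary vectors (1 marks a binding site) and the loss is plain Hamming distance, for which the centroid keeps every position whose marginal probability exceeds 1/2. The refined loss function proposed in the paper is not reproduced here, and the posterior values are invented.

```python
def map_estimate(posterior):
    """Configuration with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def centroid_estimate(posterior):
    """Minimizer of expected Hamming loss over unconstrained binary vectors.

    Each position is set to 1 exactly when its posterior marginal exceeds 0.5.
    """
    length = len(next(iter(posterior)))
    marginals = [
        sum(p for cfg, p in posterior.items() if cfg[i]) for i in range(length)
    ]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


if __name__ == "__main__":
    # Hypothetical posterior over three candidate site positions.
    posterior = {(1, 0, 0): 0.4, (0, 1, 1): 0.3, (0, 1, 0): 0.3}
    print("MAP:     ", map_estimate(posterior))
    print("centroid:", centroid_estimate(posterior))
```

    Here the single most probable configuration marks position 0, yet 60% of the posterior mass places a site at position 1, so the two estimators disagree.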

  16. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Science.gov (United States)

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  17. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Full Text Available Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  18. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences.

    Directory of Open Access Journals (Sweden)

    Alexander M Sevy

    2015-07-01

    Full Text Available Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a 'single state' design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states, the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design "promiscuous", polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes.

  19. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    Science.gov (United States)

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation.  Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases.  We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes.  Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
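
    The feature described above as most informative, stop codons inside the presumed translation of homologous DNA, can be counted with a few lines of code. This is only an illustrative sketch: it assumes the three standard stop codons and a known reading frame, and it does not reproduce Spurio's tblastn search or its Gaussian process scoring. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed here)


def count_in_frame_stops(dna, frame=0):
    """Count stop codons read in the given frame (0, 1 or 2) of a DNA string."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return sum(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    # Toy sequence: ATG AAA TAG GGC TGA has one internal and one terminal stop.
    print(count_in_frame_stops("ATGAAATAGGGCTGA"))
```

    A homologous region whose aligned frame is interrupted by many stop codons is unlikely to encode a translated protein, which is the intuition behind using this count as evidence for a spurious prediction.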

  20. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  1. Plasma membrane translocation of a protein needle based on a triple-stranded β-helix motif.

    Science.gov (United States)

    Sanghamitra, Nusrat J M; Inaba, Hiroshi; Arisaka, Fumio; Ohtan Wang, Dan; Kanamaru, Shuji; Kitagawa, Susumu; Ueno, Takafumi

    2014-10-01

    Plasma membrane translocation is challenging due to the barrier of the cell membrane. Contrary to the synthetic cell-penetrating materials, tailed bacteriophages use cell-puncturing protein needles to puncture the cell membranes as an initial step of the DNA injection process. Cell-puncturing protein needles are thought to remain functional in the native phages. In this paper, we found that a bacteriophage T4 derived protein needle of 16 nm length spontaneously translocates through the living cell membrane. The β-helical protein needle (β-PN) internalizes into human red blood cells that lack endocytic machinery. By comparing the cellular uptake of β-PNs with modified surface charge, it is shown that the uptake efficiency is maximum when it has a negative charge corresponding to a zeta potential value of -16 mV. In HeLa cells, uptake of β-PN incorporates endocytosis independent mechanisms with partial macropinocytosis dependence. The endocytosis dependence of the uptake increases when the surface charges of β-PNs are modified to positive or negative. Thus, these results suggest that natural DNA injecting machinery can serve as an inspiration to design new class of cell-penetrating materials with a tailored mechanism.

  2. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence

    NARCIS (Netherlands)

    Al-Shahib, A.; Breitling, R.; Gilbert, D.

    2005-01-01

    Abstract: When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence

  3. Designing sequence to control protein function in an EF-hand protein.

    Science.gov (United States)

    Bunick, Christopher G; Nelson, Melanie R; Mangahas, Sheryll; Hunter, Michael J; Sheehan, Jonathan H; Mizoue, Laura S; Bunick, Gerard J; Chazin, Walter J

    2004-05-19

    The extent of conformational change that calcium binding induces in EF-hand proteins is a key biochemical property specifying Ca²⁺ sensor versus signal modulator function. To understand how differences in amino acid sequence lead to differences in the response to Ca²⁺ binding, comparative analyses of sequence and structures, combined with model building, were used to develop hypotheses about which amino acid residues control Ca²⁺-induced conformational changes. These results were used to generate a first design of calbindomodulin (CBM-1), a calbindin D9k re-engineered with 15 mutations to respond to Ca²⁺ binding with a conformational change similar to that of calmodulin. The gene for CBM-1 was synthesized, and the protein was expressed and purified. Remarkably, this protein did not exhibit any non-native-like molten globule properties despite the large number of mutations and the nonconservative nature of some of them. Ca²⁺-induced changes in CD intensity and in the binding of the hydrophobic probe, ANS, implied that CBM-1 does undergo Ca²⁺ sensorlike conformational changes. The X-ray crystal structure of Ca²⁺-CBM-1 determined at 1.44 Å resolution reveals the anticipated increase in hydrophobic surface area relative to the wild-type protein. A nascent calmodulin-like hydrophobic docking surface was also found, though it is occluded by the inter-EF-hand loop. The results from this first calbindomodulin design are discussed in terms of progress toward understanding the relationships between amino acid sequence, protein structure, and protein function for EF-hand CaBPs, as well as the additional mutations for the next CBM design.

  4. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
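
    The (l,d)-motif search problem stated above can be written down directly as an exhaustive search. The sketch below is a brute-force reference, not PMS8: it enumerates all candidate l-mers over the alphabet, so its cost grows exponentially with l and it is only usable on very small instances. Function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1)
    )


def planted_motif_search(seqs, l, d, alphabet="ACGT"):
    """Return all l-mers that occur in every sequence with at most d mismatches."""
    found = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GGACGT"]
    print(planted_motif_search(seqs, l=3, d=0))
```

    A reference like this is mainly useful for checking faster exact algorithms on tiny inputs, since both must return the same set of motifs.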

  5. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    Science.gov (United States)

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
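
    The distinction drawn above between absolute and relative sequence capacity is simple arithmetic in log space. The sketch below assumes a 20-letter amino acid alphabet; the capacity value and the chain lengths in the example are placeholders chosen for illustration, not estimates from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro constant, used only as a scale reference


def log10_sequence_space(length, alphabet_size=20):
    """log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet_size)


def log10_relative_capacity(log10_capacity, length, alphabet_size=20):
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_sequence_space(length, alphabet_size)


if __name__ == "__main__":
    # Hypothetical case: a capacity equal to the Avogadro constant,
    # evaluated for two assumed chain lengths.
    log_cap = math.log10(AVOGADRO)
    for length in (35, 100):
        rel = log10_relative_capacity(log_cap, length)
        print(f"length {length}: log10(relative capacity) = {rel:.1f}")
```

    Because the total sequence space grows as 20 to the power of the chain length, even a very large absolute capacity corresponds to a vanishingly small fraction of all sequences, and that fraction shrinks quickly as the chain gets longer.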

  6. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment

  7. Convulxin, a C-type lectin-like protein, inhibits HCASMCs functions via WAD-motif/integrin-αv interaction and NF-κB-independent gene suppression of GRO and IL-8.

    Science.gov (United States)

    Shih, Chun-Ho; Chiang, Tin-Bin; Wang, Wen-Jeng

    2017-03-15

    Convulxin (CVX), a C-type lectin-like protein (CLP), is a potent platelet aggregation inducer. To evaluate its potential applications in angiogenic diseases, the mode of action of multimeric CVX toward human coronary artery smooth muscle cells (HCASMCs) was further explored. Sequence comparison with other CLPs showed that the N-terminus of the β-chain of CVX (CVX-β) contains a putative disintegrin-like domain with a conserved motif. Importantly, native CVX had no cytotoxic activity as examined by electrophoretic pattern. A Trp-Ala-Asp (WAD)-containing octapeptide, MTWADAEK, was thereafter synthesized and analyzed in functional assays. With specific integrin antagonists as positive controls, the anti-angiogenic effects of CVX on HCASMCs were investigated by a series of functional analyses. CVX exhibited multiple inhibitory activities toward HCASMC proliferation, adhesion and invasion in a dose- and integrin αvβ3-dependent fashion. The WAD-octapeptide, although less potent, could also work as an active peptidomimetic. In addition, flow cytometric analysis demonstrated that both intact CVX and the synthetic peptide can specifically interact with integrin-αv on HCASMCs, and CVX was shown to down-regulate the gene expression of CXC-chemokines, such as growth-related oncogene and interleukin-8. According to nuclear factor-κB (NF-κB) p65 translocation assay and Western blotting analysis, NF-κB activation was not involved in the signaling events of CVX-induced gene expression. In conclusion, CVX may act as a disintegrin-like protein via the interaction of the WAD-motif in CVX-β with integrin-αv on HCASMCs, and it is also a gene suppressor able to diminish the expression of two CXC-chemokines in an NF-κB-independent manner. More extensive investigations are needed and might open a new avenue for the development of a novel angiostatic agent. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Identification and characterization of a selenoprotein family containing a diselenide bond in a redox motif

    OpenAIRE

    Shchedrina, Valentina A.; Novoselov, Sergey V.; Malinouski, Mikalai Yu.; Gladyshev, Vadim N.

    2007-01-01

    Selenocysteine (Sec, U) insertion into proteins is directed by translational recoding of specific UGA codons located upstream of a stem-loop structure known as Sec insertion sequence (SECIS) element. Selenoproteins with known functions are oxidoreductases containing a single redox-active Sec in their active sites. In this work, we identified a family of selenoproteins, designated SelL, containing two Sec separated by two other residues to form a UxxU motif. SelL proteins show an unusual occur...

  9. Hybrids of the bHLH and bZIP protein motifs display different DNA-binding activities in vivo vs. in vitro.

    Directory of Open Access Journals (Sweden)

    Hiu-Kwan Chow

    Minimalist hybrids comprising the DNA-binding domain of bHLH/PAS (basic-helix-loop-helix/Per-Arnt-Sim) protein Arnt fused to the leucine zipper (LZ) dimerization domain from bZIP (basic region-leucine zipper) protein C/EBP were designed to bind the E-box DNA site, CACGTG, targeted by bHLHZ (basic-helix-loop-helix-zipper) proteins Myc and Max, as well as the Arnt homodimer. The bHLHZ-like structure of ArntbHLH-C/EBP comprises the Arnt bHLH domain fused to the C/EBP LZ: i.e. swap of the 330 aa PAS domain for the 29 aa LZ. In the yeast one-hybrid assay (Y1H), transcriptional activation from the E-box was strong by ArntbHLH-C/EBP, and undetectable for the truncated ArntbHLH (PAS removed), as detected via readout from the HIS3 and lacZ reporters. In contrast, fluorescence anisotropy titrations showed affinities for the E-box with ArntbHLH-C/EBP and ArntbHLH comparable to other transcription factors (K(d) 148.9 nM and 40.2 nM, respectively), but only under select conditions that maintained folded protein. Although in vivo yeast results and in vitro spectroscopic studies for ArntbHLH-C/EBP targeting the E-box correlate well, the same does not hold for ArntbHLH. As circular dichroism confirms that ArntbHLH-C/EBP is a much more strongly alpha-helical structure than ArntbHLH, we conclude that the nonfunctional ArntbHLH in the Y1H must be due to misfolding, leading to the false negative that this protein is incapable of targeting the E-box. Many experiments, including protein design and selections from large libraries, depend on protein domains remaining well-behaved in the nonnative experimental environment, especially small motifs like the bHLH (60-70 aa). Interestingly, a short helical LZ can serve as a folding- and/or solubility-enhancing tag, an important device given the focus of current research on exploration of vast networks of biomolecular interactions.

  10. A scalable double-barcode sequencing platform for characterization of dynamic protein-protein interactions.

    Science.gov (United States)

    Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F

    2017-05-25

    Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.

  11. Sequence Classification - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    ansmembrane helical proteins by applying statistical and machine learning methods to each amino acid sequenc.... Amino Acid Result of predicting β-barrel membrane protein with a statistical method using amino acid compo...sition. ( TMBETADISC-COMP ) Dipeptide Result of predicting β-barrel membrane protein with a statistic...ting β-barrel membrane protein with a statistical method using motifs. ( TMBETADISC-MOTIF ) SVM Result of pr

  12. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  13. SPiCE : A web-based tool for sequence-based protein classification and exploration

    NARCIS (Netherlands)

    Van den Berg, B.A.; Reinders, M.J.; Roubos, J.A.; De Ridder, D.

    2014-01-01

    Background Amino acid sequences and features extracted from such sequences have been used to predict many protein properties, such as subcellular localization or solubility, using classifier algorithms. Although software tools are available for both feature extraction and classifier construction,

  14. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    International Nuclear Information System (INIS)

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-01-01

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (residues 125-134, MQELSNILNL) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and containing only the putative NES motif (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD containing only one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. In addition, the CRM1-specific nuclear export inhibitor LMB blocked the potent NES-mediated export of hNANOG, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, a NES motif is present in the hNANOG HD and may be functionally involved in the CRM1-mediated nuclear export pathway.
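    A sequence scan for leucine-rich NES candidates of this kind can be sketched with a regular expression. This is a minimal illustration, not the authors' procedure: the pattern below encodes one commonly cited NES consensus (hydrophobic, 2-3 residues, hydrophobic, 2-3 residues, hydrophobic, 1 residue, hydrophobic) and the choice of L, I, V, F and M as the hydrophobic class is an assumption. The function name and the `offset` parameter are hypothetical.

```python
import re

# Assumed consensus: Phi-x(2,3)-Phi-x(2,3)-Phi-x-Phi, with Phi in {L, I, V, F, M}.
# The lookahead lets finditer report overlapping candidates.
NES_PATTERN = re.compile(r"(?=([LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]))")

def find_nes_candidates(sequence: str, offset: int = 1):
    """Return (start_position, matched_segment) pairs.

    `offset` is the residue number of the first character in `sequence`,
    so positions can be reported in full-length protein numbering.
    """
    return [(m.start() + offset, m.group(1)) for m in NES_PATTERN.finditer(sequence)]

# The 10-residue segment quoted in the abstract, numbered from residue 125.
hits = find_nes_candidates("MQELSNILNL", offset=125)
for start, segment in hits:
    print(start, start + len(segment) - 1, segment)
```

    A pattern match only flags a candidate; as the abstract makes clear, function still has to be shown experimentally, for example by mutagenesis and export-inhibitor assays.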

  15. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  16. Formation of a Multiple Protein Complex on the Adenovirus Packaging Sequence by the IVa2 Protein▿

    OpenAIRE

    Tyler, Ryan E.; Ewing, Sean G.; Imperiale, Michael J.

    2007-01-01

    During adenovirus virion assembly, the packaging sequence mediates the encapsidation of the viral genome. This sequence is composed of seven functional units, termed A repeats. Recent evidence suggests that the adenovirus IVa2 protein binds the packaging sequence and is involved in packaging of the genome. Study of the IVa2-packaging sequence interaction has been hindered by difficulty in purifying the protein produced in virus-infected cells or by recombinant techniques. We report the first ...

  17. MIPS: a database for protein sequences, homology data and yeast genome information.

    Science.gov (United States)

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  18. Mutational analysis of the RecJ exonuclease of Escherichia coli: identification of phosphoesterase motifs.

    Science.gov (United States)

    Sutera, V A; Han, E S; Rajman, L A; Lovett, S T

    1999-10-01

    The recJ gene, identified in Escherichia coli, encodes a Mg(2+)-dependent 5'-to-3' exonuclease with high specificity for single-strand DNA. Genetic and biochemical experiments implicate RecJ exonuclease in homologous recombination, base excision, and methyl-directed mismatch repair. Genes encoding proteins with strong similarities to RecJ have been found in every eubacterial genome sequenced to date, with the exception of Mycoplasma and Mycobacterium tuberculosis. Multiple genes encoding proteins similar to RecJ are found in some eubacteria, including Bacillus and Helicobacter, and in the archaea. Among this divergent set of sequences, seven conserved motifs emerge. We demonstrate here that amino acids within six of these motifs are essential for both the biochemical and genetic functions of E. coli RecJ. These motifs may define interactions with Mg(2+) ions or substrate DNA. A large family of proteins more distantly related to RecJ is present in archaea, eubacteria, and eukaryotes, including a hypothetical protein in the MgPa adhesin operon of Mycoplasma, a domain of putative polyA polymerases in Synechocystis and Aquifex, PRUNE of Drosophila, and an exopolyphosphatase (PPX1) of Saccharomyces cerevisiae. Because these six RecJ motifs are shared between exonucleases and exopolyphosphatases, they may constitute an ancient phosphoesterase domain now found in all kingdoms of life.

  19. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Directory of Open Access Journals (Sweden)

    Kevin R Ramkissoon

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  20. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    Science.gov (United States)

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  1. Litopenaeus vannamei sterile-alpha and armadillo motif containing protein (LvSARM) is involved in regulation of Penaeidins and antilipopolysaccharide factors.

    Directory of Open Access Journals (Sweden)

    Pei-Hui Wang

    The Toll-like receptor (TLR)-mediated NF-κB pathway is tightly controlled because overactivation may result in severe damage to the host, such as in the case of chronic inflammatory diseases and cancer. In mammals, sterile-alpha and armadillo motif-containing protein (SARM) plays an important role in negatively regulating this pathway. While Caenorhabditis elegans SARM is crucial for an efficient immune response against bacterial and fungal infections, it is still unknown whether Drosophila SARM participates in immune responses. Here, Litopenaeus vannamei SARM (LvSARM) was cloned and functionally characterized. LvSARM shared signature domains with and exhibited significant similarities to mammalian SARM. Real-time quantitative PCR analysis indicated that the expression of LvSARM was responsive to Vibrio alginolyticus and white spot syndrome virus (WSSV) infections in the hemocyte, gill, hepatopancreas and intestine. In Drosophila S2 cells, LvSARM was widely distributed in the cytoplasm and could significantly inhibit the promoters of the NF-κB pathway-controlled antimicrobial peptide genes (AMPs). Silencing of LvSARM using dsRNA-mediated RNA interference increased the expression levels of Penaeidins and antilipopolysaccharide factors, which are L. vannamei AMPs, and increased the mortality rate after V. alginolyticus infection. Taken together, our results reveal that LvSARM may be a novel component of the shrimp Toll pathway that negatively regulates shrimp AMPs, particularly Penaeidins and antilipopolysaccharide factors.

  2. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    Science.gov (United States)

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-06-23

    The unique physicochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable; proteomic methods offer an alternative but require comprehensive and well-annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full-length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass

  3. The adeno-associated virus major regulatory protein Rep78-c-Jun-DNA motif complex modulates AP-1 activity

    International Nuclear Information System (INIS)

    Prasad, C. Krishna; Meyers, Craig; Zhan Dejin; You Hong; Chiriva-Internati, Maurizio; Mehta, Jawahar L.; Liu Yong; Hermonat, Paul L.

    2003-01-01

    Multiple epidemiologic studies show that adeno-associated virus (AAV) is negatively associated with cervical cancer (CX CA), a cancer which is positively associated with human papillomavirus (HPV) infection. Mechanisms for this correlation may be by Rep78's (AAV's major regulatory protein) ability to bind the HPV-16 p97 promoter DNA and inhibit transcription, to bind and interfere with the functions of the E7 oncoprotein of HPV-16, and to bind a variety of HPV-important cellular transcription factors such as Sp1 and TBP. c-Jun is another important cellular factor intimately linked to the HPV life cycle, as well as keratinocyte differentiation and skin development. Skin is the natural host tissue for both HPV and AAV. In this article it is demonstrated that Rep78 directly interacts with c-Jun, both in vitro and in vivo, as analyzed by Western blot, yeast two-hybrid cDNA, and electrophoretic mobility shift-supershift assay (EMSA supershift). Addition of anti-Rep78 antibodies inhibited the EMSA supershift. Investigating the biological implications of this interaction, Rep78 inhibited the c-Jun-dependent c-jun promoter in transient and stable chloramphenicol acetyl-transferase (CAT) assays. Rep78 also inhibited c-Jun-augmented c-jun promoter as well as the HPV-16 p97 promoter activity (also c-Jun regulated) in in vitro transcription assays in T47D nuclear extracts. Finally, the Rep78-c-Jun interaction mapped to the amino-half of Rep78. The ability of Rep78 to interact with c-Jun and down-regulate AP-1-dependent transcription suggests one more mechanism by which AAV may modulate the HPV life cycle and the carcinogenesis process

  4. A synthetic peptide with the putative iron binding motif of amyloid precursor protein (APP) does not catalytically oxidize iron.

    Directory of Open Access Journals (Sweden)

    Kourosh Honarmand Ebrahimi

    The β-amyloid precursor protein (APP), which is a key player in Alzheimer's disease, was recently reported to possess an Fe(II) binding site within its E2 domain which exhibits ferroxidase activity [Duce et al. 2010, Cell 142: 857]. The putative ligands of this site were compared to those in the ferroxidase site of ferritin. The activity was indirectly measured using transferrin, which scavenges the Fe(III) product of the reaction. A 22-residue synthetic peptide, named FD1, with the putative ferroxidase site of APP, and the E2 domain of APP were each reported to exhibit 40% of the ferroxidase activity of APP and of ceruloplasmin. It was also claimed that the ferroxidase activity of APP is inhibited by Zn(II) just as in ferritin. We measured the ferroxidase activity indirectly (i) by the incorporation of the Fe(III) product of the ferroxidase reaction into transferrin and directly (ii) by monitoring consumption of the substrate molecular oxygen. The results with the FD1 peptide were compared to the established ferroxidase activities of human H-chain ferritin and of ceruloplasmin. For FD1 we observed no activity above the background of non-enzymatic Fe(II) oxidation by molecular oxygen. Zn(II) binds to transferrin and diminishes its Fe(III) incorporation capacity and rate but it does not specifically bind to a putative ferroxidase site of FD1. Based on these results, and on comparison of the putative ligands of the ferroxidase site of APP with those of ferritin, we conclude that the previously reported results for ferroxidase activity of FD1 and - by implication - of APP should be re-evaluated.

  5. Elman RNN based classification of proteins sequences on account of their mutual information.

    Science.gov (United States)

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we have employed the method of estimating residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
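    The mutual information of adjacent residues mentioned above can be illustrated with a short sketch. This is not the authors' implementation: the function name is hypothetical, probabilities are estimated from raw pair counts within a single sequence, and the two-class hydrophobic/polar grouping is a toy stand-in for the structural and solvent accessibility property classes, which the abstract does not specify.

```python
import math
from collections import Counter

def adjacent_mutual_information(symbols: str) -> float:
    """Mutual information (in bits) between each symbol and its right neighbour."""
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (left, right) pairs
    left = Counter(a for a, _ in pairs)    # marginal counts, left position
    right = Counter(b for _, b in pairs)   # marginal counts, right position
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

# Hypothetical two-class reduction of the amino acid alphabet (illustrative only).
HYDROPHOBIC = set("AVLIMFWC")

def to_classes(sequence: str) -> str:
    """Map each residue to 'H' (hydrophobic) or 'P' (other)."""
    return "".join("H" if residue in HYDROPHOBIC else "P" for residue in sequence)

print(adjacent_mutual_information(to_classes("MQELSNILNL")))
```

    A vector of such values, one per property or per residue separation, would be one plausible way to build the kind of feature vector the abstract calls an MIV.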

  6. [Regulatory effect and mechanism of RNA binding motif protein 38 on the expression of progesterone receptor in human breast cancer ZR-75-1 cells].

    Science.gov (United States)

    Lou, P P; Li, C L; Xia, T S; Shi, L; Wu, J; Zhou, X J; Wang, Y; Ding, Q

    2016-06-23

    To investigate the regulatory mechanism of RNA binding motif protein 38 (RNPC1) on the expression of progesterone receptor (PR) in breast cancer cell line ZR-75-1. Lentiviral vector was used to induce overexpression of RNPC1 in ZR-75-1 cells. qRT-PCR and Western blot were used to assess the regulatory effect of RNPC1 on PR expression. Actinomycin was used to detect the regulatory mechanism involved. Immunohistochemical (IHC) staining was used to determine the protein expression of RNPC1 and PR in 80 breast cancer tissues. IHC staining showed that the expression of RNPC1 was significantly higher in the PR positive breast cancer tissues than that in the PR negative breast cancer tissues (P<0.05). The qRT-PCR results showed that overexpression of RNPC1 in ZR-75-1 cells significantly upregulated the mRNA level of PR (1.764±0.028 vs. 1.001±0.037, P<0.01), whereas knockdown of RNPC1 did the opposite (0.579±0.007 vs. 1.000±0.002, P<0.01). The Western blot results also showed that overexpression of RNPC1 up-regulated PR levels, while knockdown of RNPC1 resulted in down-regulation of PR levels in the ZR-75-1 cells. The actinomycin assay showed that overexpression of RNPC1 increased the mRNA stability of PR. The half-life of PR mRNA was increased from 4.0 h to 6.5 h. Knockdown of RNPC1 decreased the mRNA stability of PR and the half-life of PR transcript was decreased from 4.1 h to 3.0 h. RNPC1 plays a crucial role in regulating the expression of PR in breast cancer ZR-75-1 cells.
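    To give a feel for what the reported half-life changes mean, the sketch below converts a half-life into the fraction of transcript remaining after a given time. It assumes simple first-order (exponential) decay after transcription is blocked, which the abstract does not state explicitly; the function name is hypothetical and only the half-life values are taken from the abstract.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of mRNA left after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)

# Half-lives reported in the abstract (hours).
control_oe, overexpression = 4.0, 6.5
control_kd, knockdown = 4.1, 3.0

for label, t_half in [("control", control_oe), ("RNPC1 overexpression", overexpression),
                      ("control", control_kd), ("RNPC1 knockdown", knockdown)]:
    print(f"{label}: {fraction_remaining(4.0, t_half):.2f} of PR mRNA left after 4 h")
```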

  7. JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture

    Directory of Open Access Journals (Sweden)

    Pagni Marco

    2005-08-01

    Abstract Background Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently almost every family of proteins now contains hundreds of members. It has thus become necessary to develop tools which classify protein sequences automatically, quickly and reliably. The difficulty of this task is intimately linked to the mechanism by which protein sequences diverge, i.e. by simultaneous residue substitutions, insertions and/or deletions and whole domain reorganisations (duplications/swapping/fusion). Results Here we present a novel approach, which is based on random sampling of sub-sequences (probes) out of a set of input sequences. The probes are compared to the input sequences, after a normalisation step; the results are used to partition the input sequences into homogeneous groups of proteins. In addition, this method provides information on diagnostic parts of the proteins. The performance of this method is challenged by two data sets. The first one contains the sequences of prokaryotic lyases that could be arranged as a multiple sequence alignment. The second one contains all proteins from Swiss-Prot Release 36 with at least one Src homology 2 (SH2) domain – a classical example for proteins with modular architecture. Conclusion The outcome of our method is robust and highly reproducible, as shown using bootstrap and resampling validation procedures. The results are essentially coherent with the biology. This method depends solely on well-established publicly available software and algorithms.
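    The first step described above, random sampling of sub-sequences (probes) from the input sequences, can be sketched as follows. This is an illustration only and not the JACOP code: the function name, the fixed probe length, the uniform choice of source sequence and start position, and the toy input sequences are all assumptions, and the subsequent comparison, normalisation and partitioning steps are not shown.

```python
import random

def sample_probes(sequences: dict, probe_length: int, n_probes: int, seed: int = 0):
    """Draw `n_probes` random sub-sequences of fixed length.

    Returns a list of (sequence_name, start_index, probe) tuples.
    Sequences shorter than `probe_length` are skipped.
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    eligible = [(name, seq) for name, seq in sequences.items() if len(seq) >= probe_length]
    if not eligible:
        raise ValueError("no sequence is long enough for the requested probe length")
    probes = []
    for _ in range(n_probes):
        name, seq = rng.choice(eligible)
        start = rng.randrange(len(seq) - probe_length + 1)
        probes.append((name, start, seq[start:start + probe_length]))
    return probes

# Toy input: made-up sequences, for illustration only.
toy_sequences = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protB": "MQELSNILNLSYKQVKTWFQNQRMK",
    "short": "MKT",
}
probes = sample_probes(toy_sequences, probe_length=8, n_probes=5)
for name, start, probe in probes:
    print(name, start, probe)
```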

  8. Analysis of long-range correlation in sequences data of proteins

    OpenAIRE

    ADRIANA ISVORAN; LAURA UNIPAN; DANA CRACIUN; VASILE MORARIU

    2007-01-01

    The results presented here suggest the existence of correlations in the sequence data of proteins. 32 proteins, both globular and fibrous, both monomeric and polymeric, were analyzed. The primary structures of these proteins were treated as time series. Three spatial series of data for each sequence of a protein were generated from numerical correspondences between each amino acid and a physical property associated with it, i.e., its electric charge, its polar character and its dipole moment....
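As a rough illustration of treating a primary structure as a numerical series, the sketch below maps each residue to an electric charge and computes a normalised autocorrelation at a given lag. The charge table (formal side-chain charges near neutral pH, histidine treated as neutral) and the use of plain autocorrelation are assumptions made here. The truncated abstract does not specify the property values or the correlation analysis the authors used.

```python
# Assumed formal side-chain charges near neutral pH (an illustrative choice);
# every residue not listed, including histidine, is treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

def charge_series(sequence):
    """Map a one-letter amino acid sequence to a series of charges."""
    return [CHARGE.get(res, 0.0) for res in sequence.upper()]

def autocorrelation(series, lag):
    """Normalised autocorrelation of a numeric series at a given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no correlation signal
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Hypothetical peptide used only to exercise the functions.
series = charge_series("MKDEAARKLEDRGHK")
for lag in range(1, 5):
    print(lag, round(autocorrelation(series, lag), 3))
```

The same two functions apply to any other per-residue property, such as polarity or dipole moment, by swapping the lookup table.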

  9. Sequence-based prediction of protein protein interaction using a deep-learning algorithm.

    Science.gov (United States)

    Sun, Tanlin; Zhou, Bo; Lai, Luhua; Pei, Jianfeng

    2017-05-25

    Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPIs to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPIs have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. We used a stacked autoencoder, a type of deep-learning algorithm, to study sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.
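The "average accuracy with 10-fold cross-validation" reported above is a mean of held-out accuracies over ten splits of the data. The sketch below shows that bookkeeping only. It does not reproduce the stacked autoencoder, and the `train` callback, the toy data and the majority-class model are hypothetical placeholders.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    train(train_samples, train_labels) must return a function that maps
    one sample to a predicted label.
    """
    accuracies = []
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(model(samples[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)

def train_majority(train_samples, train_labels):
    """Placeholder model: always predict the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda sample: majority

# Toy data: 30 samples, 21 labelled 1 (interacting) and 9 labelled 0.
toy_samples = list(range(30))
toy_labels = [1] * 21 + [0] * 9
print(cross_validate(toy_samples, toy_labels, train_majority))
```

Each sample is held out exactly once, so the averaged figure reflects predictions on data the model did not see during training. Accuracy on external datasets, as in the abstract, is a separate and stricter check.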

  10. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature ext