WorldWideScience

Sample records for protein structural prediction

  1. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  2. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review about neural network applications in bioinformatics. Especially the applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  3. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    -trace. Here we present three different approaches for reconstruction of C-traces from predictable measures. In our first approach [63, 62], the C-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE......) is more robust than standard Monte Carlo search. In the second approach for reconstruction of C-traces, an exact branch and bound algorithm has been developed [67, 65]. The model is discrete and makes use of secondary structure predictions, HSE, CN and radius of gyration. We show how to compute good lower...... bounds for partial structures very fast. Using these lower bounds, we are able to find global minimum structures in a huge conformational space in reasonable time. We show that many of these global minimum structures are of good quality compared to the native structure. Our branch and bound algorithm...

  4. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary......-forward neural network with one hidden layer on a data set identical to the one used in earlier work....

  5. A Kernel for Protein Secondary Structure Prediction

    OpenAIRE

    Guermeur , Yann; Lifchitz , Alain; Vert , Régis

    2004-01-01

    http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10338&mode=toc; International audience; Multi-class support vector machines have already proved efficient in protein secondary structure prediction as ensemble methods, to combine the outputs of sets of classifiers based on different principles. In this chapter, their implementation as basic prediction methods, processing the primary structure or the profile of multiple alignments, is investigated. A kernel devoted to the task is in...

  6. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....

  7. Protein structure based prediction of catalytic residues.

    Science.gov (United States)

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provide a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added the advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.

  8. Mapping monomeric threading to protein-protein structure prediction.

    Science.gov (United States)

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.

  9. Alpha complexes in protein structure prediction

    DEFF Research Database (Denmark)

    Winter, Pawel; Fonseca, Rasmus

    2015-01-01

    Reducing the computational effort and increasing the accuracy of potential energy functions is of utmost importance in modeling biological systems, for instance in protein structure prediction, docking or design. Evaluating interactions between nonbonded atoms is the bottleneck of such computations......-complexes from scratch for every configuration encountered during the search for the native structure would make this approach hopelessly slow. However, it is argued that kinetic a-complexes can be used to reduce the computational effort of determining the potential energy when "moving" from one configuration...... to a neighboring one. As a consequence, relatively expensive (initial) construction of an a-complex is expected to be compensated by subsequent fast kinetic updates during the search process. Computational results presented in this paper are limited. However, they suggest that the applicability of a...

  10. Cloud prediction of protein structure and function with PredictProtein for Debian.

    Science.gov (United States)

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  11. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    2009-07-01

    Full Text Available Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML formats. So far, the pipeline has been used to study viral and bacterial proteomes.The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform

  12. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  13. EVA: continuous automatic evaluation of protein structure prediction servers.

    Science.gov (United States)

    Eyrich, V A; Martí-Renom, M A; Przybylski, D; Madhusudhan, M S; Fiser, A; Pazos, F; Valencia, A; Sali, A; Rost, B

    2001-12-01

    Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. http://cubic.bioc.columbia.edu/eva. eva@cubic.bioc.columbia.edu

  14. Is protein structure prediction still an enigma?

    African Journals Online (AJOL)

    STORAGESEVER

    2008-12-29

    Dec 29, 2008 ... Computer methods for protein analysis address this problem since they study the .... neighbor methods, molecular dynamic simulation, and approaches .... fuzzy clustering, neural net works, logistic regression, decision tree ...

  15. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  16. Blind Test of Physics-Based Prediction of Protein Structures

    Science.gov (United States)

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  17. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    of the proteins structure, an energy potential and some optimization algorithm that ¿nds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving opti- mization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested......Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional struc- ture from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation...... our BCO method to generate good solutions to the protein structure prediction problem. The results show that BCO generally ¿nds better solutions than simulated annealing which so far has been the metaheuristic of choice for this problem....

  18. Predicting and validating protein interactions using network structure.

    Directory of Open Access Journals (Sweden)

    Pao-Yang Chen

    2008-07-01

    Full Text Available Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.

  19. Combining neural networks for protein secondary structure prediction

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1995-01-01

    In this paper structured neural networks are applied to the problem of predicting the secondary structure of proteins. A hierarchical approach is used where specialized neural networks are designed for each structural class and then combined using another neural network. The submodels are designed...... by using a priori knowledge of the mapping between protein building blocks and the secondary structure and by using weight sharing. Since none of the individual networks have more than 600 adjustable weights over-fitting is avoided. When ensembles of specialized experts are combined the performance...

  20. Predicting protein structures with a multiplayer online game

    OpenAIRE

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran

    2010-01-01

    People exert significant amounts of problem solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowd-sourced through gamesi, ii, iii, but it is not clear if more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search sp...

  1. Contingency Table Browser - prediction of early stage protein structure.

    Science.gov (United States)

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table - this table expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more indepth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of Contingency Table Browser, among them - analysis of specific protein sequences from the point of view of their structural ambiguity.

  2. Cascaded bidirectional recurrent neural networks for protein secondary structure prediction.

    Science.gov (United States)

    Chen, Jinmiao; Chaudhari, Narendra

    2007-01-01

    Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. Besides, we also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against another two BRNN architectures, namely the original BRNN architecture used for speech recognition as well as Pollastri's BRNN that was proposed for PSS prediction. Our cascaded BRNN achieves an overall three state accuracy Q3 of 74.38\\%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.

  3. Three-dimensional protein structure prediction: Methods and computational strategies.

    Science.gov (United States)

    Dorn, Márcio; E Silva, Mariel Barbachan; Buriol, Luciana S; Lamb, Luis C

    2014-10-12

    A long standing problem in structural bioinformatics is to determine the three-dimensional (3-D) structure of a protein when only a sequence of amino acid residues is given. Many computational methodologies and algorithms have been proposed as a solution to the 3-D Protein Structure Prediction (3-D-PSP) problem. These methods can be divided in four main classes: (a) first principle methods without database information; (b) first principle methods with database information; (c) fold recognition and threading methods; and (d) comparative modeling methods and sequence alignment strategies. Deterministic computational techniques, optimization techniques, data mining and machine learning approaches are typically used in the construction of computational solutions for the PSP problem. Our main goal with this work is to review the methods and computational strategies that are currently used in 3-D protein prediction. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. (PS)2: protein structure prediction server version 3.0.

    Science.gov (United States)

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)(2) web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)(2) server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)(2) is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    Science.gov (United States)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  6. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Full Text Available Abstract Background The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. Conclusions The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  7. Constraint Logic Programming approach to protein structure prediction.

    Science.gov (United States)

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  8. Distance matrix-based approach to protein structure prediction.

    Science.gov (United States)

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix D = [r(ij)(2)] containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix C, that has elements of either 0 or 1 depending on whether the distance r (ij) is greater or less than a cutoff value r (cutoff). We have performed spectral decomposition of the distance matrices D = sigma lambda(k)V(k)V(kT), in terms of eigenvalues lambda kappa and the corresponding eigenvectors v kappa and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to r (2)--the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting r (2) from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 A, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 A. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the

  9. Improved hybrid optimization algorithm for 3D protein structure prediction.

    Science.gov (United States)

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm - PGATS algorithm, which is based on toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. Otherwise, we also take some different improved strategies. The factor of stochastic disturbance is joined in the particle swarm optimization to improve the search ability; the operations of crossover and mutation that are in the genetic algorithm are changed to a kind of random liner method; at last tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but the problem can be attributed to a global optimization problem of multi-extremum and multi-parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcoming of a single algorithm, giving full play to the advantage of each algorithm. In the current universal standard sequences, Fibonacci sequences and real protein sequences are certified. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which is proved to be an effective way to predict the structure of proteins.

  10. Parallel protein secondary structure prediction based on neural networks.

    Science.gov (United States)

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are experimented separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. New binary classifier for Helix versus not Helix ( approximately H) for DBNN produces prediction accuracy of 87% when PSSM is used for the input profile. The performance of DBNN binary classifier is comparable to other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time consuming task of training the neural networks, Pthread and OpenMP are employed to parallelize DBNN in the hyperthreading enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 in the 4 processors shared memory architecture. Both speedup performance of OpenMP and Pthread is superior to that of other research. With the new parallel training algorithm, thousands of amino acids can be processed in reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

  11. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  12. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Aboul-Magd Mohammed O

    2009-07-01

    Full Text Available Abstract Background Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures from primary sequence data which makes use of Parallel Cascade Identification (PCI, a powerful technique from the field of nonlinear system identification. Results Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input

  13. A probabilistic fragment-based protein structure prediction algorithm.

    Directory of Open Access Journals (Sweden)

    David Simoncini

    Full Text Available Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].

  14. Protein 8-class secondary structure prediction using conditional neural fields.

    Science.gov (United States)

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Predicting protein structures with a multiplayer online game.

    Science.gov (United States)

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-05

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  16. I-TASSER server for protein 3D structure prediction

    Directory of Open Access Journals (Sweden)

    Zhang Yang

    2008-01-01

    Full Text Available Abstract Background Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. Results An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1] of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. Conclusion The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available

  17. Optimal neural networks for protein-structure prediction

    International Nuclear Information System (INIS)

    Head-Gordon, T.; Stillinger, F.H.

    1993-01-01

    The successful application of neural-network algorithms for prediction of protein structure is stymied by three problem areas: the sparsity of the database of known protein structures, poorly devised network architectures which make the input-output mapping opaque, and a global optimization problem in the multiple-minima space of the network variables. We present a simplified polypeptide model residing in two dimensions with only two amino-acid types, A and B, which allows the determination of the global energy structure for all possible sequences of pentamer, hexamer, and heptamer lengths. This model simplicity allows us to compile a complete structural database and to devise neural networks that reproduce the tertiary structure of all sequences with absolute accuracy and with the smallest number of network variables. These optimal networks reveal that the three problem areas are convoluted, but that thoughtful network designs can actually deconvolute these detrimental traits to provide network algorithms that genuinely impact on the ability of the network to generalize or learn the desired mappings. Furthermore, the two-dimensional polypeptide model shows sufficient chemical complexity so that transfer of neural-network technology to more realistic three-dimensional proteins is evident

  18. Prediction of protein–protein interactions: unifying evolution and structure at protein interfaces

    International Nuclear Information System (INIS)

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-01-01

    The vast majority of the chores in the living cell involve protein–protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein–protein interaction prediction approaches. As a sample approach for modeling protein interactions, PRISM is detailed which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations

  19. Protein Loop Structure Prediction Using Conformational Space Annealing.

    Science.gov (United States)

    Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung

    2017-05-22

    We have developed a protein loop structure prediction method by combining a new energy function, which we call E PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E PLM outperformed E DFIRE for both decoy sets. This new approach equipped with E PLM and CSA can serve as the state-of-the-art de novo loop modeling method.

  20. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Full Text Available Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i a combination of sequence- and structure-derived parameters and (ii sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.

  1. Improving the accuracy of protein secondary structure prediction using structural alignment

    Directory of Open Access Journals (Sweden)

    Gallin Warren J

    2006-06-01

    Full Text Available Abstract Background The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3 of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences, the probability of a newly identified sequence having a structural homologue is actually quite high. Results We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25% onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics indicate that this new method can achieve a Q3 score approaching 88%. Conclusion By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs

  2. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment

  3. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  4. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    Science.gov (United States)

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.

  5. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    Science.gov (United States)

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted the short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulationto predict protein structures, we calculated seven short test protein (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure by 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of protein (approximately 20 residues). Structural prediction method using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.

  6. Structural and Function Prediction of Musa acuminata subsp. Malaccensis Protein

    Directory of Open Access Journals (Sweden)

    Anum Munir

    2016-03-01

    Full Text Available Hypothetical proteins (HPs are the proteins whose presence has been anticipated, yet in vivo function has not been built up. Illustrating the structural and functional privileged insights of these HPs might likewise prompt a superior comprehension of the protein-protein associations or networks in diverse types of life. Bananas (Musa acuminata spp., including sweet and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister grouped to the all-around considered Poales, which incorporate oats. Bananas are crucial for nourishment security in numerous tropical and subtropical nations and the most prominent organic product in industrialized nations. In the present study, the hypothetical protein of M. acuminata (Banana was chosen for analysis and modeling by distinctive bioinformatics apparatuses and databases. As indicated by primary and secondary structure analysis, XP_009393594.1 is a stable hydrophobic protein containing a noteworthy extent of α-helices; Homology modeling was done utilizing SWISS-MODEL server where the templates identity with XP_009393594.1 protein was less which demonstrated novelty of our protein. Ab initio strategy was conducted to produce its 3D structure. A few evaluations of quality assessment and validation parameters determined the generated protein model as stable with genuinely great quality. Functional analysis was completed by ProtFun 2.2, and KEGG (KAAS, recommended that the hypothetical protein is a transcription factor with cytoplasmic domain as zinc finger. The protein was observed to be vital for translation process, involved in metabolism, signaling and cellular processes, genetic information processing and Zinc ion binding. It is suggested that further test approval would help to anticipate the structures and functions of other uncharacterized proteins of different plants and living being.

  7. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a numbe...

  8. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a number...

  9. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  10. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    Directory of Open Access Journals (Sweden)

    Huiying Zhao

    Full Text Available As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions. A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC of 0.77 with high precision (94% and high sensitivity (65%. We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA] is available as an on-line server at http://sparks-lab.org.

  11. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    Science.gov (United States)

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  12. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Full Text Available Abstract Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  13. Ranking beta sheet topologies with applications to protein structure prediction

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Helles, Glennie; Winter, Pawel

    2011-01-01

    One reason why ab initio protein structure predictors do not perform very well is their inability to reliably identify long-range interactions between amino acids. To achieve reliable long-range interactions, all potential pairings of ß-strands (ß-topologies) of a given protein are enumerated......, including the native ß-topology. Two very different ß-topology scoring methods from the literature are then used to rank all potential ß-topologies. This has not previously been attempted for any scoring method. The main result of this paper is a justification that one of the scoring methods, in particular......, consistently top-ranks native ß-topologies. Since the number of potential ß-topologies grows exponentially with the number of ß-strands, it is unrealistic to expect that all potential ß-topologies can be enumerated for large proteins. The second result of this paper is an enumeration scheme of a subset of ß-topologies...

  14. Fast computational methods for predicting protein structure from primary amino acid sequence

    Science.gov (United States)

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  15. Prediction of protein-protein interactions in dengue virus coat proteins guided by low resolution cryoEM structures

    Directory of Open Access Journals (Sweden)

    Srinivasan Narayanaswamy

    2010-06-01

    Full Text Available Abstract Background Dengue virus along with the other members of the flaviviridae family has reemerged as deadly human pathogens. Understanding the mechanistic details of these infections can be highly rewarding in developing effective antivirals. During maturation of the virus inside the host cell, the coat proteins E and M undergo conformational changes, altering the morphology of the viral coat. However, due to low resolution nature of the available 3-D structures of viral assemblies, the atomic details of these changes are still elusive. Results In the present analysis, starting from Cα positions of low resolution cryo electron microscopic structures the residue level details of protein-protein interaction interfaces of dengue virus coat proteins have been predicted. By comparing the preexisting structures of virus in different phases of life cycle, the changes taking place in these predicted protein-protein interaction interfaces were followed as a function of maturation process of the virus. Besides changing the current notion about the presence of only homodimers in the mature viral coat, the present analysis indicated presence of a proline-rich motif at the protein-protein interaction interface of the coat protein. Investigating the conservation status of these seemingly functionally crucial residues across other members of flaviviridae family enabled dissecting common mechanisms used for infections by these viruses. Conclusions Thus, using computational approach the present analysis has provided better insights into the preexisting low resolution structures of virus assemblies, the findings of which can be made use of in designing effective antivirals against these deadly human pathogens.

  16. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.

  17. Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2008-07-01

    Full Text Available Abstract Background Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins. Results By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures in which each of them contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states. Conclusion Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold is available at http://faculty.cs.tamu.edu/shsze/ssfold.

  18. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    Science.gov (United States)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  19. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    Science.gov (United States)

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on

  20. Building a Better Fragment Library for De Novo Protein Structure Prediction

    Science.gov (United States)

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595

  1. Building a better fragment library for de novo protein structure prediction.

    Directory of Open Access Journals (Sweden)

    Saulo H P de Oliveira

    Full Text Available Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10. We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. "Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources".

  2. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

    Science.gov (United States)

    Lee, Hui Sun; Im, Wonpil

    2017-01-01

    Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.

  3. Protein structure predictions with Monte Carlo simulated annealing: Case for the β-sheet

    Science.gov (United States)

    Okamoto, Y.; Fukugita, M.; Kawai, H.; Nakazawa, T.

    Work is continued for a prediction of three-dimensional structure of peptides and proteins with Monte Carlo simulated annealing using only a generic energy function and amino acid sequence as input. We report that β-sheet like structure is successfully predicted for a fragment of bovine pancreatic trypsin inhibitor which is known to have the β-sheet structure in nature. Together with the results for α-helix structure reported earlier, this means that a successful prediction can be made, at least at a qualitative level, for two dominant building blocks of proteins, α-helix and β-sheet, from the information of amino acid sequence alone.

  4. Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?

    Science.gov (United States)

    Skolnick, Jeffrey; Zhou, Hongyi

    2017-04-20

    Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionary distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why are certain protein structures threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.

  5. Structural features that predict real-value fluctuations of globular proteins.

    Science.gov (United States)

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.

  6. A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein.

    Science.gov (United States)

    Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L

    2014-01-21

    Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the yet unknown hot-spot residues for many proteins for which the structure of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  7. Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

    Science.gov (United States)

    Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

    Encoding proteins of amino acid sequence to predict classified into their respective families and subfamilies is important research area. However for a given protein, knowing the exact action whether hormonal, enzymatic, transmembranal or nuclear receptors does not depend solely on amino acid sequence but on the way the amino acid thread folds as well. This study provides a prototype system that able to predict a protein tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm which inspired from the honey bees food foraging method, is used in the searching phase. In this study, the experiment is conducted on short sequence proteins that have been used by the previous researches using well-known tools. The proposed approach shows a promising result.

  8. A Multiobjective Approach Applied to the Protein Structure Prediction Problem

    Science.gov (United States)

    2002-03-07

    local conformations [38]. Moreover, all these models have the same theme in trying to define the properties a real protein has when folding. Today , it...attempted to solve the PSP problem with a real valued GA and found better results than a competitor (Scheraga, et al) [50]; however, today we know that...ACM Symposium on Applied computing (SAC01) (March 11-14 2001). Las Vegas, Nevada. [22] Derrida , B. “Random Energy Model: Limit of a Family of

  9. ProteinSplit: splitting of multi-domain proteins using prediction of ordered and disordered regions in protein sequences for virtual structural genomics

    International Nuclear Information System (INIS)

    Wyrwicz, Lucjan S; Koczyk, Grzegorz; Rychlewski, Leszek; Plewczynski, Dariusz

    2007-01-01

    The annotation of protein folds within newly sequenced genomes is the main target for semi-automated protein structure prediction (virtual structural genomics). A large number of automated methods have been developed recently with very good results in the case of single-domain proteins. Unfortunately, most of these automated methods often fail to properly predict the distant homology between a given multi-domain protein query and structural templates. Therefore a multi-domain protein should be split into domains in order to overcome this limitation. ProteinSplit is designed to identify protein domain boundaries using a novel algorithm that predicts disordered regions in protein sequences. The software utilizes various sequence characteristics to assess the local propensity of a protein to be disordered or ordered in terms of local structure stability. These disordered parts of a protein are likely to create interdomain spacers. Because of its speed and portability, the method was successfully applied to several genome-wide fold annotation experiments. The user can run an automated analysis of sets of proteins or perform semi-automated multiple user projects (saving the results on the server). Additionally the sequences of predicted domains can be sent to the Bioinfo.PL Protein Structure Prediction Meta-Server for further protein three-dimensional structure and function prediction. The program is freely accessible as a web service at http://lucjan.bioinfo.pl/proteinsplit together with detailed benchmark results on the critical assessment of a fully automated structure prediction (CAFASP) set of sequences. The source code of the local version of protein domain boundary prediction is available upon request from the authors

  10. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    Science.gov (United States)

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  11. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    Science.gov (United States)

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  12. Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.

    Science.gov (United States)

    Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G

    2016-01-01

    While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged, however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes. © 2015 The Protein Society.

  13. The calcium binding properties and structure prediction of the Hax-1 protein.

    Science.gov (United States)

    Balcerak, Anna; Rowinski, Sebastian; Szafron, Lukasz M; Grzybowska, Ewa A

    2017-01-01

    Hax-1 is a protein involved in regulation of different cellular processes, but its properties and exact mechanisms of action remain unknown. In this work, using purified, recombinant Hax-1 and by applying an in vitro autoradiography assay we have shown that this protein binds Ca 2+ . Additionally, we performed structure prediction analysis which shows that Hax-1 displays definitive structural features, such as two α-helices, short β-strands and four disordered segments.

  14. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    Science.gov (United States)

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biannual international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy, therefore, much effort should be made to improve ab initio methods. Arguably, Rosetta is considered as the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers - i.e. 200 - has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3- mers. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models

    Directory of Open Access Journals (Sweden)

    Borodovsky Mark

    2006-03-01

    Full Text Available Abstract Background The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: Single-sequence prediction algorithms imply that information about other (homologous proteins is not available, while algorithms of the second type imply that information about homologous proteins is available, and use it intensively. The single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs, however the accuracy of protein secondary structure prediction from a single-sequence is not as high as when the additional evolutionary information is present. Results In this paper, we further refine and extend the hidden semi-Markov model (HSMM initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize on different sections of the dependency structure and incorporate them into HSMM. In addition, we implement an iterative training method to refine estimates of HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable or better than ones for BSPSS as well as for PSIPRED, tested under the single-sequence condition. Conclusions We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset with no pair of sequences having significant sequence similarity. As new sequences are added to the database it is possible to augment the dependency structure and obtain even higher accuracy. Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable

  16. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    Science.gov (United States)

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  17. Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids.

    Science.gov (United States)

    Raicar, Gaurav; Saini, Harsh; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok

    2016-08-07

    Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required - feature extraction step and classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features, to show an improvement in the recognition and prediction performance on benchmark datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Evaluation of multiple protein docking structures using correctly predicted pairwise subunits

    Directory of Open Access Journals (Sweden)

    Esquivel-Rodríguez Juan

    2012-03-01

    Full Text Available Abstract Background Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, since there are only few works done in developing methods for multiple protein docking, there is no study that investigates how accurate structural models of multiple protein complexes should be to allow scientists to gain biological insights. Methods We generated a series of predicted models (decoys of various accuracies by our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations in the decoys. Results and conclusion We found that pairs of chains with the correct mutual orientation exist even in the decoys with a large overall root mean square deviation (RMSD to the native. Therefore, in addition to a global structure similarity measure, such as the global RMSD, the quality of models for multiple chain complexes can be better evaluated by using the local measurement, the number of chain pairs with correct mutual orientation. We termed the fraction of correctly predicted pairs (RMSD at the interface of less than 4.0Å as fpair and propose to use it for evaluation of the accuracy of multiple protein docking.

  19. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Full Text Available Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly states prediction system which combines blocks with functional domain composition, called QuaBingo, is constructed by three layers of classifiers that can categorize quaternary structural attributes of monomer, homooligomer, and heterooligomer. The building of the first layer classifier uses support vector machines (SVM based on blocks and functional domains of proteins, and the second layer SVM was utilized to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, QuaBingo is 23% of Matthews Correlation Coefficient (MCC higher than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins.

  20. Prediction of backbone dihedral angles and protein secondary structure using support vector machines

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2009-12-01

    Full Text Available Abstract Background The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure. Results We predict independently both the secondary structure and the backbone dihedral angles and combine the results in a loop to enhance each prediction reciprocally. Support vector machines, a state-of-the-art supervised classification technique, achieve secondary structure predictive accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques in order to predict the region in which a new residue belongs. The performance of our method is comparable to, and in some cases more accurate than, other multi-class dihedral prediction methods. Conclusions We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.

  1. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    Science.gov (United States)

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two data sets were used widely, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence homology being 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as feature representation of protein sequences. Based on such feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with step-by-step procedure are 65.8% and 64.2% for 1189 and 25PDB data sets, respectively. With one-against-others procedure used widely, we compare our method with five other existing methods. Especially, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is less than that used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising to perform the prediction of protein structural classes.

  2. Protein docking prediction using predicted protein-protein interface

    Directory of Open Access Journals (Sweden)

    Li Bin

    2012-01-01

    Full Text Available Abstract Background Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. Results We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm, is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. Conclusion We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  3. Protein docking prediction using predicted protein-protein interface.

    Science.gov (United States)

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  4. Predicting adverse drug reaction profiles by integrating protein interaction networks with drug structures.

    Science.gov (United States)

    Huang, Liang-Chin; Wu, Xiaogang; Chen, Jake Y

    2013-01-01

    The prediction of adverse drug reactions (ADRs) has become increasingly important, due to the rising concern on serious ADRs that can cause drugs to fail to reach or stay in the market. We proposed a framework for predicting ADR profiles by integrating protein-protein interaction (PPI) networks with drug structures. We compared ADR prediction performances over 18 ADR categories through four feature groups-only drug targets, drug targets with PPI networks, drug structures, and drug targets with PPI networks plus drug structures. The results showed that the integration of PPI networks and drug structures can significantly improve the ADR prediction performance. The median AUC values for the four groups were 0.59, 0.61, 0.65, and 0.70. We used the protein features in the best two models, "Cardiac disorders" (median-AUC: 0.82) and "Psychiatric disorders" (median-AUC: 0.76), to build ADR-specific PPI networks with literature supports. For validation, we examined 30 drugs withdrawn from the U.S. market to see if our approach can predict their ADR profiles and explain why they were withdrawn. Except for three drugs having ADRs in the categories we did not predict, 25 out of 27 withdrawn drugs (92.6%) having severe ADRs were successfully predicted by our approach. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Artificial Intelligence in Prediction of Secondary Protein Structure Using CB513 Database

    Science.gov (United States)

    Avdagic, Zikrija; Purisevic, Elvir; Omanovic, Samir; Coralic, Zlatan

    2009-01-01

    In this paper we describe CB513 a non-redundant dataset, suitable for development of algorithms for prediction of secondary protein structure. A program was made in Borland Delphi for transforming data from our dataset to make it suitable for learning of neural network for prediction of secondary protein structure implemented in MATLAB Neural-Network Toolbox. Learning (training and testing) of neural network is researched with different sizes of windows, different number of neurons in the hidden layer and different number of training epochs, while using dataset CB513. PMID:21347158

  6. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a sub task in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which is a combination of knowledge base and modeling of the exceptions in the knowledge base using neural networks for protein secondary structure prediction (PSSP), is proposed. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between the 5-residue words and corresponding secondary structure. The predicted results obtained using knowledge base are refined with a Backpropogation neural network algorithm. Neural net models the exceptions of the knowledge base. The Q3 accuracy of 90% and 82% is achieved on the RS126 and CB396 test sets respectively which suggest improvement over existing state of art methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Critical assessment of methods of protein structure prediction (CASP)-round IX

    KAUST Repository

    Moult, John; Fidelis, Krzysztof; Kryshtafovych, Andriy; Tramontano, Anna

    2011-01-01

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the ninth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Methods for modeling protein structure continue to advance, although at a more modest pace than in the early CASP experiments. CASP developments of note are indications of improvement in model accuracy for some classes of target, an improved ability to choose the most accurate of a set of generated models, and evidence of improvement in accuracy for short "new fold" models. In addition, a new analysis of regions of models not derivable from the most obvious template structure has revealed better performance than expected.

  8. MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction

    Science.gov (United States)

    Yin, Xi; Yang, Jing; Xiao, Feng; Yang, Yang; Shen, Hong-Bin

    2018-03-01

    Membrane proteins are an important kind of proteins embedded in the membranes of cells and play crucial roles in living organisms, such as ion channels, transporters, receptors. Because it is difficult to determinate the membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% of A TMH, 87.1% of A P, 3.2 ± 3.0 of N-score, 3.1 ± 2.8 of C-score. MemBrain-Contact obtains 62%/64.1% prediction accuracy on training and independent dataset on top L/5 contact prediction, respectively. And MemBrain-Rasa achieves Pearson correlation coefficient of 0.733 and its mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/. [Figure not available: see fulltext.

  9. 3dRPC: a web server for 3D RNA-protein structure prediction.

    Science.gov (United States)

    Huang, Yangyu; Li, Haotian; Xiao, Yi

    2018-04-01

    RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. yxiao@hust.edu.cn.

  10. LoopIng: a template-based tool for predicting the structure of protein loops.

    KAUST Repository

    Messih, Mario Abdel

    2015-08-06

    Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function.We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4-10 residues) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop).www.biocomputing.it/loopinganna.tramontano@uniroma1.itSupplementary data are available at Bioinformatics online.

  11. Exploiting the Past and the Future in Protein Secondary Structure Prediction

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren; Frasconi, P

    1999-01-01

    predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary......Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network...

  12. Predicting binding within disordered protein regions to structurally characterised peptide-binding domains.

    Directory of Open Access Journals (Sweden)

    Waqasuddin Khan

    Full Text Available Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine if docking such peptide regions to peptide binding domains would assist in these predictions.We assembled a redundancy reduced dataset of SLiM (Short Linear Motif containing proteins from the ELM database. We selected 84 sequences which had an associated PDB structures showing the SLiM bound to a protein receptor, where the SLiM was found within a 50 residue region of the protein sequence which was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50 residue SLiM containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58.Next, we trained a bidirectional recurrent neural network (BRNN using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72 showing that multiple sources of information can be combined to produce results which are clearly superior to any single source.We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods clearly improves the identification of peptide binding regions within a protein sequence. This approach combining docking with machine learning is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.

  13. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    Directory of Open Access Journals (Sweden)

    Lei Jia

    Full Text Available Thermostability issue of protein point mutations is a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding decision making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG and melting temperature change (dTm were obtained from this database. Folding free energy change calculation from Rosetta, structural information of the point mutations as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. Rosetta calculated folding free energy change ranked as the most influential features in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.

  14. Prediction of protein structural features by use of artificial neural networks

    DEFF Research Database (Denmark)

    Petersen, Bent

    . There is a huge over-representation of DNA sequences when comparing the amount of experimentally verified proteins with the amount of DNA sequences. The academic and industrial research community therefore has to rely on structure predictions instead of waiting for the time consuming experimentally determined...

  15. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from DSSP method or predicted secondary structures from NPS@ and GOR4 methods. There were three combined input features PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4) used to test and train the SVM models. Similarly, four datasets RS90, DB433, LI1264 and SP1577 were used to develop the SVM models. These four SVM models developed were tested using three different benchmarking tests namely; (i) self consistency, (ii) seven fold cross validation test and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in self consistency test for the SVM models of both LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features was used to test. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in independent case test, for the SVM models of above two same datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from NPS@ server were used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  16. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction.

    Science.gov (United States)

    de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M

    2018-04-01

    Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  17. Structural similarity-based predictions of protein interactions between HIV-1 and Homo sapiens

    Directory of Open Access Journals (Sweden)

    Gomez Shawn M

    2010-04-01

    Full Text Available Abstract Background In the course of infection, viruses such as HIV-1 must enter a cell, travel to sites where they can hijack host machinery to transcribe their genes and translate their proteins, assemble, and then leave the cell again, all while evading the host immune system. Thus, successful infection depends on the pathogen's ability to manipulate the biological pathways and processes of the organism it infects. Interactions between HIV-encoded and human proteins provide one means by which HIV-1 can connect into cellular pathways to carry out these survival processes. Results We developed and applied a computational approach to predict interactions between HIV and human proteins based on structural similarity of 9 HIV-1 proteins to human proteins having known interactions. Using functional data from RNAi studies as a filter, we generated over 2000 interaction predictions between HIV proteins and 406 unique human proteins. Additional filtering based on Gene Ontology cellular component annotation reduced the number of predictions to 502 interactions involving 137 human proteins. We find numerous known interactions as well as novel interactions showing significant functional relevance based on supporting Gene Ontology and literature evidence. Conclusions Understanding the interplay between HIV-1 and its human host will help in understanding the viral lifecycle and the ways in which this virus is able to manipulate its host. The results shown here provide a potential set of interactions that are amenable to further experimental manipulation as well as potential targets for therapeutic intervention.

  18. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction.

    Science.gov (United States)

    Spencer, Matt; Eickholt, Jesse; Jianlin Cheng

    2015-01-01

    Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.

  19. Structure Prediction of Outer Membrane Protease Protein of Salmonella typhimurium Using Computational Techniques

    Directory of Open Access Journals (Sweden)

    Rozina Tabassum

    2016-03-01

    Full Text Available Salmonella typhimurium, a facultative gram-negative intracellular pathogen belonging to family Enterobacteriaceae, is the most frequent cause of human gastroenteritis worldwide. PgtE gene product, outer membrane protease emerges important in the intracellular phases of salmonellosis. The pgtE gene product of S. typhimurium was predicted to be capable of proteolyzing T7 RNA polymerase and localize in the outer membrane of these gram negative bacteria. PgtE product of S. enterica and OmpT of E. coli, having high sequence similarity have been revealed to degrade macrophages, causing salmonellosis and other diseases. The three-dimensional structure of the protein was not available through Protein Data Bank (PDB creating lack of structural information about E protein. In our study, by performing Comparative model building, the three dimensional structure of outer membrane protease protein was generated using the backbone of the crystal structure of Pla of Yersinia pestis, retrieved from PDB, with MODELLER (9v8. Quality of the model was assessed by validation tool PROCHECK, web servers like ERRAT and ProSA are used to certify the reliability of the predicted model. This information might offer clues for better understanding of E protein and consequently for developmet of better therapeutic treatment against pathogenic role of this protein in salmonellosis and other diseases.

  20. CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway.

    Science.gov (United States)

    Zhou, Jiyun; Wang, Hongpeng; Zhao, Zhishan; Xu, Ruifeng; Lu, Qin

    2018-05-08

    Protein secondary structure is the three dimensional form of local segments of proteins and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. We present a novel deep learning based model, referred to as CNNH_PSS, by using multi-scale CNN with highway. In CNNH_PSS, any two neighbor convolutional layers have a highway to deliver information from current layer to the output of the next one to keep local contexts. As lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighbor layers allow CNNH_PSS to have ability to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN which need extra complex model to extract long-range interdependencies. It demonstrates that CNNH_PSS not only cost less computer resource, but also achieves better predicting performance. CNNH_PSS have ability to extracts both local contexts and long-range interdependencies by combing multi-scale CNN and highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is an useful and efficient tool for protein secondary structure prediction.

  1. Tchebichef image moment approach to the prediction of protein secondary structures based on circular dichroism.

    Science.gov (United States)

    Li, Sha Sha; Li, Bao Qiong; Liu, Jin Jin; Lu, Shao Hua; Zhai, Hong Lin

    2018-04-20

    Circular dichroism (CD) spectroscopy is a widely used technique for the evaluation of protein secondary structures that has a significant impact for the understanding of molecular biology. However, the quantitative analysis of protein secondary structures based on CD spectra is still a hard work due to the serious overlap of the spectra corresponding to different structural motifs. Here, Tchebichef image moment (TM) approach is introduced for the first time, which can effectively extract the chemical features in CD spectra for the quantitative analysis of protein secondary structures. The proposed approach was applied to analyze reference set. and the obtained results were evaluated by the strict statistical parameters such as correlation coefficient, cross-validation correlation coefficient and root mean squared error. Compared with several specialized prediction methods, TM approach provided satisfactory results, especially for turns and unordered structures. Our study indicates that TM approach can be regarded as a feasible tool for the analysis of the secondary structures of proteins based on CD spectra. An available TMs package is provided and can be used directly for secondary structures prediction. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  2. Computational tools for experimental determination and theoretical prediction of protein structure

    Energy Technology Data Exchange (ETDEWEB)

    O`Donoghue, S.; Rost, B.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. The authors intend to review the state of the art in the experimental determination of protein 3D structure (focus on nuclear magnetic resonance), and in the theoretical prediction of protein function and of protein structure in 1D, 2D and 3D from sequence. All the atomic resolution structures determined so far have been derived from either X-ray crystallography (the majority so far) or Nuclear Magnetic Resonance (NMR) Spectroscopy (becoming increasingly more important). The authors briefly describe the physical methods behind both of these techniques; the major computational methods involved will be covered in some detail. They highlight parallels and differences between the methods, and also the current limitations. Special emphasis will be given to techniques which have application to ab initio structure prediction. Large scale sequencing techniques increase the gap between the number of known proteins sequences and that of known protein structures. They describe the scope and principles of methods that contribute successfully to closing that gap. Emphasis will be given on the specification of adequate testing procedures to validate such methods.

  3. Immobilized metal-affinity chromatography protein-recovery screening is predictive of crystallographic structure success

    International Nuclear Information System (INIS)

    Choi, Ryan; Kelley, Angela; Leibly, David; Nakazawa Hewitt, Stephen; Napuli, Alberto; Van Voorhis, Wesley

    2011-01-01

    An overview of the methods used for high-throughput cloning and protein-expression screening of SSGCID hexahistidine recombinant proteins is provided. It is demonstrated that screening for recombinant proteins that are highly recoverable from immobilized metal-affinity chromatography improves the likelihood that a protein will produce a structure. The recombinant expression of soluble proteins in Escherichia coli continues to be a major bottleneck in structural genomics. The establishment of reliable protocols for the performance of small-scale expression and solubility testing is an essential component of structural genomic pipelines. The SSGCID Protein Production Group at the University of Washington (UW-PPG) has developed a high-throughput screening (HTS) protocol for the measurement of protein recovery from immobilized metal-affinity chromatography (IMAC) which predicts successful purification of hexahistidine-tagged proteins. The protocol is based on manual transfer of samples using multichannel pipettors and 96-well plates and does not depend on the use of robotic platforms. This protocol has been applied to evaluate the expression and solubility of more than 4000 proteins expressed in E. coli. The UW-PPG also screens large-scale preparations for recovery from IMAC prior to purification. Analysis of these results show that our low-cost non-automated approach is a reliable method for the HTS demands typical of large structural genomic projects. This paper provides a detailed description of these protocols and statistical analysis of the SSGCID screening results. The results demonstrate that screening for proteins that yield high recovery after IMAC, both after small-scale and large-scale expression, improves the selection of proteins that can be successfully purified and will yield a crystal structure

  4. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11.

    Science.gov (United States)

    Zhang, Wenxuan; Yang, Jianyi; He, Baoji; Walker, Sara Elizabeth; Zhang, Hongjiu; Govindarajoo, Brandon; Virtanen, Jouko; Xue, Zhidong; Shen, Hong-Bin; Zhang, Yang

    2016-09-01

    We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  5. Critical assessment of methods of protein structure prediction (CASP) - round x

    KAUST Repository

    Moult, John; Fidelis, Krzysztof; Kryshtafovych, Andriy; Schwede, Torsten; Tramontano, Anna

    2013-01-01

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.

  6. Critical assessment of methods of protein structure prediction (CASP) - round x

    KAUST Repository

    Moult, John

    2013-12-17

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.

  7. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction

    Science.gov (United States)

    Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.

    2013-01-01

    In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594

  8. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    Science.gov (United States)

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  10. Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction

    Directory of Open Access Journals (Sweden)

    Seong-Gon Kim

    2011-06-01

    Full Text Available Several techniques such as Neural Networks, Genetic Algorithms, Decision Trees and other statistical or heuristic methods have been used to approach the complex non-linear task of predicting Alpha-helicies, Beta-sheets and Turns of a proteins secondary structure in the past. This project introduces a new machine learning method by using an offline trained Multilayered Perceptrons (MLP as the likelihood models within a Bayesian Inference framework to predict secondary structures proteins. Varying window sizes are used to extract neighboring amino acid information and passed back and forth between the Neural Net models and the Bayesian Inference process until there is a convergence of the posterior secondary structure probability.

  11. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    Science.gov (United States)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron- oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and

  12. Large-scale prediction of drug–target interactions using protein sequences and drug topological structures

    International Nuclear Information System (INIS)

    Cao Dongsheng; Liu Shao; Xu Qingsong; Lu Hongmei; Huang Jianhua; Hu Qiannan; Liang Yizeng

    2012-01-01

    Highlights: ► Drug–target interactions are predicted using an extended SAR methodology. ► A drug–target interaction is regarded as an event triggered by many factors. ► Molecular fingerprint and CTD descriptors are used to represent drugs and proteins. ► Our approach shows compatibility between the new scheme and current SAR methodology. - Abstract: The identification of interactions between drugs and target proteins plays a key role in the process of genomic drug discovery. It is both consuming and costly to determine drug–target interactions by experiments alone. Therefore, there is an urgent need to develop new in silico prediction approaches capable of identifying these potential drug–target interactions in a timely manner. In this article, we aim at extending current structure–activity relationship (SAR) methodology to fulfill such requirements. In some sense, a drug–target interaction can be regarded as an event or property triggered by many influence factors from drugs and target proteins. Thus, each interaction pair can be represented theoretically by using these factors which are based on the structural and physicochemical properties simultaneously from drugs and proteins. To realize this, drug molecules are encoded with MACCS substructure fingerings representing existence of certain functional groups or fragments; and proteins are encoded with some biochemical and physicochemical properties. Four classes of drug–target interaction networks in humans involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, are independently used for establishing predictive models with support vector machines (SVMs). The SVM models gave prediction accuracy of 90.31%, 88.91%, 84.68% and 83.74% for four datasets, respectively. In conclusion, the results demonstrate the ability of our proposed method to predict the drug–target interactions, and show a general compatibility between the new scheme and current SAR

  13. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

    Directory of Open Access Journals (Sweden)

    Taigang Liu

    2015-12-01

    Full Text Available The prior knowledge of protein structural class may offer useful clues on understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique which is based on gapped-dipeptides composition computed directly from PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE. These optimal features are selected to construct a final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely based on PSSM and could serve as a very promising tool to predict protein structural class.

  14. RaptorX-Property: a web server for protein structure property prediction.

    Science.gov (United States)

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    Science.gov (United States)

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. A systematic review on popularity, application and characteristics of protein secondary structure prediction tools.

    Science.gov (United States)

    Kashani-Amin, Elaheh; Tabatabaei-Malazy, Ozra; Sakhteman, Amirhossein; Larijani, Bagher; Ebrahim-Habibi, Azadeh

    2018-02-27

    Prediction of proteins' secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool between multiple secondary structure prediction (SSP) options is challenging. The current study is an insight onto currently favored methods and tools, within various contexts. A systematic review was performed for a comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. Three databases, Web of Science, PubMed and Scopus were systematically searched and 99 out of 209 studies were finally found eligible to extract data. Four categories of applications for 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating a SSP tool as a component for a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing PHD (Profile network from HeiDelberg) method occupied second and third places of popularity in categories I and II. JPred was only found in the two first categories, while PHD was present in three fields. This study provides a comprehensive insight about the recent usage of SSP tools which could be helpful for selecting a proper tool's choice. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. Structure prediction and binding sites analysis of curcin protein of Jatropha curcas using computational approaches.

    Science.gov (United States)

    Srivastava, Mugdha; Gupta, Shishir K; Abhilash, P C; Singh, Nandita

    2012-07-01

    Ribosome inactivating proteins (RIPs) are defense proteins in a number of higher-plant species that are directly targeted toward herbivores. Jatropha curcas is one of the biodiesel plants having RIPs. The Jatropha seed meal, after extraction of oil, is rich in curcin, a highly toxic RIP similar to ricin, which makes it unsuitable for animal feed. Although the toxicity of curcin is well documented in the literature, the detailed toxic properties and the 3D structure of curcin has not been determined by X-ray crystallography, NMR spectroscopy or any in silico techniques to date. In this pursuit, the structure of curcin was modeled by a composite approach of 3D structure prediction using threading and ab initio modeling. Assessment of model quality was assessed by methods which include Ramachandran plot analysis and Qmean score estimation. Further, we applied the protein-ligand docking approach to identify the r-RNA binding residue of curcin. The present work provides the first structural insight into the binding mode of r-RNA adenine to the curcin protein and forms the basis for designing future inhibitors of curcin. Cloning of a future peptide inhibitor within J. curcas can produce non-toxic varieties of J. curcas, which would make the seed-cake suitable as animal feed without curcin detoxification.

  18. Large-scale prediction of drug-target interactions using protein sequences and drug topological structures

    Energy Technology Data Exchange (ETDEWEB)

    Cao Dongsheng [Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha 410083 (China); Liu Shao [Xiangya Hospital, Central South University, Changsha 410008 (China); Xu Qingsong [School of Mathematical Sciences and Computing Technology, Central South University, Changsha 410083 (China); Lu Hongmei; Huang Jianhua [Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha 410083 (China); Hu Qiannan [Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan 430071 (China); Liang Yizeng, E-mail: yizeng_liang@263.net [Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha 410083 (China)

    2012-11-08

    Highlights: Black-Right-Pointing-Pointer Drug-target interactions are predicted using an extended SAR methodology. Black-Right-Pointing-Pointer A drug-target interaction is regarded as an event triggered by many factors. Black-Right-Pointing-Pointer Molecular fingerprint and CTD descriptors are used to represent drugs and proteins. Black-Right-Pointing-Pointer Our approach shows compatibility between the new scheme and current SAR methodology. - Abstract: The identification of interactions between drugs and target proteins plays a key role in the process of genomic drug discovery. It is both consuming and costly to determine drug-target interactions by experiments alone. Therefore, there is an urgent need to develop new in silico prediction approaches capable of identifying these potential drug-target interactions in a timely manner. In this article, we aim at extending current structure-activity relationship (SAR) methodology to fulfill such requirements. In some sense, a drug-target interaction can be regarded as an event or property triggered by many influence factors from drugs and target proteins. Thus, each interaction pair can be represented theoretically by using these factors which are based on the structural and physicochemical properties simultaneously from drugs and proteins. To realize this, drug molecules are encoded with MACCS substructure fingerings representing existence of certain functional groups or fragments; and proteins are encoded with some biochemical and physicochemical properties. Four classes of drug-target interaction networks in humans involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, are independently used for establishing predictive models with support vector machines (SVMs). The SVM models gave prediction accuracy of 90.31%, 88.91%, 84.68% and 83.74% for four datasets, respectively. In conclusion, the results demonstrate the ability of our proposed method to predict the drug

  19. SAAS: Short Amino Acid Sequence - A Promising Protein Secondary Structure Prediction Method of Single Sequence

    Directory of Open Access Journals (Sweden)

    Zhou Yuan Wu

    2013-07-01

    Full Text Available In statistical methods of predicting protein secondary structure, many researchers focus on single amino acid frequencies in α-helices, β-sheets, and so on, or the impact near amino acids on an amino acid forming a secondary structure. But the paper considers a short sequence of amino acids (3, 4, 5 or 6 amino acids as integer, and statistics short sequence's probability forming secondary structure. Also, many researchers select low homologous sequences as statistical database. But this paper select whole PDB database. In this paper we propose a strategy to predict protein secondary structure using simple statistical method. Numerical computation shows that, short amino acids sequence as integer to statistics, which can easy see trend of short sequence forming secondary structure, and it will work well to select large statistical database (whole PDB database without considering homologous, and Q3 accuracy is ca. 74% using this paper proposed simple statistical method, but accuracy of others statistical methods is less than 70%.

  20. Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks.

    Science.gov (United States)

    Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali

    2010-12-01

    The supervised learning of recurrent neural networks well-suited for prediction of protein secondary structures from the underlying amino acids sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. Besides, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  1. From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction.

    Science.gov (United States)

    Akhter, Nasrin; Shehu, Amarda

    2018-01-19

    Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.

  2. From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction

    Directory of Open Access Journals (Sweden)

    Nasrin Akhter

    2018-01-01

    Full Text Available Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.

  3. Computational Prediction of Atomic Structures of Helical Membrane Proteins Aided by EM Maps

    Science.gov (United States)

    Kovacs, Julio A.; Yeager, Mark; Abagyan, Ruben

    2007-01-01

    Integral membrane proteins pose a major challenge for protein-structure prediction because only ≈100 high-resolution structures are available currently, thereby impeding the development of rules or empirical potentials to predict the packing of transmembrane α-helices. However, when an intermediate-resolution electron microscopy (EM) map is available, it can be used to provide restraints which, in combination with a suitable computational protocol, make structure prediction feasible. In this work we present such a protocol, which proceeds in three stages: 1), generation of an ensemble of α-helices by flexible fitting into each of the density rods in the low-resolution EM map, spanning a range of rotational angles around the main helical axes and translational shifts along the density rods; 2), fast optimization of side chains and scoring of the resulting conformations; and 3), refinement of the lowest-scoring conformations with internal coordinate mechanics, by optimizing the van der Waals, electrostatics, hydrogen bonding, torsional, and solvation energy contributions. In addition, our method implements a penalty term through a so-called tethering map, derived from the EM map, which restrains the positions of the α-helices. The protocol was validated on three test cases: GpA, KcsA, and MscL. PMID:17496035

  4. Mixing Energy Models in Genetic Algorithms for On-Lattice Protein Structure Prediction

    Directory of Open Access Journals (Sweden)

    Mahmood A. Rashid

    2013-01-01

    Full Text Available Protein structure prediction (PSP is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20×20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic polar. However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significant lower energy structures compared to the state-of-the-art results.

  5. Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

    Science.gov (United States)

    Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

    2018-04-01

    Computing conformations which are essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploring algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is converted into ultrafast shape recognition (USR) feature space firstly. Based on the USR feature space, the conformational space can be further converted into Underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of underestimation model, the tight lower bound estimate information can be used for exploration guidance, the invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploring problem of protein conformational space. LUE is applied to differential evolution (DE) algorithm, and metropolis Monte Carlo(MMC) algorithm which is available in the Rosetta; When LUE is applied to DE and MMC, it will be screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. LiveBench-1: continuous benchmarking of protein structure prediction servers.

    Science.gov (United States)

    Bujnicki, J M; Elofsson, A; Fischer, D; Rychlewski, L

    2001-02-01

    We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.

  7. Potentials of mean force for protein structure prediction vindicated, formalized and generalized.

    Directory of Open Access Journals (Sweden)

    Thomas Hamelryck

    Full Text Available Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances--so-called "potentials of mean force" (PMFs--have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state--a necessary component of these potentials--is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities "reference ratio distributions" deriving from the application of the "reference ratio method." This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.

  8. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    Science.gov (United States)

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.

  9. Secondary Structure Prediction of Protein using Resilient Back Propagation Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Jyotshna Dongardive

    2015-12-01

    Full Text Available The paper proposes a neural network based approach to predict secondary structure of protein. It uses Multilayer Feed Forward Network (MLFN with resilient back propagation as the learning algorithm. Point Accepted Mutation (PAM is adopted as the encoding scheme and CB396 data set is used for the training and testing of the network. Overall accuracy of the network has been experimentally calculated with different window sizes for the sliding window scheme and by varying the number of units in the hidden layer. The best results were obtained with eleven as the window size and seven as the number of units in the hidden layer.

  10. Molecular modelling of the Norrie disease protein predicts a cystine knot growth factor tertiary structure.

    Science.gov (United States)

    Meitinger, T; Meindl, A; Bork, P; Rost, B; Sander, C; Haasemann, M; Murken, J

    1993-12-01

    The X-lined gene for Norrie disease, which is characterized by blindness, deafness and mental retardation has been cloned recently. This gene has been thought to code for a putative extracellular factor; its predicted amino acid sequence is homologous to the C-terminal domain of diverse extracellular proteins. Sequence pattern searches and three-dimensional modelling now suggest that the Norrie disease protein (NDP) has a tertiary structure similar to that of transforming growth factor beta (TGF beta). Our model identifies NDP as a member of an emerging family of growth factors containing a cystine knot motif, with direct implications for the physiological role of NDP. The model also sheds light on sequence related domains such as the C-terminal domain of mucins and of von Willebrand factor.

  11. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Full Text Available Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins as well as noncovalently (protein-carbohydrate complexes is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by tools de novo which can predict carbohydrate-binding sites for proteins in the absence of structure of experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB, (ii Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  12. Atomic-accuracy prediction of protein loop structures through an RNA-inspired Ansatz.

    Directory of Open Access Journals (Sweden)

    Rhiju Das

    Full Text Available Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. These results establish all-atom enumeration as

  13. Computational investigation of kinetics of cross-linking reactions in proteins: importance in structure prediction.

    Science.gov (United States)

    Bandyopadhyay, Pradipta; Kuntz, Irwin D

    2009-01-01

    The determination of protein structure using distance constraints is a new and promising field of study. One implementation involves attaching residues of a protein using a cross-linking agent, followed by protease digestion, analysis of the resulting peptides by mass spectroscopy, and finally sequence threading to detect the protein folds. In the present work, we carry out computational modeling of the kinetics of cross-linking reactions in proteins using the master equation approach. The rate constants of the cross-linking reactions are estimated using the pKas and the solvent-accessible surface areas of the residues involved. This model is tested with fibroblast growth factor (FGF) and cytochrome C. It is consistent with the initial experimental rate data for individual lysine residues for cytochrome C. Our model captures all observed cross-links for FGF and almost 90% of the observed cross-links for cytochrome C, although it also predicts cross-links that were not observed experimentally (false positives). However, the analysis of the false positive results is complicated by the fact that experimental detection of cross-links can be difficult and may depend on specific experimental conditions such as pH, ionic strength. Receiver operator characteristic plots showed that our model does a good job in predicting the observed cross-links. Molecular dynamics simulations showed that for cytochrome C, in general, the two lysines come closer for the observed cross-links as compared to the false positive ones. For FGF, no such clear pattern exists. The kinetic model and MD simulation can be used to study proposed cross-linking protocols.

  14. Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

    Science.gov (United States)

    Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong

    2009-07-01

    A prior knowledge of protein structural classes can provide useful information about its overall structure, so it is very important for quick and accurate determination of protein structural class with computation method in protein science. One of the key for computation method is accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract new feature vector based on wavelet power spectrum (WPS), which contains more abundant information of sequence order in frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was further formed to represent primary sequence by coupling AAC vector with a set of new feature vector of WPS in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.

  15. Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins

    Directory of Open Access Journals (Sweden)

    Walsh Ian

    2006-09-01

    Full Text Available Abstract Background We describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Angstrom; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cα traces for short proteins (up to 200 amino acids. Results The servers are state-of-the-art, with secondary structure predicted correctly for nearly 80% of residues (currently the top performance on EVA, 2-class solvent accessibility nearly 80% correct, and contact maps exceeding 50% precision on the top non-diagonal contacts. A preliminary implementation of the predictor of protein Cα traces featured among the top 20 Novel Fold predictors at the last CASP6 experiment as group Distill (ID 0348. The majority of the servers, including the Cα trace predictor, now take into account homology information from the PDB, when available, resulting in greatly improved reliability. Conclusion All predictions are freely available through a simple joint web interface and the results are returned by email. In a single submission the user can send protein sequences for a total of up to 32k residues to all or a selection of the servers. Distill is accessible at the address: http://distill.ucd.ie/distill/.

  16. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

    Science.gov (United States)

    Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric

    2010-07-20

    Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

  17. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM. Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS, segmented PsePSSM, and segmented autocovariance transformation (ACT based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640 are adopted in this paper. Then a 700-dimensional (700D feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA. To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  18. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Full Text Available Abstract Background A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism, and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have examined the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequences techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  19. UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

    Science.gov (United States)

    Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam

    2018-04-30

    A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, coined a name UNRES server, is presented. In contrast to most of the protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes protein sequence and, optionally, restraints from secondary-structure prediction or small x-ray scattering data, and simulation type and parameters which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.

  20. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (DeltaDeltaG) and denaturant (DeltaDeltaG(H2O)) denaturations, as well as mutant thermal stability (DeltaT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and it's ordered six nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  1. An ensemble method for predicting subnuclear localizations from primary protein structures.

    Directory of Open Access Journals (Sweden)

    Guo Sheng Han

    Full Text Available BACKGROUND: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods. METHODOLOGY/PRINCIPAL FINDINGS: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. CONCLUSIONS: It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method

  2. Holistic Approach to Partial Covalent Interactions in Protein Structure Prediction and Design with Rosetta.

    Science.gov (United States)

    Combs, Steven A; Mueller, Benjamin K; Meiler, Jens

    2018-05-29

    Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identify and evaluate these PCIs through analyzing the geometry between participating atoms. However, we hypothesize that PCIs can be unified through a simplified electron orbital representation. To test this hypothesis, we have introduced orbital based chemical descriptors for PCIs into Rosetta, called the PCI score function. Optimal geometries for the PCIs are derived from a statistical analysis of high-quality protein structures obtained from the Protein Data Bank (PDB), and the relative orientation of electron deficient hydrogen atoms and electron-rich lone pair or π orbitals are evaluated. We demonstrate that nativelike geometries of hydrogen bonds, salt bridges, cation-π, and π-π interactions are recapitulated during minimization of protein conformation. The packing density of tested protein structures increased from the standard score function from 0.62 to 0.64, closer to the native value of 0.70. Overall, rotamer recovery improved when using the PCI score function (75%) as compared to the standard Rosetta score function (74%). The PCI score function represents an improvement over the standard Rosetta score function for protein model scoring; in addition, it provides a platform for future directions in the analysis of small molecule to protein interactions, which depend on partial covalent interactions.

  3. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.

    Directory of Open Access Journals (Sweden)

    Niklas Berliner

    Full Text Available Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both, domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases.

  4. CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures.

    Directory of Open Access Journals (Sweden)

    Oliver C Redfern

    2007-11-01

    Full Text Available We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure-based method (using graph theory to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these

  5. BCL::MP-Fold: membrane protein structure prediction guided by EPR restraints

    Science.gov (United States)

    Fischer, Axel W.; Alexander, Nathan S.; Woetzel, Nils; Karakaş, Mert; Weiner, Brian E.; Meiler, Jens

    2016-01-01

    For many membrane proteins, the determination of their topology remains a challenge for methods like X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Electron paramagnetic resonance (EPR) spectroscopy has evolved as an alternative technique to study structure and dynamics of membrane proteins. The present study demonstrates the feasibility of membrane protein topology determination using limited EPR distance and accessibility measurements. The BCL::MP-Fold algorithm assembles secondary structure elements (SSEs) in the membrane using a Monte Carlo Metropolis (MCM) approach. Sampled models are evaluated using knowledge-based potential functions and agreement with the EPR data and a knowledge-based energy function. Twenty-nine membrane proteins of up to 696 residues are used to test the algorithm. The protein-size-normalized root-mean-square-deviation (RMSD100) value of the most accurate model is better than 8 Å for twenty-seven, better than 6 Å for twenty-two, and better than 4 Å for fifteen out of twenty-nine proteins, demonstrating the algorithm’s ability to sample the native topology. The average enrichment could be improved from 1.3 to 2.5, showing the improved discrimination power by using EPR data. PMID:25820805

  6. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures

    DEFF Research Database (Denmark)

    Andersen, P.H.; Nielsen, Morten; Lund, Ole

    2006-01-01

    . We show that the new structure-based method has a better performance for predicting residues of discontinuous epitopes than methods based solely on sequence information, and that it can successfully predict epitope residues that have been identified by different techniques. DiscoTope detects 15...... experimental epitope mapping in both rational vaccine design and development of diagnostic tools, and may lead to more efficient epitope identification....

  7. Multi-template homology based structure prediction and molecular docking studies of protein ‘L’ of Zaire ebolavirus (EBOV

    Directory of Open Access Journals (Sweden)

    Jayasree Ganugapati

    2017-01-01

    Full Text Available Ebola is one of the most dangerous pathogenic RNA virus that causes severe hemorrhagic fever in humans and is considered to be a threat to humanity. The RNA genome of EBOV encodes seven proteins viz., glycoprotein (GP, nucleoprotein (NP, RNA-dependent RNA polymerase (protein ‘L’, VP35, VP30, VP40 andVP24. The objective of the present study is to find a suitable inhibitor for protein ‘L’. The large structural protein ‘L’, is made up of 2212 amino acid residues. This protein works as an RNA-dependent RNA polymerase (RdRp and a methyl transferase. It is carried by the virus during the infection as the host mechanisms cannot be used to transcribe the –ss RNA genome of the virus. As the protein is crucial for the replication of the viral genome and no other host enzyme can perform the same function, this viral protein ‘L’ was considered as a potential drug target to design inhibitors. The 3D structure of protein ‘L’ is not available to date. This is a limitation in understanding the protein's function. Hence, the present work is aimed at predicting the first homology-based model of protein ‘L’ and elucidating the function by providing insight into the molecular details of the protein. As there is no drug available for the treatment of EBOV infection our findings play a crucial a role to identify an inhibitor of the protein ‘L’ of EBOV. HTS against ZINC database resulted in identification of few possible inhibitors. Molecular docking studies resulted in finding a suitable inhibitor for protein ‘L’.

  8. Predicting protein complexes using a supervised learning method combined with local structural information.

    Science.gov (United States)

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  9. The Expanded FindCore Method for Identification of a Core Atom Set for Assessment of Protein Structure Prediction

    Science.gov (United States)

    Snyder, David A.; Grullon, Jennifer; Huang, Yuanpeng J.; Tejero, Roberto; Montelione, Gaetano T.

    2014-01-01

    Maximizing the scientific impact of NMR-based structure determination requires robust and statistically sound methods for assessing the precision of NMR-derived structures. In particular, a method to define a core atom set for calculating superimpositions and validating structure predictions is critical to the use of NMR-derived structures as targets in the CASP competition. FindCore (D.A. Snyder and G.T. Montelione PROTEINS 2005;59:673–686) is a superimposition independent method for identifying a core atom set, and partitioning that set into domains. However, as FindCore optimizes superimposition by sensitively excluding not-well-defined atoms, the FindCore core may not comprise all atoms suitable for use in certain applications of NMR structures, including the CASP assessment process. Adapting the FindCore approach to assess predicted models against experimental NMR structures in CASP10 required modification of the FindCore method. This paper describes conventions and a standard protocol to calculate an “Expanded FindCore” atom set suitable for validation and application in biological and biophysical contexts. A key application of the Expanded FindCore method is to identify a core set of atoms in the experimental NMR structure for which it makes sense to validate predicted protein structure models. We demonstrate the application of this Expanded FindCore method in characterizing well-defined regions of 18 NMR-derived CASP10 target structures. The Expanded FindCore protocol defines “expanded core atom sets” that match an expert’s intuition of which parts of the structure are sufficiently well-defined to use in assessing CASP model predictions. We also illustrate the impact of this analysis on the CASP GDT assessment scores. PMID:24327305

  10. Improving predictions of protein-protein interfaces by combining amino acid-specific classifiers based on structural and physicochemical descriptors with their weighted neighbor averages.

    Directory of Open Access Journals (Sweden)

    Fábio R de Moraes

    Full Text Available Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR from free surface residues (FSR. We formulated a linear discriminative analysis (LDA classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/ are suitable for such a task. Receiver operating characteristic (ROC analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. The IFR classifier developed in this study

  11. Improving predictions of protein-protein interfaces by combining amino acid-specific classifiers based on structural and physicochemical descriptors with their weighted neighbor averages.

    Science.gov (United States)

    de Moraes, Fábio R; Neshich, Izabella A P; Mazoni, Ivan; Yano, Inácio H; Pereira, José G C; Salim, José A; Jardine, José G; Neshich, Goran

    2014-01-01

    Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR) from free surface residues (FSR). We formulated a linear discriminative analysis (LDA) classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/) are suitable for such a task. Receiver operating characteristic (ROC) analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication) or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. The IFR classifier developed in this study is now

  12. Improving Predictions of Protein-Protein Interfaces by Combining Amino Acid-Specific Classifiers Based on Structural and Physicochemical Descriptors with Their Weighted Neighbor Averages

    Science.gov (United States)

    de Moraes, Fábio R.; Neshich, Izabella A. P.; Mazoni, Ivan; Yano, Inácio H.; Pereira, José G. C.; Salim, José A.; Jardine, José G.; Neshich, Goran

    2014-01-01

    Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR) from free surface residues (FSR). We formulated a linear discriminative analysis (LDA) classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/) are suitable for such a task. Receiver operating characteristic (ROC) analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication) or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. The IFR classifier developed in this study is now

  13. Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms.

    Directory of Open Access Journals (Sweden)

    Tammy M K Cheng

    Full Text Available Recent analyses of human genome sequences have given rise to impressive advances in identifying non-synonymous single nucleotide polymorphisms (nsSNPs. By contrast, the annotation of nsSNPs and their links to diseases are progressing at a much slower pace. Many of the current approaches to analysing disease-associated nsSNPs use primarily sequence and evolutionary information, while structural information is relatively less exploited. In order to explore the potential of such information, we developed a structure-based approach, Bongo (Bonds ON Graph, to predict structural effects of nsSNPs. Bongo considers protein structures as residue-residue interaction networks and applies graph theoretical measures to identify the residues that are critical for maintaining structural stability by assessing the consequences on the interaction network of single point mutations. Our results show that Bongo is able to identify mutations that cause both local and global structural effects, with a remarkably low false positive rate. Application of the Bongo method to the prediction of 506 disease-associated nsSNPs resulted in a performance (positive predictive value, PPV, 78.5% similar to that of PolyPhen (PPV, 77.2% and PANTHER (PPV, 72.2%. As the Bongo method is solely structure-based, our results indicate that the structural changes resulting from nsSNPs are closely associated to their pathological consequences.

  14. Protein Sorting Prediction

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2017-01-01

    and drawbacks of each of these approaches is described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.......Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths...

  15. Mutagenesis Objective Search and Selection Tool (MOSST: an algorithm to predict structure-function related mutations in proteins

    Directory of Open Access Journals (Sweden)

    Asenjo Juan A

    2011-04-01

    Full Text Available Abstract Background Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important to correctly identify functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs or to design a site-directed mutagenesis strategy for a target protein. A new and powerful methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of a protein family where the target protein belongs, with no need of explicit structure-function relationships. Results A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids in each position of the protein

  16. Structural insights into Cydia pomonella pheromone binding protein 2 mediated prediction of potentially active semiochemicals

    Science.gov (United States)

    Tian, Zhen; Liu, Jiyuan; Zhang, Yalin

    2016-03-01

    Given the advantages of behavioral disruption application in pest control and the damage of Cydia pomonella, due progresses have not been made in searching active semiochemicals for codling moth. In this research, 31 candidate semiochemicals were ranked for their binding potential to Cydia pomonella pheromone binding protein 2 (CpomPBP2) by simulated docking, and this sorted result was confirmed by competitive binding assay. This high predicting accuracy of virtual screening led to the construction of a rapid and viable method for semiochemicals searching. By reference to binding mode analyses, hydrogen bond and hydrophobic interaction were suggested to be two key factors in determining ligand affinity, so is the length of molecule chain. So it is concluded that semiochemicals of appropriate chain length with hydroxyl group or carbonyl group at one head tended to be favored by CpomPBP2. Residues involved in binding with each ligand were pointed out as well, which were verified by computational alanine scanning mutagenesis. Progress made in the present study helps establish an efficient method for predicting potentially active compounds and prepares for the application of high-throughput virtual screening in searching semiochemicals by taking insights into binding mode analyses.

  17. Prediction of protein loop geometries in solution

    NARCIS (Netherlands)

    Rapp, Chaya S.; Strauss, Temima; Nederveen, Aart; Fuentes, Gloria

    2007-01-01

    The ability to determine the structure of a protein in solution is a critical tool for structural biology, as proteins in their native state are found in aqueous environments. Using a physical chemistry based prediction protocol, we demonstrate the ability to reproduce protein loop geometries in

  18. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10.

    Science.gov (United States)

    Zhang, Yang

    2014-02-01

    We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems. Copyright © 2013 Wiley Periodicals, Inc.

  19. Prediction of Protein-Protein Interactions Related to Protein Complexes Based on Protein Interaction Networks

    Directory of Open Access Journals (Sweden)

    Peng Liu

    2015-01-01

    Full Text Available A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts based on the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method is adaptive to protein complexes with different structure, number, and size of nodes in a protein-protein interaction network. Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is proved by using estimations with statistical tests and direct confirmation of the biological data. In comparison with the approaches which predict the interactions based on the cliques, the overlap of the predictions is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data. Meanwhile, the predictions from the proposed method replenish protein-protein interactions associated with protein complexes using only the network topology.

  20. LoopIng: a template-based tool for predicting the structure of protein loops.

    KAUST Repository

    Messih, Mario Abdel; Lepore, Rosalba; Tramontano, Anna

    2015-01-01

    ) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has

  1. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    Science.gov (United States)

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼ 25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  2. LECTINPred: web Server that Uses Complex Networks of Protein Structure for Prediction of Lectins with Potential Use as Cancer Biomarkers or in Parasite Vaccine Design.

    Science.gov (United States)

    Munteanu, Cristian R; Pedreira, Nieves; Dorado, Julián; Pazos, Alejandro; Pérez-Montoto, Lázaro G; Ubeira, Florencio M; González-Díaz, Humberto

    2014-04-01

    Lectins (Ls) play an important role in many diseases such as different types of cancer, parasitic infections and other diseases. Interestingly, the Protein Data Bank (PDB) contains +3000 protein 3D structures with unknown function. Thus, we can in principle, discover new Ls mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We have performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model, which is able to discriminate 3D structure of Ls from other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity=96.7 % (for Ls), Specificity=87.6 % (non-active proteins), and Accuracy=92.5 % (for all proteins), considering altogether both the training and external prediction series. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, performing a data mining of PDB. We predicted Ls scores for +2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Potentials of mean force for protein structure prediction vindicated, formalized and generalized

    DEFF Research Database (Denmark)

    Hamelryck, Thomas; Borg, Mikael; Paluszewski, Martin

    2010-01-01

    and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state--a necessary component of these potentials--is an unsolved problem. PMFs are loosely justified by analogy to the reversible...

  4. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    Science.gov (United States)

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  5. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    Science.gov (United States)

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  6. Combining NMR ensembles and molecular dynamics simulations provides more realistic models of protein structures in solution and leads to better chemical shift prediction

    International Nuclear Information System (INIS)

    Lehtivarjo, Juuso; Tuppurainen, Kari; Hassinen, Tommi; Laatikainen, Reino; Peräkylä, Mikael

    2012-01-01

    While chemical shifts are invaluable for obtaining structural information from proteins, they also offer one of the rare ways to obtain information about protein dynamics. A necessary tool in transforming chemical shifts into structural and dynamic information is chemical shift prediction. In our previous work we developed a method for 4D prediction of protein 1 H chemical shifts in which molecular motions, the 4th dimension, were modeled using molecular dynamics (MD) simulations. Although the approach clearly improved the prediction, the X-ray structures and single NMR conformers used in the model cannot be considered fully realistic models of protein in solution. In this work, NMR ensembles (NMRE) were used to expand the conformational space of proteins (e.g. side chains, flexible loops, termini), followed by MD simulations for each conformer to map the local fluctuations. Compared with the non-dynamic model, the NMRE+MD model gave 6–17% lower root-mean-square (RMS) errors for different backbone nuclei. The improved prediction indicates that NMR ensembles with MD simulations can be used to obtain a more realistic picture of protein structures in solutions and moreover underlines the importance of short and long time-scale dynamics for the prediction. The RMS errors of the NMRE+MD model were 0.24, 0.43, 0.98, 1.03, 1.16 and 2.39 ppm for 1 Hα, 1 HN, 13 Cα, 13 Cβ, 13 CO and backbone 15 N chemical shifts, respectively. The model is implemented in the prediction program 4DSPOT, available at http://www.uef.fi/4dspothttp://www.uef.fi/4dspot.

  7. Combining NMR ensembles and molecular dynamics simulations provides more realistic models of protein structures in solution and leads to better chemical shift prediction

    Energy Technology Data Exchange (ETDEWEB)

    Lehtivarjo, Juuso, E-mail: juuso.lehtivarjo@uef.fi; Tuppurainen, Kari; Hassinen, Tommi; Laatikainen, Reino [University of Eastern Finland, School of Pharmacy (Finland); Peraekylae, Mikael [University of Eastern Finland, Institute of Biomedicine (Finland)

    2012-03-15

    While chemical shifts are invaluable for obtaining structural information from proteins, they also offer one of the rare ways to obtain information about protein dynamics. A necessary tool in transforming chemical shifts into structural and dynamic information is chemical shift prediction. In our previous work we developed a method for 4D prediction of protein {sup 1}H chemical shifts in which molecular motions, the 4th dimension, were modeled using molecular dynamics (MD) simulations. Although the approach clearly improved the prediction, the X-ray structures and single NMR conformers used in the model cannot be considered fully realistic models of protein in solution. In this work, NMR ensembles (NMRE) were used to expand the conformational space of proteins (e.g. side chains, flexible loops, termini), followed by MD simulations for each conformer to map the local fluctuations. Compared with the non-dynamic model, the NMRE+MD model gave 6-17% lower root-mean-square (RMS) errors for different backbone nuclei. The improved prediction indicates that NMR ensembles with MD simulations can be used to obtain a more realistic picture of protein structures in solutions and moreover underlines the importance of short and long time-scale dynamics for the prediction. The RMS errors of the NMRE+MD model were 0.24, 0.43, 0.98, 1.03, 1.16 and 2.39 ppm for {sup 1}H{alpha}, {sup 1}HN, {sup 13}C{alpha}, {sup 13}C{beta}, {sup 13}CO and backbone {sup 15}N chemical shifts, respectively. The model is implemented in the prediction program 4DSPOT, available at http://www.uef.fi/4dspothttp://www.uef.fi/4dspot.

  8. Structural prediction in aphasia

    Directory of Open Access Journals (Sweden)

    Tessa Warren

    2015-05-01

    Full Text Available There is considerable evidence that young healthy comprehenders predict the structure of upcoming material, and that their processing is facilitated when they encounter material matching those predictions (e.g., Staub & Clifton, 2006; Yoshida, Dickey & Sturt, 2013. However, less is known about structural prediction in aphasia. There is evidence that lexical prediction may be spared in aphasia (Dickey et al., 2014; Love & Webb, 1977; cf. Mack et al, 2013. However, predictive mechanisms supporting facilitated lexical access may not necessarily support structural facilitation. Given that many people with aphasia (PWA exhibit syntactic deficits (e.g. Goodglass, 1993, PWA with such impairments may not engage in structural prediction. However, recent evidence suggests that some PWA may indeed predict upcoming structure (Hanne, Burchert, De Bleser, & Vashishth, 2015. Hanne et al. tracked the eyes of PWA (n=8 with sentence-comprehension deficits while they listened to reversible subject-verb-object (SVO and object-verb-subject (OVS sentences in German, in a sentence-picture matching task. Hanne et al. manipulated case and number marking to disambiguate the sentences’ structure. Gazes to an OVS or SVO picture during the unfolding of a sentence were assumed to indicate prediction of the structure congruent with that picture. According to this measure, the PWA’s structural prediction was impaired compared to controls, but they did successfully predict upcoming structure when morphosyntactic cues were strong and unambiguous. Hanne et al.’s visual-world evidence is suggestive, but their forced-choice sentence-picture matching task places tight constraints on possible structural predictions. Clearer evidence of structural prediction would come from paradigms where the content of upcoming material is not as constrained. The current study used self-paced reading study to examine structural prediction among PWA in less constrained contexts. PWA (n=17 who

  9. Nucleotide sequence of Phaseolus vulgaris L. alcohol dehydrogenase encoding cDNA and three-dimensional structure prediction of the deduced protein.

    Science.gov (United States)

    Amelia, Kassim; Khor, Chin Yin; Shah, Farida Habib; Bhore, Subhash J

    2015-01-01

    Common beans (Phaseolus vulgaris L.) are widely consumed as a source of proteins and natural products. However, its yield needs to be increased. In line with the agenda of Phaseomics (an international consortium), work of expressed sequence tags (ESTs) generation from bean pods was initiated. Altogether, 5972 ESTs have been isolated. Alcohol dehydrogenase (AD) encoding gene cDNA was a noticeable transcript among the generated ESTs. This AD is an important enzyme; therefore, to understand more about it this study was undertaken. The objective of this study was to elucidate P. vulgaris L. AD (PvAD) gene cDNA sequence and to predict the three-dimensional (3D) structure of deduced protein. positive and negative strands of the PvAD cDNA clone were sequenced using M13 forward and M13 reverse primers to elucidate the nucleotide sequence. Deduced PvAD cDNA and protein sequence was analyzed for their basic features using online bioinformatics tools. Sequence comparison was carried out using bl2seq program, and tree-view program was used to construct a phylogenetic tree. The secondary structures and 3D structure of PvAD protein were predicted by using the PHYRE automatic fold recognition server. The sequencing results analysis showed that PvAD cDNA is 1294 bp in length. It's open reading frame encodes for a protein that contains 371 amino acids. Deduced protein sequence analysis showed the presence of putative substrate binding, catalytic Zn binding, and NAD binding sites. Results indicate that the predicted 3D structure of PvAD protein is analogous to the experimentally determined crystal structure of s-nitrosoglutathione reductase from an Arabidopsis species. The 1294 bp long PvAD cDNA encodes for 371 amino acid long protein that contains conserved domains required for biological functions of AD. The predicted deduced PvAD protein's 3D structure reflects the analogy with the crystal structure of Arabidopsis thaliana s-nitrosoglutathione reductase. Further study is required

  10. Linguistic Structure Prediction

    CERN Document Server

    Smith, Noah A

    2011-01-01

    A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. Approaches to decoding (i.e., carrying out linguistic structure prediction) and supervised and unsupervised learning of models that predict discrete structures as outputs are the focus. W

  11. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy

    KAUST Repository

    Ogorzalek, Tadeusz L.

    2018-01-04

    Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As HT, solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. This article is protected by copyright. All rights reserved.

  12. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy

    KAUST Repository

    Ogorzalek, Tadeusz L.; Hura, Greg L.; Belsom, Adam; Burnett, Kathryn H.; Kryshtafovych, Andriy; Tainer, John A.; Rappsilber, Juri; Tsutakawa, Susan E.; Fidelis, Krzysztof

    2018-01-01

    Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As HT, solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. This article is protected by copyright. All rights reserved.

  13. Protein complex prediction in large ontology attributed protein-protein interaction networks.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo

    2013-01-01

    Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.

  14. Prediction of Protein Configurational Entropy (Popcoen).

    Science.gov (United States)

    Goethe, Martin; Gleixner, Jan; Fita, Ignacio; Rubi, J Miguel

    2018-03-13

    A knowledge-based method for configurational entropy prediction of proteins is presented; this methodology is extremely fast, compared to previous approaches, because it does not involve any type of configurational sampling. Instead, the configurational entropy of a query fold is estimated by evaluating an artificial neural network, which was trained on molecular-dynamics simulations of ∼1000 proteins. The predicted entropy can be incorporated into a large class of protein software based on cost-function minimization/evaluation, in which configurational entropy is currently neglected for performance reasons. Software of this type is used for all major protein tasks such as structure predictions, proteins design, NMR and X-ray refinement, docking, and mutation effect predictions. Integrating the predicted entropy can yield a significant accuracy increase as we show exemplarily for native-state identification with the prominent protein software FoldX. The method has been termed Popcoen for Prediction of Protein Configurational Entropy. An implementation is freely available at http://fmc.ub.edu/popcoen/ .

  15. Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and knowledge-based information.

    Science.gov (United States)

    Karczyńska, Agnieszka S; Mozolewska, Magdalena A; Krupa, Paweł; Giełdoń, Artur; Liwo, Adam; Czaplewski, Cezary

    2018-03-01

    A new approach to assisted protein-structure prediction has been proposed, which is based on running multiplexed replica exchange molecular dynamics simulations with the coarse-grained UNRES force field with restraints derived from knowledge-based models and distance distribution from small angle X-ray scattering (SAXS) measurements. The latter restraints are incorporated into the target function as a maximum-likelihood term that guides the shape of the simulated structures towards that defined by SAXS. The approach was first verified with the 1KOY protein, for which the distance distribution was calculated from the experimental structure, and subsequently used to predict the structures of 11 data-assisted targets in the CASP12 experiment. Major improvement of the GDT_TS was obtained for 2 targets, minor improvement for other 2 while, for 6 target GDT_TS deteriorated compared with that calculated for predictions without the SAXS data, partly because of assuming a wrong multimeric state (for Ts866) or because the crystal conformation was more compact than the solution conformation (for Ts942). Particularly good results were obtained for Ts909, in which use of SAXS data resulted in the selection of a correctly packed trimer and, subsequently, increased the GDT_TS of monomer prediction. It was found that running simulations with correct oligomeric state is essential for the success in SAXS-data-assisted prediction. © 2017 Wiley Periodicals, Inc.

  16. Structures composing protein domains.

    Science.gov (United States)

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel; Hudeček, Jiří

    2013-08-01

    This review summarizes available data concerning intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, broadly occurring secondary structures or folds, etc. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, enzyme reactions and some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure) mostly requiring certain structural context of other levels (e.g. secondary structures or supersecondary folds) as follows also from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility), and more special dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes concern also some experimental methods, still more necessary tools of bioinformatic processing and clinically interesting relationships. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  17. Oligomeric protein structure networks: insights into protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Brinda KV

    2005-12-01

    Full Text Available Abstract Background Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes and the edges are based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues with special emphasis to protein interfaces. Results A variety of interactions such as hydrogen bond, salt bridges, aromatic and hydrophobic interactions, which occur at the interfaces are identified in a consolidated manner as amino acid clusters at the interface, from this study. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb. A few predictions of interface hot

  18. Avian reovirus L2 genome segment sequences and predicted structure/function of the encoded RNA-dependent RNA polymerase protein

    Directory of Open Access Journals (Sweden)

    Xu Wanhong

    2008-12-01

    Full Text Available Abstract Background The orthoreoviruses are infectious agents that possess a genome comprised of 10 double-stranded RNA segments encased in two concentric protein capsids. Like virtually all RNA viruses, an RNA-dependent RNA polymerase (RdRp enzyme is required for viral propagation. RdRp sequences have been determined for the prototype mammalian orthoreoviruses and for several other closely-related reoviruses, including aquareoviruses, but have not yet been reported for any avian orthoreoviruses. Results We determined the L2 genome segment nucleotide sequences, which encode the RdRp proteins, of two different avian reoviruses, strains ARV138 and ARV176 in order to define conserved and variable regions within reovirus RdRp proteins and to better delineate structure/function of this important enzyme. The ARV138 L2 genome segment was 3829 base pairs long, whereas the ARV176 L2 segment was 3830 nucleotides long. Both segments were predicted to encode λB RdRp proteins 1259 amino acids in length. Alignments of these newly-determined ARV genome segments, and their corresponding proteins, were performed with all currently available homologous mammalian reovirus (MRV and aquareovirus (AqRV genome segment and protein sequences. There was ~55% amino acid identity between ARV λB and MRV λ3 proteins, making the RdRp protein the most highly conserved of currently known orthoreovirus proteins, and there was ~28% identity between ARV λB and homologous MRV and AqRV RdRp proteins. Predictive structure/function mapping of identical and conserved residues within the known MRV λ3 atomic structure indicated most identical amino acids and conservative substitutions were located near and within predicted catalytic domains and lining RdRp channels, whereas non-identical amino acids were generally located on the molecule's surfaces. Conclusion The ARV λB and MRV λ3 proteins showed the highest ARV:MRV identity values (~55% amongst all currently known ARV and MRV

  19. 3D structure prediction of histone acetyltransferase (HAC proteins of the p300/CBP family and their interactome in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Amar Cemanovic

    2014-09-01

    Full Text Available Histone acetylation is an important posttranslational modification correlated with gene activation. In Arabidopsis thaliana the histone acetyltransferase (HAC proteins of the CBP family are homologous to animal p300/CREB (cAMP-responsive element-binding proteins, which are important histone acetyltransferases participating in many physiological processes, including proliferation, differentiation, and apoptosis. In this study the 3-D structure of all HAC protein subunits in Arabidopsis thaliana: HAC1, HAC2, HAC4, HAC5 and HAC12 is predicted by homology modeling and confirmed by Ramachandran plot analysis. The amino acid sequences HAC family members are highly similar to the sequences of the homologous human p300/CREB protein. Conservation of p300/CBP domains among the HAC proteins was examined further by sequence alignment and pattern search. The domains of p300/CBP required for the HAC function, such as PHD, TAZ and ZZ domains, are conserved in all HAC proteins. Interactome analysis revealed that HAC1, HAC5 and HAC12 proteins interact with S-adenosylmethionine-dependent methyltransferase domaincontaining protein that shows methyltransferase activity, suggesting an additional function of the HAC proteins. Additionally, HAC5 has a strong interaction value for the putative c-myb-like transcription factor MYB3R-4, which suggests that it also may have a function in regulation of DNA replication.

  20. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space.

    Science.gov (United States)

    Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng

    2016-01-01

    To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more

  1. Different protein-protein interface patterns predicted by different machine learning methods.

    Science.gov (United States)

    Wang, Wei; Yang, Yongxiao; Yin, Jianxin; Gong, Xinqi

    2017-11-22

    Different types of protein-protein interactions make different protein-protein interface patterns. Different machine learning methods are suitable to deal with different types of data. Then, is it the same situation that different interface patterns are preferred for prediction by different machine learning methods? Here, four different machine learning methods were employed to predict protein-protein interface residue pairs on different interface patterns. The performances of the methods for different types of proteins are different, which suggest that different machine learning methods tend to predict different protein-protein interface patterns. We made use of ANOVA and variable selection to prove our result. Our proposed methods taking advantages of different single methods also got a good prediction result compared to single methods. In addition to the prediction of protein-protein interactions, this idea can be extended to other research areas such as protein structure prediction and design.

  2. Virtual screening of natural inhibitors to the predicted HBx protein structure of Hepatitis B Virus using molecular docking for identification of potential lead molecules for liver cancer

    Science.gov (United States)

    Pathak, Rajesh Kumar; Baunthiyal, Mamta; Taj, Gohar; Kumar, Anil

    2014-01-01

    The HBx protein in Hepatitis B Virus (HBV) is a potential target for anti-liver cancer molecules. Therefore, it is of interest to screen known natural compounds against the HBx protein using molecular docking. However, the structure of HBx is not yet known. Therefore, the predicted structure of HBx using threading in LOMET was used for docking against plant derived natural compounds (curcumin, oleanolic acid, resveratrol, bilobetin, luteoline, ellagic acid, betulinic acid and rutin) by Molegro Virtual Docker. The screening identified rutin with binding energy of -161.65 Kcal/mol. Thus, twenty derivatives of rutin were further designed and screened against HBx. These in silico experiments identified compounds rutin01 (-163.16 Kcal/mol) and rutin08 (- 165.76 Kcal/mol) for further consideration and downstream validation. PMID:25187683

  3. Predictive and comparative analysis of Ebolavirus proteins

    Science.gov (United States)

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395

  4. Predictive and comparative analysis of Ebolavirus proteins.

    Science.gov (United States)

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus.

  5. Three-dimensional (3D) structure prediction and function analysis of the chitin-binding domain 3 protein HD73_3189 from Bacillus thuringiensis HD73.

    Science.gov (United States)

    Zhan, Yiling; Guo, Shuyuan

    2015-01-01

    Bacillus thuringiensis (Bt) is capable of producing a chitin-binding protein believed to be functionally important to bacteria during the stationary phase of its growth cycle. In this paper, the chitin-binding domain 3 protein HD73_3189 from B. thuringiensis has been analyzed by computer technology. Primary and secondary structural analyses demonstrated that HD73_3189 is negatively charged and contains several α-helices, aperiodical coils and β-strands. Domain and motif analyses revealed that HD73_3189 contains a signal peptide, an N-terminal chitin binding 3 domains, two copies of a fibronectin-like domain 3 and a C-terminal carbohydrate binding domain classified as CBM_5_12. Moreover, analysis predicted the protein's associated localization site to be the cell wall. Ligand site prediction determined that amino acid residues GLU-312, TRP-334, ILE-341 and VAL-382 exposed on the surface of the target protein exhibit polar interactions with the substrate.

  6. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  7. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  8. Structure and non-structure of centrosomal proteins.

    Science.gov (United States)

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain in average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: Only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  9. Compare local pocket and global protein structure models by small structure patterns

    KAUST Repository

    Cui, Xuefeng; Kuwahara, Hiroyuki; Li, Shuai Cheng; Gao, Xin

    2015-01-01

    Researchers proposed several criteria to assess the quality of predicted protein structures because it is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria

  10. PSAIA – Protein Structure and Interaction Analyzer

    Directory of Open Access Journals (Sweden)

    Vlahoviček Kristian

    2008-04-01

    Full Text Available Abstract Background PSAIA (Protein Structure and Interaction Analyzer was developed to compute geometric parameters for large sets of protein structures in order to predict and investigate protein-protein interaction sites. Results In addition to most relevant established algorithms, PSAIA offers a new method PIADA (Protein Interaction Atom Distance Algorithm for the determination of residue interaction pairs. We found that PIADA produced more satisfactory results than comparable algorithms implemented in PSAIA. Particular advantages of PSAIA include its capacity to combine different methods to detect the locations and types of interactions between residues and its ability, without any further automation steps, to handle large numbers of protein structures and complexes. Generally, the integration of a variety of methods enables PSAIA to offer easier automation of analysis and greater reliability of results. PSAIA can be used either via a graphical user interface or from the command-line. Results are generated in either tabular or XML format. Conclusion In a straightforward fashion and for large sets of protein structures, PSAIA enables the calculation of protein geometric parameters and the determination of location and type for protein-protein interaction sites. XML formatted output enables easy conversion of results to various formats suitable for statistic analysis. Results from smaller data sets demonstrated the influence of geometry on protein interaction sites. Comprehensive analysis of properties of large data sets lead to new information useful in the prediction of protein-protein interaction sites.

  11. CNNcon: improved protein contact maps prediction using cascaded neural networks.

    Directory of Open Access Journals (Sweden)

    Wang Ding

    Full Text Available BACKGROUNDS: Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determination of three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than that of determined structures. Protein modeling methods can possibly bridge this huge sequence-structure gap with the development of computational science. A grand challenging problem is to predict three-dimensional protein structure from its primary structure (residues sequence alone. However, predicting residue contact maps is a crucial and promising intermediate step towards final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can transform protein sequence alignment to structure alignment, which can finally improve template based three-dimensional protein structure predictors greatly. METHODS: CNNcon, an improved multiple neural networks based contact map predictor using six sub-networks and one final cascade-network, was developed in this paper. Both the sub-networks and the final cascade-network were trained and tested with their corresponding data sets. While for testing, the target protein was first coded and then input to its corresponding sub-networks for prediction. After that, the intermediate results were input to the cascade-network to finish the final prediction. RESULTS: The CNNcon can accurately predict 58.86% in average of contacts at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the present method performs better than the compared state-of-the-art predictors. Particularly, the prediction accuracy keeps steady with the increase of protein sequence length. It indicates that the CNNcon overcomes the thin density problem, with which other current predictors have trouble. This advantage makes the method valuable to the prediction of long length proteins. As a result, the effective

  12. Protein Structure Refinement by Optimization

    DEFF Research Database (Denmark)

    Carlsen, Martin

    on whether the three-dimensional structure of a homologous sequence is known. Whether or not a protein model can be used for industrial purposes depends on the quality of the predicted structure. A model can be used to design a drug when the quality is high. The overall goal of this project is to assess...... that correlates maximally to a native-decoy distance. The main contribution of this thesis is methods developed for analyzing the performance of metrically trained knowledge-based potentials and for optimizing their performance while making them less dependent on the decoy set used to define them. We focus...... being at-least a local minimum of the potential. To address how far the current functional form of the potential is from an ideal potential we present two methods for finding the optimal metrically trained potential that simultaneous has a number of native structures as a local minimum. Our results...

  13. Applications of contact predictions to structural biology

    Directory of Open Access Journals (Sweden)

    Felix Simkovic

    2017-05-01

    Full Text Available Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental

  14. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions.

    Science.gov (United States)

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-07-07

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.

  15. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    Directory of Open Access Journals (Sweden)

    Xin Deng

    2015-07-01

    Full Text Available Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.

  16. The interface of protein structure, protein biophysics, and molecular evolution

    Science.gov (United States)

    Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon

    2012-01-01

    Abstract The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state-of-the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID:22528593

  17. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

    Science.gov (United States)

    Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J

    2018-03-14

    It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.

  18. Text mining improves prediction of protein functional sites.

    Directory of Open Access Journals (Sweden)

    Karin M Verspoor

    Full Text Available We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites. The structure analysis was carried out using Dynamics Perturbation Analysis (DPA, which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

  19. Text Mining Improves Prediction of Protein Functional Sites

    Science.gov (United States)

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  20. Computational prediction of protein hot spot residues.

    Science.gov (United States)

    Morrow, John Kenneth; Zhang, Shuxing

    2012-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.

  1. Structure Prediction and Analysis of Neuraminidase Sequence Variants

    Science.gov (United States)

    Thayer, Kelly M.

    2016-01-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…

  2. Assessment of data-assisted prediction by inclusion of crosslinking/mass-spectrometry and small angle X-ray scattering data in the 12th Critical Assessment of protein Structure Prediction experiment.

    Science.gov (United States)

    Tamò, Giorgio E; Abriata, Luciano A; Fonti, Giulia; Dal Peraro, Matteo

    2018-03-01

    Integrative modeling approaches attempt to combine experiments and computation to derive structure-function relationships in complex molecular assemblies. Despite their importance for the advancement of life sciences, benchmarking of existing methodologies is rather poor. The 12 th round of the Critical Assessment of protein Structure Prediction (CASP) offered a unique niche to benchmark data and methods from two kinds of experiments often used in integrative modeling, namely residue-residue contacts obtained through crosslinking/mass-spectrometry (CLMS), and small-angle X-ray scattering (SAXS) experiments. Upon assessment of the models submitted by predictors for 3 targets assisted by CLMS data and 11 targets by SAXS data, we observed no significant improvement when compared to the best data-blind models, although most predictors did improve relative to their own data-blind predictions. Only for target Tx892 of the CLMS-assisted category and for target Ts947 of the SAXS-assisted category, there was a net, albeit mild, improvement relative to the best data-blind predictions. We discuss here possible reasons for the relatively poor success, which point rather to inconsistencies in the data sources rather than in the methods, to which a few groups were less sensitive. We conclude with suggestions that could improve the potential of data integration in future CASP rounds in terms of experimental data production, methods development, data management and prediction assessment. © 2017 Wiley Periodicals, Inc.

  3. Refining intra-protein contact prediction by graph analysis

    Directory of Open Access Journals (Sweden)

    Eyal Eran

    2007-05-01

    Full Text Available Abstract Background Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.

  4. Protein Molecular Structures, Protein SubFractions, and Protein Availability Affected by Heat Processing: A Review

    International Nuclear Information System (INIS)

    Yu, P.

    2007-01-01

    The utilization and availability of protein depended on the types of protein and their specific susceptibility to enzymatic hydrolysis (inhibitory activities) in the gastrointestine and was highly associated with protein molecular structures. Studying internal protein structure and protein subfraction profiles leaded to an understanding of the components that make up a whole protein. An understanding of the molecular structure of the whole protein was often vital to understanding its digestive behavior and nutritive value in animals. In this review, recently obtained information on protein molecular structural effects of heat processing was reviewed, in relation to protein characteristics affecting digestive behavior and nutrient utilization and availability. The emphasis of this review was on (1) using the newly advanced synchrotron technology (S-FTIR) as a novel approach to reveal protein molecular chemistry affected by heat processing within intact plant tissues; (2) revealing the effects of heat processing on the profile changes of protein subfractions associated with digestive behaviors and kinetics manipulated by heat processing; (3) prediction of the changes of protein availability and supply after heat processing, using the advanced DVE/OEB and NRC-2001 models, and (4) obtaining information on optimal processing conditions of protein as intestinal protein source to achieve target values for potential high net absorbable protein in the small intestine. The information described in this article may give better insight in the mechanisms involved and the intrinsic protein molecular structural changes occurring upon processing.

  5. Blind testing cross-linking/mass spectrometry under the auspices of the 11th critical assessment of methods of protein structure prediction (CASP11 [version 1; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Adam Belsom

    2016-12-01

    Full Text Available Determining the structure of a protein by any method requires various contributions from experimental and computational sides. In a recent study, high-density cross-linking/mass spectrometry (HD-CLMS data in combination with ab initio structure prediction determined the structure of human serum albumin (HSA domains, with an RMSD to X-ray structure of up to 2.5 Å, or 3.4 Å in the context of blood serum. This paper reports the blind test on the readiness of this technology through the help of Critical Assessment of protein Structure Prediction (CASP. We identified between 201-381 unique residue pairs at an estimated 5% FDR (at link level albeit with missing site assignment precision evaluation, for four target proteins. HD-CLMS proved reliable once crystal structures were released. However, improvements in structure prediction using cross-link data were slight. We identified two reasons for this. Spread of cross-links along the protein sequence and the tightness of the spatial constraints must be improved. However, for the selected targets even ideal contact data derived from crystal structures did not allow modellers to arrive at the observed structure. Consequently, the progress of HD-CLMS in conjunction with computational modeling methods as a structure determination method, depends on advances on both arms of this hybrid approach.

  6. Modeling protein structures: construction and their applications.

    Science.gov (United States)

    Ring, C S; Cohen, F E

    1993-06-01

    Although no general solution to the protein folding problem exists, the three-dimensional structures of proteins are being successfully predicted when experimentally derived constraints are used in conjunction with heuristic methods. In the case of interleukin-4, mutagenesis data and CD spectroscopy were instrumental in the accurate assignment of secondary structure. In addition, the tertiary structure was highly constrained by six cysteines separated by many residues that formed three disulfide bridges. Although the correct structure was a member of a short list of plausible structures, the "best" structure was the topological enantiomer of the experimentally determined conformation. For many proteases, other experimentally derived structures can be used as templates to identify the secondary structure elements. In a procedure called modeling by homology, the structure of a known protein is used as a scaffold to predict the structure of another related protein. This method has been used to model a serine and a cysteine protease that are important in the schistosome and malarial life cycles, respectively. The model structures were then used to identify putative small molecule enzyme inhibitors computationally. Experiments confirm that some of these nonpeptidic compounds are active at concentrations of less than 10 microM.

  7. Understanding Protein-Protein Interactions Using Local Structural Features

    DEFF Research Database (Denmark)

    Planas-Iglesias, Joan; Bonet, Jaume; García-García, Javier

    2013-01-01

    Protein-protein interactions (PPIs) play a relevant role among the different functions of a cell. Identifying the PPI network of a given organism (interactome) is useful to shed light on the key molecular mechanisms within a biological system. In this work, we show the role of structural features...... interacting and non-interacting protein pairs to classify the structural features that sustain the binding (or non-binding) behavior. Our study indicates that not only the interacting region but also the rest of the protein surface are important for the interaction fate. The interpretation...... to score the likelihood of the interaction between two proteins and to develop a method for the prediction of PPIs. We have tested our method on several sets with unbalanced ratios of interactions and non-interactions to simulate real conditions, obtaining accuracies higher than 25% in the most unfavorable...

  8. Topology of membrane proteins-predictions, limitations and variations.

    Science.gov (United States)

    Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

    2017-10-26

    Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Abstract Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i NPS-HomPPI (Non partner-specific HomPPI, which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii PS-HomPPI (Partner-specific HomPPI, which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  10. Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction.

    Science.gov (United States)

    Yao, Yuangen; Gui, Rong; Liu, Quan; Yi, Ming; Deng, Haiyou

    2017-12-08

    As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect the performance of the potential have been relatively less investigated. Based on different distance cutoffs (from 5 to 22 Å) and residue intervals (from 0 to 15) as well as six different reference states, we constructed a series of distance-dependent atom-pair potentials and tested them on several groups of structural decoy sets collected from diverse sources. A comprehensive investigation has been performed to clarify the effects of distance cutoff and residue interval on the potential's performance. Our results provide a new perspective as well as a practical guidance for optimizing distance-dependent statistical potentials. The optimal distance cutoff and residue interval are highly related with the reference state that the potential is based on, the measurements of the potential's performance, and the decoy sets that the potential is applied to. The performance of distance-dependent statistical potential can be significantly improved when the best statistical parameters for the specific application environment are adopted.

  11. Prediction of molecular crystal structures

    International Nuclear Information System (INIS)

    Beyer, Theresa

    2001-01-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs)are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic, benzoic and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima, has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum, is that more hypothetical low lattice energy structures are found within a few kJ mol -1 of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blindtest for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, additional factors to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  12. Prediction of molecular crystal structures

    Energy Technology Data Exchange (ETDEWEB)

    Beyer, Theresa

    2001-07-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs)are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic, benzoic and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima, has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum, is that more hypothetical low lattice energy structures are found within a few kJ mol{sup -1} of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blindtest for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, additional factors to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  13. Information assessment on predicting protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Gerstein Mark

    2004-10-01

    Full Text Available Abstract Background Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but the high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluations of the information contributions from different evidences help to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insights of the relationships between protein-protein interactions and other genomic information. Results Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case, when one does not have any missing information about any of the features, our analysis shows that there is a larger information contribution from the functional-classification than from expression correlations or essentiality. We also show that in this case alternative models, such as logistic regression and random forest, may be more effective than Bayesian networks for predicting interactions. Conclusions In the restricted problem posed by the complete-information subset, we identified that the MIPS and Gene Ontology (GO functional similarity datasets as the dominating information contributors for predicting the protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little for improving predictions. We also found that the data discretizations used in the

  14. Fast loop modeling for protein structures

    Science.gov (United States)

    Zhang, Jiong; Nguyen, Son; Shang, Yi; Xu, Dong; Kosztin, Ioan

    2015-03-01

    X-ray crystallography is the main method for determining 3D protein structures. In many cases, however, flexible loop regions of proteins cannot be resolved by this approach. This leads to incomplete structures in the protein data bank, preventing further computational study and analysis of these proteins. For instance, all-atom molecular dynamics (MD) simulation studies of structure-function relationship require complete protein structures. To address this shortcoming, we have developed and implemented an efficient computational method for building missing protein loops. The method is database driven and uses deep learning and multi-dimensional scaling algorithms. We have implemented the method as a simple stand-alone program, which can also be used as a plugin in existing molecular modeling software, e.g., VMD. The quality and stability of the generated structures are assessed and tested via energy scoring functions and by equilibrium MD simulations. The proposed method can also be used in template-based protein structure prediction. Work supported by the National Institutes of Health [R01 GM100701]. Computer time was provided by the University of Missouri Bioinformatics Consortium.

  15. Protein structure database search and evolutionary classification.

    Science.gov (United States)

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  16. Predicting turns in proteins with a unified model.

    Directory of Open Access Journals (Sweden)

    Qi Song

    Full Text Available MOTIVATION: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. RESULTS: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i using newly exploited features of structural evolution information (secondary structure and shape string of protein based on structure homologies, (ii considering all types of turns in a unified model, and (iii practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.

  17. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Protein function prediction involved on radio-resistant bacteria

    International Nuclear Information System (INIS)

    Mezhoud, Karim; Mankai, Houda; Sghaier, Haitham; Barkallah, Insaf

    2009-01-01

    Previously, we identified 58 proteins under positive selection in ionizing-radiation-resistant bacteria (IRRB) but absent in all ionizing-radiation-sensitive bacteria (IRSB). These are good reasons to believe these 58 proteins with their interactions with other proteins (interactomes) are a part of the answer to the question as to how IRRB resist to radiation, because our knowledge of interactomes of positively selected orphan proteins in IRRB might allow us to define cellular pathways important to ionizing-radiation resistance. Using the Database of Interacting Proteins and the PSIbase, we have predicted interactions of orthologs of the 58 proteins under positive selection in IRRB but absent in all IRSB. We used integrate experimental data sets with molecular interaction networks and protein structure prediction from databases. Among these, 18 proteins with their interactomes were identified in Deinococcus radiodurans R1. DNA checkpoint and repair, kinases pathways, energetic and nucleotide metabolisms were the important biological process that found. We predicted the interactomes of 58 proteins under positive selection in IRRB. It is hoped our data will provide new clues as to the cellular pathways that are important for ionizing-radiation resistance. We have identified news proteins involved on DNA management which were not previously mentioned. It is an important input in addition to protein that studied. It does still work to deepen our study on these new proteins

  19. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.

  20. Proteins with Novel Structure, Function and Dynamics

    Science.gov (United States)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments with the rate of 10(exp 6) above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine a salt bridge involving two other, charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the doors for designing proteins with novel functions, structures and dynamics that have not been yet considered.

  1. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Full Text Available Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation. SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.

  2. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural classes by using nonlinear prediction method. No deterministic structures are found in these protein sequences and this implies that they behave as random sequences. We also give an explanation to the controversial results obtained in previous investigations.

  3. From nonspecific DNA-protein encounter complexes to the prediction of DNA-protein interactions.

    Directory of Open Access Journals (Sweden)

    Mu Gao

    2009-03-01

    Full Text Available DNA-protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA-protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA-protein interaction modes without knowing its specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA-protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA-protein interaction modes exhibit some similarity to specific DNA-protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Calpha deviation from native is up to 5 A from their native structures. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA-protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein.

  4. Deep learning methods for protein torsion angle prediction.

    Science.gov (United States)

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures including deep neural network (DNN) and deep restricted Boltzmann machine (DRBN), deep recurrent neural network (DRNN) and deep recurrent restricted Boltzmann machine (DReRBM) since the protein torsion angle prediction is a sequence related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved the performance better than or comparable to a state-of-the art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.

  5. Automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

    Science.gov (United States)

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of applications. Since the usefulness of a model for specific application is determined by its accuracy, model quality estimation is an essential component of protein structure prediction. Comparative protein modeling has become a routine approach in many areas of life science research since fully automated modeling systems allow also nonexperts to build reliable models. In this chapter, we describe practical approaches for automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

  6. Prediction of Protein Thermostability by an Efficient Neural Network Approach

    Directory of Open Access Journals (Sweden)

    Jalal Rezaeenour

    2016-10-01

    Full Text Available Introduction: Manipulation of protein stability is important for understanding the principles that govern protein thermostability, both in basic research and industrial applications. Various data mining techniques exist for prediction of thermostable proteins. Furthermore, ANN methods have attracted significant attention for prediction of thermostability, because they constitute an appropriate approach to mapping the non-linear input-output relationships and massive parallel computing. Method: An Extreme Learning Machine (ELM was applied to estimate thermal behavior of 1289 proteins. In the proposed algorithm, the parameters of ELM were optimized using a Genetic Algorithm (GA, which tuned a set of input variables, hidden layer biases, and input weights, to and enhance the prediction performance. The method was executed on a set of amino acids, yielding a total of 613 protein features. A number of feature selection algorithms were used to build subsets of the features. A total of 1289 protein samples and 613 protein features were calculated from UniProt database to understand features contributing to the enzymes’ thermostability and find out the main features that influence this valuable characteristic. Results:At the primary structure level, Gln, Glu and polar were the features that mostly contributed to protein thermostability. At the secondary structure level, Helix_S, Coil, and charged_Coil were the most important features affecting protein thermostability. These results suggest that the thermostability of proteins is mainly associated with primary structural features of the protein. According to the results, the influence of primary structure on the thermostabilty of a protein was more important than that of the secondary structure. It is shown that prediction accuracy of ELM (mean square error can improve dramatically using GA with error rates RMSE=0.004 and MAPE=0.1003. Conclusion: The proposed approach for forecasting problem

  7. PRODIGY : a web server for predicting the binding affinity of protein-protein complexes

    NARCIS (Netherlands)

    Xue, Li; Garcia Lopes Maia Rodrigues, João; Kastritis, Panagiotis L; Bonvin, Alexandre Mjj; Vangone, Anna

    2016-01-01

    Gaining insights into the structural determinants of protein-protein interactions holds the key for a deeper understanding of biological functions, diseases and development of therapeutics. An important aspect of this is the ability to accurately predict the binding strength for a given

  8. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    Energy Technology Data Exchange (ETDEWEB)

    Cao, Haibo [Iowa State Univ., Ames, IA (United States)

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Some applications of this correlation were discussed in this dissertation include the domain partition and a new structural threading method as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups was discussed. This part includes discussions of interactions among amino acids residues, lattice HP model, and the design ability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of protein contact matrix. They believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and ominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method is discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in CASP5 blind test in comparison with other existing approaches.

  9. Protein structure recognition: From eigenvector analysis to structural threading method

    Science.gov (United States)

    Cao, Haibo

    In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Some applications of this correlation were discussed in this dissertation include the domain partition and a new structural threading method as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups was discussed. This part include discussions of interactions among amino acids residues, lattice HP model, and the designablity principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and ominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method is discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in CASP5 blind test in comparison with other existing approaches.

  10. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    International Nuclear Information System (INIS)

    Haibo Cao

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Some applications of this correlation were discussed in this dissertation include the domain partition and a new structural threading method as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups was discussed. This part includes discussions of interactions among amino acids residues, lattice HP model, and the design ability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of protein contact matrix. They believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and ominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method is discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in CASP5 blind test in comparison with other existing approaches

  11. Human cancer protein-protein interaction network: a structural perspective.

    Directory of Open Access Journals (Sweden)

    Gozde Kar

    2009-12-01

    Full Text Available Protein-protein interaction networks provide a global picture of cellular function and biological processes. Some proteins act as hub proteins, highly connected to others, whereas some others have few interactions. The dysfunction of some interactions causes many diseases, including cancer. Proteins interact through their interfaces. Therefore, studying the interface properties of cancer-related proteins will help explain their role in the interaction networks. Similar or overlapping binding sites should be used repeatedly in single interface hub proteins, making them promiscuous. Alternatively, multi-interface hub proteins make use of several distinct binding sites to bind to different partners. We propose a methodology to integrate protein interfaces into cancer interaction networks (ciSPIN, cancer structural protein interface network. The interactions in the human protein interaction network are replaced by interfaces, coming from either known or predicted complexes. We provide a detailed analysis of cancer related human protein-protein interfaces and the topological properties of the cancer network. The results reveal that cancer-related proteins have smaller, more planar, more charged and less hydrophobic binding sites than non-cancer proteins, which may indicate low affinity and high specificity of the cancer-related interactions. We also classified the genes in ciSPIN according to phenotypes. Within phenotypes, for breast cancer, colorectal cancer and leukemia, interface properties were found to be discriminating from non-cancer interfaces with an accuracy of 71%, 67%, 61%, respectively. In addition, cancer-related proteins tend to interact with their partners through distinct interfaces, corresponding mostly to multi-interface hubs, which comprise 56% of cancer-related proteins, and constituting the nodes with higher essentiality in the network (76%. We illustrate the interface related affinity properties of two cancer-related hub

  12. Annotating the protein-RNA interaction sites in proteins using evolutionary information and protein backbone structure.

    Science.gov (United States)

    Li, Tao; Li, Qian-Zhong

    2012-11-07

    RNA-protein interactions play important roles in various biological processes. The precise detection of RNA-protein interaction sites is very important for understanding essential biological processes and annotating the function of the proteins. In this study, based on various features from amino acid sequence and structure, including evolutionary information, solvent accessible surface area and torsion angles (φ, ψ) in the backbone structure of the polypeptide chain, a computational method for predicting RNA-binding sites in proteins is proposed. When the method is applied to predict RNA-binding sites in three datasets: RBP86 containing 86 protein chains, RBP107 containing 107 proteins chains and RBP109 containing 109 proteins chains, better sensitivities and specificities are obtained compared to previously published methods in five-fold cross-validation tests. In order to make further examination for the efficiency of our method, the RBP107 dataset is used as training set, RBP86 and RBP109 datasets are used as the independent test sets. In addition, as examples of our prediction, RNA-binding sites in a few proteins are presented. The annotated results are consistent with the PDB annotation. These results show that our method is useful for annotating RNA binding sites of novel proteins.

  13. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    Science.gov (United States)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  14. Protein interfacial structure and nanotoxicology

    International Nuclear Information System (INIS)

    White, John W.; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M.

    2009-01-01

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between β-casein and κ-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures if first, a β-casein monolayer is attacked by a κ-casein solution compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a β-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  15. Protein interfacial structure and nanotoxicology

    Energy Technology Data Exchange (ETDEWEB)

    White, John W. [Research School of Chemistry, Australian National University, Canberra (Australia)], E-mail: jww@rsc.anu.edu.au; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M. [Research School of Chemistry, Australian National University, Canberra (Australia)

    2009-02-21

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between {beta}-casein and {kappa}-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures if first, a {beta}-casein monolayer is attacked by a {kappa}-casein solution compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a {beta}-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  16. Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Zheng Sun

    2014-01-01

    Full Text Available WSSV is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism of how WSSV interacts with shrimp is still not very clear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1 and the constructed transcriptome data of F. chinensis were used to screen potentially interacting proteins by searching in protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were suggested to have interactions between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into “extracellular region” or “receptor complex” GO-terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” In the 6 pairs of interacting proteins, an envelope protein called “collagen-like protein” (WSSV-CLP encoded by an early virus gene “wsv001” in WSSV interacted with 6 deduced proteins from the shrimp, including three integrin alpha (ITGA, two integrin beta (ITGB, and one syndecan (SDC. Sequence analysis on WSSV-CLP, ITGA, ITGB, and SDC revealed that they possessed the sequence features for protein-protein interactions. This study might provide new insights into the interaction mechanisms between WSSV and shrimp.

  17. Prediction of Protein–Protein Interactions by Evidence Combining Methods

    Directory of Open Access Journals (Sweden)

    Ji-Wei Chang

    2016-11-01

    Full Text Available Most cellular functions involve proteins’ features based on their physical interactions with other partner proteins. Sketching a map of protein–protein interactions (PPIs is therefore an important inception step towards understanding the basics of cell functions. Several experimental techniques operating in vivo or in vitro have made significant contributions to screening a large number of protein interaction partners, especially high-throughput experimental methods. However, computational approaches for PPI predication supported by rapid accumulation of data generated from experimental techniques, 3D structure definitions, and genome sequencing have boosted the map sketching of PPIs. In this review, we shed light on in silico PPI prediction methods that integrate evidence from multiple sources, including evolutionary relationship, function annotation, sequence/structure features, network topology and text mining. These methods are developed for integration of multi-dimensional evidence, for designing the strategies to predict novel interactions, and for making the results consistent with the increase of prediction coverage and accuracy.

  18. Predicting protein-protein interactions from multimodal biological data sources via nonnegative matrix tri-factorization.

    Science.gov (United States)

    Wang, Hua; Huang, Heng; Ding, Chris; Nie, Feiping

    2013-04-01

    Protein interactions are central to all the biological processes and structural scaffolds in living organisms, because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Several high-throughput methods, for example, yeast two-hybrid system and mass spectrometry method, can help determine protein interactions, which, however, suffer from high false-positive rates. Moreover, many protein interactions predicted by one method are not supported by another. Therefore, computational methods are necessary and crucial to complete the interactome expeditiously. In this work, we formulate the problem of predicting protein interactions from a new mathematical perspective--sparse matrix completion, and propose a novel nonnegative matrix factorization (NMF)-based matrix completion approach to predict new protein interactions from existing protein interaction networks. Through using manifold regularization, we further develop our method to integrate different biological data sources, such as protein sequences, gene expressions, protein structure information, etc. Extensive experimental results on four species, Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, and Caenorhabditis elegans, have shown that our new methods outperform related state-of-the-art protein interaction prediction methods.

  19. Protein complex prediction via dense subgraphs and false positive analysis.

    Directory of Open Access Journals (Sweden)

    Cecilia Hernandez

    Full Text Available Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predict protein functions in living organisms. Large-scale and throughput techniques have made possible to compile protein-protein interaction networks (PPI networks, which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biologic experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning, overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast and Homo sapiens (human with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes searching the PDBe (Protein Data Bank in Europe database in order to identify matching protein complexes that have been purified and structurally characterized. Our analysis shows

  20. New tips for structure prediction by comparative modeling

    OpenAIRE

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to experimentally determined protein structure (template). The use of such technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions as commonly accepted depends on the score of sequence identity of the target protein to the template. To assess the relationship between sequence iden...

  1. Prediction of Hydrophobic Cores of Proteins Using Wavelet Analysis.

    Science.gov (United States)

    Hirakawa; Kuhara

    1997-01-01

    Information concerning the secondary structures, flexibility, epitope and hydrophobic regions of amino acid sequences can be extracted by assigning physicochemical indices to each amino acid residue, and information on structure can be derived using the sliding window averaging technique, which is in wide use for smoothing out raw functions. Wavelet analysis has shown great potential and applicability in many fields, such as astronomy, radar, earthquake prediction, and signal or image processing. This approach is efficient for removing noise from various functions. Here we employed wavelet analysis to smooth out a plot assigned to a hydrophobicity index for amino acid sequences. We then used the resulting function to predict hydrophobic cores in globular proteins. We calculated the prediction accuracy for the hydrophobic cores of 88 representative set of proteins. Use of wavelet analysis made feasible the prediction of hydrophobic cores at 6.13% greater accuracy than the sliding window averaging technique.

  2. Structural entanglements in protein complexes

    Science.gov (United States)

    Zhao, Yani; Chwastyk, Mateusz; Cieplak, Marek

    2017-06-01

    We consider multi-chain protein native structures and propose a criterion that determines whether two chains in the system are entangled or not. The criterion is based on the behavior observed by pulling at both termini of each chain simultaneously in the two chains. We have identified about 900 entangled systems in the Protein Data Bank and provided a more detailed analysis for several of them. We argue that entanglement enhances the thermodynamic stability of the system but it may have other functions: burying the hydrophobic residues at the interface and increasing the DNA or RNA binding area. We also study the folding and stretching properties of the knotted dimeric proteins MJ0366, YibK, and bacteriophytochrome. These proteins have been studied theoretically in their monomeric versions so far. The dimers are seen to separate on stretching through the tensile mechanism and the characteristic unraveling force depends on the pulling direction.

  3. Exploiting protein flexibility to predict the location of allosteric sites

    Directory of Open Access Journals (Sweden)

    Panjkovich Alejandro

    2012-10-01

    Full Text Available Abstract Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity, by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing

  4. Predicting Secretory Proteins with SignalP

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2017-01-01

    SignalP is the currently most widely used program for prediction of signal peptides from amino acid sequences. Proteins with signal peptides are targeted to the secretory pathway, but are not necessarily secreted. After a brief introduction to the biology of signal peptides and the history...

  5. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing where thousands of genomes have been already sequenced; size of protein databases is growing with exponential rate. Structural annotation of these proteins is one of the biggest challenges for the computational biologist. Although, it is easy to perform BLAST search against Protein Data Bank (PDB) but it is difficult for a biologist to annotate protein residues from BLAST search. A web-server StarPDB has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software for performing similarity search of a query protein against protein structures in PDB. This server integrates wide range modules for assigning different types of annotation that includes, Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. Secondary structure module allows users to predict regular secondary structure states to each residue in a protein. Accessible surface area predict the exposed or buried residues in a protein. Tight-turns module is designed to predict tight turns like beta-turns in a protein. DNA-RNA module developed for predicting DNA and RNA interacting residues in a protein. Similarly, Ligand module of server allows one to predicted ligands, metal and nucleotides ligand interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates number of visualization tools that facilitate users to understand structure and function of protein residues. This web server is available freely for scientific community from URL http://crdd.osdd.net/raghava/starpdb .

  6. Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

    Directory of Open Access Journals (Sweden)

    Ching-Tai Chen

    Full Text Available Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins and were tested on an independent dataset (consisting of 142 proteins. The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted

  7. Protein-Protein Interaction Site Predictions with Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces

    Science.gov (United States)

    Chen, Ching-Tai; Peng, Hung-Pin; Jian, Jhih-Wei; Tsai, Keng-Chang; Chang, Jeng-Yih; Yang, Ei-Wen; Chen, Jun-Bo; Ho, Shinn-Ying; Hsu, Wen-Lian; Yang, An-Suei

    2012-01-01

    Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with

  8. Computational prediction of protein-protein interactions in Leishmania predicted proteomes.

    Directory of Open Access Journals (Sweden)

    Antonio M Rezende

    Full Text Available The Trypanosomatids parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite of years of study and genome availability, effective vaccine has not been developed yet, and the chemotherapy is highly toxic. Therefore, it is clear just interdisciplinary integrated studies will have success in trying to search new targets for developing of vaccines and drugs. An essential part of this rationale is related to protein-protein interaction network (PPI study which can provide a better understanding of complex protein interactions in biological system. Thus, we modeled PPIs for Trypanosomatids through computational methods using sequence comparison against public database of protein or domain interaction for interaction prediction (Interolog Mapping and developed a dedicated combined system score to address the predictions robustness. The confidence evaluation of network prediction approach was addressed using gold standard positive and negative datasets and the AUC value obtained was 0.94. As result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum respectively. For each predicted network the top 20 proteins were ranked by MCC topological index. In addition, information related with immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity which is a key when one is interested in destabilizing the PPIs for drug or vaccine purposes along with multiple alignments of the predicted PPIs were performed revealing patterns associated with protein turnover. In addition, around 50% of hypothetical protein present in the networks

  9. Soliton concepts and protein structure

    Science.gov (United States)

    Krokhotin, Andrei; Niemi, Antti J.; Peng, Xubiao

    2012-03-01

    Structural classification shows that the number of different protein folds is surprisingly small. It also appears that proteins are built in a modular fashion from a relatively small number of components. Here we propose that the modular building blocks are made of the dark soliton solution of a generalized discrete nonlinear Schrödinger equation. We find that practically all protein loops can be obtained simply by scaling the size and by joining together a number of copies of the soliton, one after another. The soliton has only two loop-specific parameters, and we compute their statistical distribution in the Protein Data Bank (PDB). We explicitly construct a collection of 200 sets of parameters, each determining a soliton profile that describes a different short loop. The ensuing profiles cover practically all those proteins in PDB that have a resolution which is better than 2.0 Å, with a precision such that the average root-mean-square distance between the loop and its soliton is less than the experimental B-factor fluctuation distance. We also present two examples that describe how the loop library can be employed both to model and to analyze folded proteins.

  10. A comprehensive comparison of comparative RNA structure prediction approaches

    DEFF Research Database (Denmark)

    Gardner, P. P.; Giegerich, R.

    2004-01-01

    -finding and multiple-sequence-alignment algorithms. Results Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. Conclusions We conclude that comparative data can enhance structure prediction but structure-prediction-algorithms vary widely in terms......Background An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene...

  11. Recognition of functional sites in protein structures.

    Science.gov (United States)

    Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J

    2004-06-04

    Recognition of regions on the surface of one protein, that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.

  12. Prediction of antigenic epitopes on protein surfaces by consensus scoring

    Directory of Open Access Journals (Sweden)

    Zhang Chi

    2009-09-01

    Full Text Available Abstract Background Prediction of antigenic epitopes on protein surfaces is important for vaccine design. Most existing epitope prediction methods focus on protein sequences to predict continuous epitopes linear in sequence. Only a few structure-based epitope prediction algorithms are available and they have not yet shown satisfying performance. Results We present a new antigen Epitope Prediction method, which uses ConsEnsus Scoring (EPCES from six different scoring functions - residue epitope propensity, conservation score, side-chain energy score, contact number, surface planarity score, and secondary structure composition. Applied to unbounded antigen structures from an independent test set, EPCES was able to predict antigenic eptitopes with 47.8% sensitivity, 69.5% specificity and an AUC value of 0.632. The performance of the method is statistically similar to other published methods. The AUC value of EPCES is slightly higher compared to the best results of existing algorithms by about 0.034. Conclusion Our work shows consensus scoring of multiple features has a better performance than any single term. The successful prediction is also due to the new score of residue epitope propensity based on atomic solvent accessibility.

  13. Predicting the binding patterns of hub proteins: a study using yeast protein interaction networks.

    Directory of Open Access Journals (Sweden)

    Carson M Andorf

    Full Text Available Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Of particular interest are hub proteins that can interact with large numbers of partners and often play essential roles in cellular control. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish-interface hubs (SIH with one or two binding sites, or multiple-interface hubs (MIH with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., interact with different partners at different times or locations or party hubs (i.e., simultaneously interact with multiple partners.Our approach works in 3 phases: Phase I classifies if a protein is likely to bind with another protein. Phase II determines if a protein-binding (PB protein is a hub. Phase III classifies PB proteins as singlish-interface versus multiple-interface hubs and date versus party hubs. At each stage, we use sequence-based predictors trained using several standard machine learning techniques.Our method is able to predict whether a protein is a protein-binding protein with an accuracy of 94% and a correlation coefficient of 0.87; identify hubs from non-hubs with 100% accuracy for 30% of the data; distinguish date hubs/party hubs with 69% accuracy and area under ROC curve of 0.68; and SIH/MIH with 89% accuracy and area under ROC curve of 0.84. Because our method is based on sequence information alone, it can be used even in settings where reliable protein-protein interaction data or structures of protein-protein complexes are unavailable to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions.We provide a web server for our three-phase approach: http://hybsvm.gdcb.iastate.edu.

  14. Plasma proteins predict conversion to dementia from prodromal disease.

    Science.gov (United States)

    Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L; Ashton, Nicholas J; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon

    2014-11-01

    The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility.

    Science.gov (United States)

    Heffernan, Rhys; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-09-15

    The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10-20 amino acid residues to capture some 'short to intermediate' non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window. We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| >19) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone ϕ , ψ , θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted C α atom-based θ and τ have similar structures to their corresponding native structures (6Å RMSD or less), which is 3% better than models built by ϕ and ψ angles. We expect the method to be useful for assisting protein structure and function prediction. The method is available as a SPIDER3 server and standalone package at http://sparks-lab.org . yaoqi.zhou@griffith.edu.au or yuedong.yang@griffith.edu.au. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email

  16. Fast dynamics perturbation analysis for prediction of protein functional sites

    Directory of Open Access Journals (Sweden)

    Cohn Judith D

    2008-01-01

    Full Text Available Abstract Background We present a fast version of the dynamics perturbation analysis (DPA algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy Dx. Such regions are associated with functional sites. Results The Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that Dx in a normal-modes model is highly correlated with an entropic term that only depends on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in a N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm. Conclusion Compared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.

  17. Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction

    Directory of Open Access Journals (Sweden)

    Fofanov Viacheslav Y

    2010-05-01

    Full Text Available Abstract Background Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels. Results This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST method uses all-against-all substructure comparison to determine Substructural Clusters (SCs. SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs. Conclusions FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated

  18. A hidden markov model derived structural alphabet for proteins.

    Science.gov (United States)

    Camproux, A C; Gautier, R; Tufféry, P

    2004-06-04

    Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1A root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.

  19. Prediction of protein hydration sites from sequence by modular neural networks

    DEFF Research Database (Denmark)

    Ehrlich, L.; Reczko, M.; Bohr, Henrik

    1998-01-01

    The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites using protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with two...... separate neural networks. These predictions are used as input together with protein sequences for networks predicting hydration of residues, backbone atoms and sidechains. These networks are teined with protein crystal structures. The prediction of hydration is improved by adding information on secondary...... structure and solvent accessibility and, using actual values of these properties, redidue hydration can be predicted to 77% accuracy with a Metthews coefficient of 0.43. However, predicted property data with an accuracy of 60-70% result in less than half the improvement in predictive performance observed...

  20. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  1. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  2. SitesIdentify: a protein functional site prediction tool

    Directory of Open Access Journals (Sweden)

    Doig Andrew J

    2009-11-01

    Full Text Available Abstract Background The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional site, but many are not made available via a publicly-accessible application. Results Here we present a functional site prediction tool (SitesIdentify, based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/

  3. Sampling Realistic Protein Conformations Using Local Structural Bias

    DEFF Research Database (Denmark)

    Hamelryck, Thomas Wim; Kent, John T.; Krogh, A.

    2006-01-01

    The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which...... are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded...... in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications...

  4. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    Science.gov (United States)

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.

  5. Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts.

    Science.gov (United States)

    Adhikari, Badri; Cheng, Jianlin

    2017-08-29

    Residue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures. Using a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts - adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that by simply using low ranked predicted contacts as non-contacts and adding them as additional restraints, can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score. Our findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.

  6. Prediction of localization and interactions of apoptotic proteins

    Directory of Open Access Journals (Sweden)

    Matula Pavel

    2009-07-01

    Full Text Available Abstract During apoptosis several mitochondrial proteins are released. Some of them participate in caspase-independent nuclear DNA degradation, especially apoptosis-inducing factor (AIF and endonuclease G (endoG. Another interesting protein, which was expected to act similarly as AIF due to the high sequence homology with AIF is AIF-homologous mitochondrion-associated inducer of death (AMID. We studied the structure, cellular localization, and interactions of several proteins in silico and also in cells using fluorescent microscopy. We found the AMID protein to be cytoplasmic, most probably incorporated into the cytoplasmic side of the lipid membranes. Bioinformatic predictions were conducted to analyze the interactions of the studied proteins with each other and with other possible partners. We conducted molecular modeling of proteins with unknown 3D structures. These models were then refined by MolProbity server and employed in molecular docking simulations of interactions. Our results show data acquired using a combination of modern in silico methods and image analysis to understand the localization, interactions and functions of proteins AMID, AIF, endonuclease G, and other apoptosis-related proteins.

  7. Exploration of the dynamic properties of protein complexes predicted from spatially constrained protein-protein interaction networks.

    Directory of Open Access Journals (Sweden)

    Eric A Yen

    2014-05-01

    Full Text Available Protein complexes are not static, but rather highly dynamic with subunits that undergo 1-dimensional diffusion with respect to each other. Interactions within protein complexes are modulated through regulatory inputs that alter interactions and introduce new components and deplete existing components through exchange. While it is clear that the structure and function of any given protein complex is coupled to its dynamical properties, it remains a challenge to predict the possible conformations that complexes can adopt. Protein-fragment Complementation Assays detect physical interactions between protein pairs constrained to ≤8 nm from each other in living cells. This method has been used to build networks composed of 1000s of pair-wise interactions. Significantly, these networks contain a wealth of dynamic information, as the assay is fully reversible and the proteins are expressed in their natural context. In this study, we describe a method that extracts this valuable information in the form of predicted conformations, allowing the user to explore the conformational landscape, to search for structures that correlate with an activity state, and estimate the abundance of conformations in the living cell. The generator is based on a Markov Chain Monte Carlo simulation that uses the interaction dataset as input and is constrained by the physical resolution of the assay. We applied this method to an 18-member protein complex composed of the seven core proteins of the budding yeast Arp2/3 complex and 11 associated regulators and effector proteins. We generated 20,480 output structures and identified conformational states using principle component analysis. We interrogated the conformation landscape and found evidence of symmetry breaking, a mixture of likely active and inactive conformational states and dynamic exchange of the core protein Arc15 between core and regulatory components. Our method provides a novel tool for prediction and

  8. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    Science.gov (United States)

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  9. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    Directory of Open Access Journals (Sweden)

    Colin A Smith

    Full Text Available Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface, interactions between and within parts of the structure (e.g. domains can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  10. SDSL-ESR-based protein structure characterization.

    Science.gov (United States)

    Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A

    2010-03-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.

  11. Mass Spectrometry Coupled Experiments and Protein Structure Modeling Methods

    Directory of Open Access Journals (Sweden)

    Lee Sael

    2013-10-01

    Full Text Available With the accumulation of next generation sequencing data, there is increasing interest in the study of intra-species difference in molecular biology, especially in relation to disease analysis. Furthermore, the dynamics of the protein is being identified as a critical factor in its function. Although accuracy of protein structure prediction methods is high, provided there are structural templates, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address the sensitivity and the dynamics by computational structure predictions alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of the protein structures. For this purpose, this article focuses on reviewing two aspects: the types of mass spectrometry coupled experiments and structural data that are obtainable through those experiments; and the structure prediction methods that can utilize these data as constraints. Also, short review of current efforts in integrating experimental data in the structural modeling is provided.

  12. Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.

    Science.gov (United States)

    Daberdaku, Sebastian; Ferrari, Carlo

    2018-02-06

    The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction

  13. Protein complex prediction based on k-connected subgraphs in protein interaction network

    OpenAIRE

    Habibi, Mahnaz; Eslahchi, Changiz; Wong, Limsoon

    2010-01-01

    Abstract Background Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interactions data are incomplete and a protein complex does not have to be a complete or dense subgraph. Results We propose a more appropriate protein complex prediction method, CFA, that is based on ...

  14. Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

    Directory of Open Access Journals (Sweden)

    Jinjian Jiang

    2017-07-01

    Full Text Available Hotspot residues are important in the determination of protein-protein interactions, and they always perform specific functions in biological processes. The determination of hotspot residues is by the commonly-used method of alanine scanning mutagenesis experiments, which is always costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., using the information of solved protein structures. However, the number of solved protein structures is extremely less than that of sequences. Moreover, almost all of the predictors identified hotspots from the interfaces of protein complexes, seldom from the whole protein sequences. Therefore, determining hotspots from whole protein sequences by sequence information alone is urgent. To address the issue of hotspot predictions from the whole sequences of proteins, we proposed an ensemble system with random projections using statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Then, the random projection technique was adopted to project the encoding instances into a reduced space. Then, several better random projections were obtained by training an IBk classifier based on the training dataset, which were thus applied to the test dataset. The ensemble of random projection classifiers is therefore obtained. Experimental results showed that although the performance of our method is not good enough for real applications of hotspots, it is very promising in the determination of hotspot residues from whole sequences.

  15. Modularity in protein structures: study on all-alpha proteins.

    Science.gov (United States)

    Khan, Taushif; Ghosh, Indira

    2015-01-01

    Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.

  16. Protein enriched pasta: structure and digestibility of its protein network.

    Science.gov (United States)

    Laleg, Karima; Barron, Cécile; Santé-Lhoutellier, Véronique; Walrand, Stéphane; Micard, Valérie

    2016-02-01

    Wheat (W) pasta was enriched in 6% gluten (G), 35% faba (F) or 5% egg (E) to increase its protein content (13% to 17%). The impact of the enrichment on the multiscale structure of the pasta and on in vitro protein digestibility was studied. Increasing the protein content (W- vs. G-pasta) strengthened pasta structure at molecular and macroscopic scales but reduced its protein digestibility by 3% by forming a higher covalently linked protein network. Greater changes in the macroscopic and molecular structure of the pasta were obtained by varying the nature of protein used for enrichment. Proteins in G- and E-pasta were highly covalently linked (28-32%) resulting in a strong pasta structure. Conversely, F-protein (98% SDS-soluble) altered the pasta structure by diluting gluten and formed a weak protein network (18% covalent link). As a result, protein digestibility in F-pasta was significantly higher (46%) than in E- (44%) and G-pasta (39%). The effect of low (55 °C, LT) vs. very high temperature (90 °C, VHT) drying on the protein network structure and digestibility was shown to cause greater molecular changes than pasta formulation. Whatever the pasta, a general strengthening of its structure, a 33% to 47% increase in covalently linked proteins and a higher β-sheet structure were observed. However, these structural differences were evened out after the pasta was cooked, resulting in identical protein digestibility in LT and VHT pasta. Even after VHT drying, F-pasta had the best amino acid profile with the highest protein digestibility, proof of its nutritional interest.

  17. Structure-based barcoding of proteins.

    Science.gov (United States)

    Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin

    2014-01-01

    A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from 3D coordinate file. The molecular structure of a protein coordinate file from Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to same fold family, and across different folds. In addition to this, we have attempted to provide an illustration to (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) Modifications in overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.

  18. De novo protein structure determination using sparse NMR data

    International Nuclear Information System (INIS)

    Bowers, Peter M.; Strauss, Charlie E.M.; Baker, David

    2000-01-01

    We describe a method for generating moderate to high-resolution protein structures using limited NMR data combined with the ab initio protein structure prediction method Rosetta. Peptide fragments are selected from proteins of known structure based on sequence similarity and consistency with chemical shift and NOE data. Models are built from these fragments by minimizing an energy function that favors hydrophobic burial, strand pairing, and satisfaction of NOE constraints. Models generated using this procedure with ∼1 NOE constraint per residue are in some cases closer to the corresponding X-ray structures than the published NMR solution structures. The method requires only the sparse constraints available during initial stages of NMR structure determination, and thus holds promise for increasing the speed with which protein solution structures can be determined

  19. Functional classification of protein structures by local structure matching in graph representation.

    Science.gov (United States)

    Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

    2018-03-31

    As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  20. Protein function prediction using neighbor relativity in protein-protein interaction network.

    Science.gov (United States)

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some researches utilize the proteins interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein will be annotated by the top score transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that utilize interaction network. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Deep recurrent conditional random field network for protein secondary prediction

    DEFF Research Database (Denmark)

    Johansen, Alexander Rosenberg; Sønderby, Søren Kaae; Sønderby, Casper Kaae

    2017-01-01

    Deep learning has become the state-of-the-art method for predicting protein secondary structure from only its amino acid residues and sequence profile. Building upon these results, we propose to combine a bi-directional recurrent neural network (biRNN) with a conditional random field (CRF), which...... of the labels for all time-steps. We condition the CRF on the output of biRNN, which learns a distributed representation based on the entire sequence. The biRNN-CRF is therefore close to ideally suited for the secondary structure task because a high degree of cross-talk between neighboring elements can...

  2. Prediction of methyl-side Chain Dynamics in Proteins

    International Nuclear Information System (INIS)

    Ming Dengming; Brueschweiler, Rafael

    2004-01-01

    A simple analytical model is presented for the prediction of methyl-side chain dynamics in comparison with S 2 order parameters obtained by NMR relaxation spectroscopy. The model, which is an extension of the local contact model for backbone order parameter prediction, uses a static 3D protein structure as input. It expresses the methyl-group S 2 order parameters as a function of local contacts of the methyl carbon with respect to the neighboring atoms in combination with the number of consecutive mobile dihedral angles between the methyl group and the protein backbone. For six out of seven proteins the prediction results are good when compared with experimentally determined methyl-group S 2 values with an average correlation coefficient r-bar=0.65±0.14. For the unusually rigid cytochrome c 2 no significant correlation between prediction and experiment is found. The presented model provides independent support for the reliability of current side-chain relaxation methods along with their interpretation by the model-free formalism

  3. Prediction of RNA-Binding Proteins by Voting Systems

    Directory of Open Access Journals (Sweden)

    C. R. Peng

    2011-01-01

    Full Text Available It is important to identify which proteins can interact with RNA for the purpose of protein annotation, since interactions between RNA and proteins influence the structure of the ribosome and play important roles in gene expression. This paper tries to identify proteins that can interact with RNA using voting systems. Firstly through Weka, 34 learning algorithms are chosen for investigation. Then simple majority voting system (SMVS is used for the prediction of RNA-binding proteins, achieving average ACC (overall prediction accuracy value of 79.72% and MCC (Matthew’s correlation coefficient value of 59.77% for the independent testing dataset. Then mRMR (minimum redundancy maximum relevance strategy is used, which is transferred into algorithm selection. In addition, the MCC value of each classifier is assigned to be the weight of the classifier’s vote. As a result, best average MCC values are attained when 22 algorithms are selected and integrated through weighted votes, which are 64.70% for the independent testing dataset, and ACC value is 82.04% at this moment.

  4. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    Science.gov (United States)

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  5. InterMap3D: predicting and visualizing co-evolving protein residues

    DEFF Research Database (Denmark)

    Oliveira, Rodrigo Gouveia; Roque, francisco jose sousa simôes almeida; Wernersson, Rasmus

    2009-01-01

    InterMap3D predicts co-evolving protein residues and plots them on the 3D protein structure. Starting with a single protein sequence, InterMap3D automatically finds a set of homologous sequences, generates an alignment and fetches the most similar 3D structure from the Protein Data Bank (PDB......). It can also accept a user-generated alignment. Based on the alignment, co-evolving residues are then predicted using three different methods: Row and Column Weighing of Mutual Information, Mutual Information/Entropy and Dependency. Finally, InterMap3D generates high-quality images of the protein...

  6. SDSL-ESR-based protein structure characterization

    NARCIS (Netherlands)

    Strancar, J.; Kavalenka, A.A.; Urbancic, I.; Ljubetic, A.; Hemminga, M.A.

    2010-01-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be

  7. Overcoming barriers to membrane protein structure determination.

    Science.gov (United States)

    Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst

    2011-04-01

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.

  8. New tips for structure prediction by comparative modeling

    Science.gov (United States)

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to experimentally determined protein structure (template). The use of such technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions as commonly accepted depends on the score of sequence identity of the target protein to the template. To assess the relationship between sequence identity and model quality, we carried out an analysis of a set of 4753 sequence and structure alignments. Throughout this research, the model accuracy was measured by root mean square deviations of Cα atoms of the target-template structures. Surprisingly, the results show that sequence identity of the target protein to the template is not a good descriptor to predict the accuracy of the 3-D structure model. However, in a large number of cases, comparative modelling with lower sequence identity of target to template proteins led to more accurate 3-D structure model. As a consequence of this study, we suggest new tips for improving the quality of omparative models, particularly for models whose target-template sequence identity is below 50%. PMID:19255646

  9. HKC: An Algorithm to Predict Protein Complexes in Protein-Protein Interaction Networks

    Directory of Open Access Journals (Sweden)

    Xiaomin Wang

    2011-01-01

    Full Text Available With the availability of more and more genome-scale protein-protein interaction (PPI networks, research interests gradually shift to Systematic Analysis on these large data sets. A key topic is to predict protein complexes in PPI networks by identifying clusters that are densely connected within themselves but sparsely connected with the rest of the network. In this paper, we present a new topology-based algorithm, HKC, to detect protein complexes in genome-scale PPI networks. HKC mainly uses the concepts of highest k-core and cohesion to predict protein complexes by identifying overlapping clusters. The experiments on two data sets and two benchmarks show that our algorithm has relatively high F-measure and exhibits better performance compared with some other methods.

  10. Nucleic acid secondary structure prediction and display.

    OpenAIRE

    Stüber, K

    1986-01-01

    A set of programs has been developed for the prediction and display of nucleic acid secondary structures. Information from experimental data can be used to restrict or enforce secondary structural elements. The predictions can be displayed either on normal line printers or on graphic devices like plotters or graphic terminals.

  11. A computer graphics program system for protein structure representation.

    Science.gov (United States)

    Ross, A M; Golub, E E

    1988-01-01

    We have developed a computer graphics program system for the schematic representation of several protein secondary structure analysis algorithms. The programs calculate the probability of occurrence of alpha-helix, beta-sheet and beta-turns by the method of Chou and Fasman and assign unique predicted structure to each residue using a novel conflict resolution algorithm based on maximum likelihood. A detailed structure map containing secondary structure, hydrophobicity, sequence identity, sequence numbering and the location of putative N-linked glycosylation sites is then produced. In addition, helical wheel diagrams and hydrophobic moment calculations can be performed to further analyze the properties of selected regions of the sequence. As they require only structure specification as input, the graphics programs can easily be adapted for use with other secondary structure prediction schemes. The use of these programs to analyze protein structure-function relationships is described and evaluated. PMID:2832829

  12. Rapid and reliable protein structure determination via chemical shift threading.

    Science.gov (United States)

    Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S

    2018-01-01

    Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .

  13. NAPS: Network Analysis of Protein Structures

    Science.gov (United States)

    Chakrabarty, Broto; Parekh, Nita

    2016-01-01

    Traditionally, protein structures have been analysed by the secondary structure architecture and fold arrangement. An alternative approach that has shown promise is modelling proteins as a network of non-covalent interactions between amino acid residues. The network representation of proteins provide a systems approach to topological analysis of complex three-dimensional structures irrespective of secondary structure and fold type and provide insights into structure-function relationship. We have developed a web server for network based analysis of protein structures, NAPS, that facilitates quantitative and qualitative (visual) analysis of residue–residue interactions in: single chains, protein complex, modelled protein structures and trajectories (e.g. from molecular dynamics simulations). The user can specify atom type for network construction, distance range (in Å) and minimal amino acid separation along the sequence. NAPS provides users selection of node(s) and its neighbourhood based on centrality measures, physicochemical properties of amino acids or cluster of well-connected residues (k-cliques) for further analysis. Visual analysis of interacting domains and protein chains, and shortest path lengths between pair of residues are additional features that aid in functional analysis. NAPS support various analyses and visualization views for identifying functional residues, provide insight into mechanisms of protein folding, domain-domain and protein–protein interactions for understanding communication within and between proteins. URL:http://bioinf.iiit.ac.in/NAPS/. PMID:27151201

  14. Can Morphing Methods Predict Intermediate Structures?

    Science.gov (United States)

    Weiss, Dahlia R.; Levitt, Michael

    2009-01-01

    Movement is crucial to the biological function of many proteins, yet crystallographic structures of proteins can give us only a static snapshot. The protein dynamics that are important to biological function often happen on a timescale that is unattainable through detailed simulation methods such as molecular dynamics as they often involve crossing high-energy barriers. To address this coarse-grained motion, several methods have been implemented as web servers in which a set of coordinates is usually linearly interpolated from an initial crystallographic structure to a final crystallographic structure. We present a new morphing method that does not extrapolate linearly and can therefore go around high-energy barriers and which can produce different trajectories between the same two starting points. In this work, we evaluate our method and other established coarse-grained methods according to an objective measure: how close a coarse-grained dynamics method comes to a crystallographically determined intermediate structure when calculating a trajectory between the initial and final crystal protein structure. We test this with a set of five proteins with at least three crystallographically determined on-pathway high-resolution intermediate structures from the Protein Data Bank. For simple hinging motions involving a small conformational change, segmentation of the protein into two rigid sections outperforms other more computationally involved methods. However, large-scale conformational change is best addressed using a nonlinear approach and we suggest that there is merit in further developing such methods. PMID:18996395

  15. A generative, probabilistic model of local protein structure

    DEFF Research Database (Denmark)

    Boomsma, Wouter; Mardia, Kanti V.; Taylor, Charles C.

    2008-01-01

    Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative...... conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state...

  16. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.

    Science.gov (United States)

    Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin

    2011-08-21

    Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, there are still many difficulties because of lacking enough protein structural and functional information. It is highly desired to develop methods based only on amino acid sequences for predicting PPIs. However, sequence-based predictors are often struggling with the high-dimensionality causing over-fitting and high computational complexity problems, as well as the redundancy of sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower but more condensed space taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is its compressed signal can be reconstructed from far fewer measurements than what is usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy of dealing with high-dimensional protein discrete model and has great potentiality to be extended to deal with many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. Solution NMR structure determination of proteins revisited

    International Nuclear Information System (INIS)

    Billeter, Martin; Wagner, Gerhard; Wuethrich, Kurt

    2008-01-01

    This 'Perspective' bears on the present state of protein structure determination by NMR in solution. The focus is on a comparison of the infrastructure available for NMR structure determination when compared to protein crystal structure determination by X-ray diffraction. The main conclusion emerges that the unique potential of NMR to generate high resolution data also on dynamics, interactions and conformational equilibria has contributed to a lack of standard procedures for structure determination which would be readily amenable to improved efficiency by automation. To spark renewed discussion on the topic of NMR structure determination of proteins, procedural steps with high potential for improvement are identified

  18. Extracting knowledge from protein structure geometry

    DEFF Research Database (Denmark)

    Røgen, Peter; Koehl, Patrice

    2013-01-01

    potential from geometric knowledge extracted from native and misfolded conformers of protein structures. This new potential, Metric Protein Potential (MPP), has two main features that are key to its success. Firstly, it is composite in that it includes local and nonlocal geometric information on proteins...

  19. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    Directory of Open Access Journals (Sweden)

    Lee Sael

    2010-12-01

    Full Text Available Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  20. Binding ligand prediction for proteins using partial matching of local surface patches.

    Science.gov (United States)

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  1. On the analysis of protein-protein interactions via knowledge-based potentials for the prediction of protein-protein docking

    DEFF Research Database (Denmark)

    Feliu, Elisenda; Aloy, Patrick; Oliva, Baldo

    2011-01-01

    Development of effective methods to screen binary interactions obtained by rigid-body protein-protein docking is key for structure prediction of complexes and for elucidating physicochemical principles of protein-protein binding. We have derived empirical knowledge-based potential functions for s...... and with independence of the partner. This information is encoded at the residue level and could be easily incorporated in the initial grid scoring for Fast Fourier Transform rigid-body docking methods.......Development of effective methods to screen binary interactions obtained by rigid-body protein-protein docking is key for structure prediction of complexes and for elucidating physicochemical principles of protein-protein binding. We have derived empirical knowledge-based potential functions...... for selecting rigid-body docking poses. These potentials include the energetic component that provides the residues with a particular secondary structure and surface accessibility. These scoring functions have been tested on a state-of-art benchmark dataset and on a decoy dataset of permanent interactions. Our...

  2. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    Science.gov (United States)

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  3. Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Donald Adjeroh

    2018-03-01

    Full Text Available In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI. In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences, and structure information (protein and RNA secondary structures. This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.

  4. Validation-driven protein-structure improvement

    NARCIS (Netherlands)

    Touw, W.G.

    2016-01-01

    High-quality protein structure models are essential for many Life Science applications, such as protein engineering, molecular dynamics, drug design, and homology modelling. The WHAT_CHECK model validation project and the PDB_REDO model optimisation project have shown that many structure models in

  5. Tertiary alphabet for the observable protein structural universe.

    Science.gov (United States)

    Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg

    2016-11-22

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence-a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.

  6. Predicting RNA Structure Using Mutual Information

    DEFF Research Database (Denmark)

    Freyhult, E.; Moulton, V.; Gardner, P. P.

    2005-01-01

    , to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. Results: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall...... package. Conclusion: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam. Availability: MIfold is freely available from http......Background: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure...

  7. An update of the DEF database of protein fold class predictions

    DEFF Research Database (Denmark)

    Reczko, Martin; Karras, Dimitris; Bohr, Henrik

    1997-01-01

    An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure re...... related to the sequence with high accuracy. The updated predictions system is developed using data from the new version of the 3D-ALI database of aligned protein structures and thus is giving more reliable and more detailed predictions than the previous DEF system.......An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure...

  8. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi; Kandaswamy, Krishna Kumar Umar; Chou -, Kuochen; Vivekanandan, Saravanan; Kolatkar, Prasanna R.

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. The complete physicochemical understanding of protein folding is essential for the accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing was performed using 120 proteins containing 22006 residues. For each residue, buried and exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracy for 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07% respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/. - See more at: http://www.eurekaselect.com/89216/article#sthash.pwVGFUjq.dpuf

  9. Prediction and characterization of protein-protein interaction networks in swine

    Directory of Open Access Journals (Sweden)

    Wang Fen

    2012-01-01

    Full Text Available Abstract Background Studying the large-scale protein-protein interaction (PPI network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into understanding their biological processes. Results We used three methods, Interolog-based prediction of porcine PPI network, domain-motif interactions from structural topology-based prediction of porcine PPI network and motif-motif interactions from structural topology-based prediction of porcine PPI network, to predict porcine protein interactions among 25,767 porcine proteins. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in iPfam database. In GO annotations, we found 52.68%, 75.54%, 27.20% of the predicted PPIs sharing GO terms respectively. However, the number of PPI pairs sharing GO terms in the 10,000 randomized networks reached 52.68%, 75.54%, 27.20% is 0. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively. Conclusion The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at (http://pppid.biositemap.com/.

  10. Computational predictions of zinc oxide hollow structures

    Science.gov (United States)

    Tuoc, Vu Ngoc; Huan, Tran Doan; Thao, Nguyen Thi

    2018-03-01

    Nanoporous materials are emerging as potential candidates for a wide range of technological applications in environment, electronic, and optoelectronics, to name just a few. Within this active research area, experimental works are predominant while theoretical/computational prediction and study of these materials face some intrinsic challenges, one of them is how to predict porous structures. We propose a computationally and technically feasible approach for predicting zinc oxide structures with hollows at the nano scale. The designed zinc oxide hollow structures are studied with computations using the density functional tight binding and conventional density functional theory methods, revealing a variety of promising mechanical and electronic properties, which can potentially find future realistic applications.

  11. Compare local pocket and global protein structure models by small structure patterns

    KAUST Repository

    Cui, Xuefeng

    2015-09-09

    Researchers proposed several criteria to assess the quality of predicted protein structures because it is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria include root mean squared deviation (RMSD), MaxSub score, TM-score, GDT-TS and GDT-HA scores. All these criteria require calculation of rigid transformations to superimpose the the predicted protein structure to the native protein structure. Yet, how to obtain the rigid transformations is unknown or with high time complexity, and, hence, heuristic algorithms were proposed. In this work, we carefully design various small structure patterns, including the ones specifically tuned for local pockets. Such structure patterns are biologically meaningful, and address the issue of relying on a sufficient number of backbone residue fragments for existing methods. We sample the rigid transformations from these small structure patterns; and the optimal superpositions yield by these small structures are refined and reported. As a result, among 11; 669 pairs of predicted and native local protein pocket models from the CASP10 dataset, the GDT-TS scores calculated by our method are significantly higher than those calculated by LGA. Moreover, our program is computationally much more efficient. Source codes and executables are publicly available at http://www.cbrc.kaust.edu.sa/prosta/

  12. Heterochiral Knottin Protein: Folding and Solution Structure.

    Science.gov (United States)

    Mong, Surin K; Cochran, Frank V; Yu, Hongtao; Graziano, Zachary; Lin, Yu-Shan; Cochran, Jennifer R; Pentelute, Bradley L

    2017-10-31

    Homochirality is a general feature of biological macromolecules, and Nature includes few examples of heterochiral proteins. Herein, we report on the design, chemical synthesis, and structural characterization of heterochiral proteins possessing loops of amino acids of chirality opposite to that of the rest of a protein scaffold. Using the protein Ecballium elaterium trypsin inhibitor II, we discover that selective β-alanine substitution favors the efficient folding of our heterochiral constructs. Solution nuclear magnetic resonance spectroscopy of one such heterochiral protein reveals a homogeneous global fold. Additionally, steered molecular dynamics simulation indicate β-alanine reduces the free energy required to fold the protein. We also find these heterochiral proteins to be more resistant to proteolysis than homochiral l-proteins. This work informs the design of heterochiral protein architectures containing stretches of both d- and l-amino acids.

  13. Amino acid code of protein secondary structure.

    Science.gov (United States)

    Shestopalov, B V

    2003-01-01

    The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence--the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The theory basis are: 1) the name secondary structure is assigned to the conformation, stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i = 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlappings of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required, the codon selection is hierarchic; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data. Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three

  14. Prediction of thermodynamic instabilities of protein solutions from simple protein–protein interactions

    International Nuclear Information System (INIS)

    D’Agostino, Tommaso; Solana, José Ramón; Emanuele, Antonio

    2013-01-01

    Highlights: ► We propose a model of effective protein–protein interaction embedding solvent effects. ► A previous square-well model is enhanced by giving to the interaction a free energy character. ► The temperature dependence of the interaction is due to entropic effects of the solvent. ► The validity of the original SW model is extended to entropy driven phase transitions. ► We get good fits for lysozyme and haemoglobin spinodal data taken from literature. - Abstract: Statistical thermodynamics of protein solutions is often studied in terms of simple, microscopic models of particles interacting via pairwise potentials. Such modelling can reproduce the short range structure of protein solutions at equilibrium and predict thermodynamics instabilities of these systems. We introduce a square well model of effective protein–protein interaction that embeds the solvent’s action. We modify an existing model [45] by considering a well depth having an explicit dependence on temperature, i.e. an explicit free energy character, thus encompassing the statistically relevant configurations of solvent molecules around proteins. We choose protein solutions exhibiting demixing upon temperature decrease (lysozyme, enthalpy driven) and upon temperature increase (haemoglobin, entropy driven). We obtain satisfactory fits of spinodal curves for both the two proteins without adding any mean field term, thus extending the validity of the original model. Our results underline the solvent role in modulating or stretching the interaction potential

  15. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng; Gao, Xin

    2016-01-01

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated

  16. Automated protein structure calculation from NMR data

    International Nuclear Information System (INIS)

    Williamson, Mike P.; Craven, C. Jeremy

    2009-01-01

    Current software is almost at the stage to permit completely automatic structure determination of small proteins of <15 kDa, from NMR spectra to structure validation with minimal user interaction. This goal is welcome, as it makes structure calculation more objective and therefore more easily validated, without any loss in the quality of the structures generated. Moreover, it releases expert spectroscopists to carry out research that cannot be automated. It should not take much further effort to extend automation to ca 20 kDa. However, there are technological barriers to further automation, of which the biggest are identified as: routines for peak picking; adoption and sharing of a common framework for structure calculation, including the assembly of an automated and trusted package for structure validation; and sample preparation, particularly for larger proteins. These barriers should be the main target for development of methodology for protein structure determination, particularly by structural genomics consortia

  17. Structural anatomy of telomere OB proteins.

    Science.gov (United States)

    Horvath, Martin P

    2011-10-01

    Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA.

  18. Stochastic Extreme Load Predictions for Marine Structures

    DEFF Research Database (Denmark)

    Jensen, Jørgen Juncher

    1999-01-01

    Development of rational design criteria for marine structures requires reliable estimates for the maximum wave-induced loads the structure may encounter during its operational lifetime. The paper discusses various methods for extreme value predictions taking into account the non-linearity of the ......Development of rational design criteria for marine structures requires reliable estimates for the maximum wave-induced loads the structure may encounter during its operational lifetime. The paper discusses various methods for extreme value predictions taking into account the non......-linearity of the waves and the response. As example the wave-induced bending moment in the ship hull girder is considered....

  19. Structural and Functional Annotation of Hypothetical Proteins of O139

    Directory of Open Access Journals (Sweden)

    Md. Saiful Islam

    2015-06-01

    Full Text Available In developing countries threat of cholera is a significant health concern whenever water purification and sewage disposal systems are inadequate. Vibrio cholerae is one of the responsible bacteria involved in cholera disease. The complete genome sequence of V. cholerae deciphers the presence of various genes and hypothetical proteins whose function are not yet understood. Hence analyzing and annotating the structure and function of hypothetical proteins is important for understanding the V. cholerae. V. cholerae O139 is the most common and pathogenic bacterial strain among various V. cholerae strains. In this study sequence of six hypothetical proteins of V. cholerae O139 has been annotated from NCBI. Various computational tools and databases have been used to determine domain family, protein-protein interaction, solubility of protein, ligand binding sites etc. The three dimensional structure of two proteins were modeled and their ligand binding sites were identified. We have found domains and families of only one protein. The analysis revealed that these proteins might have antibiotic resistance activity, DNA breaking-rejoining activity, integrase enzyme activity, restriction endonuclease, etc. Structural prediction of these proteins and detection of binding sites from this study would indicate a potential target aiding docking studies for therapeutic designing against cholera.

  20. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein In vitro Digestibility and Solubility.

    Science.gov (United States)

    Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller's dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their the in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (pdigestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  1. Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs.

    Science.gov (United States)

    Várnai, Csilla; Burkoff, Nikolas S; Wild, David L

    2017-01-01

    Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes, by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy related correlated mutation measures (CMMs) such as direct information, decoupling direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy based CMM at the residue level, we develop an interface level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate, that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and will likely improve interface prediction also for complexes with large and good-quality MSAs.

  2. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    Science.gov (United States)

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.

  3. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza

    2017-02-08

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  4. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z.; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  5. C-reactive protein, fibrinogen, and cardiovascular disease prediction

    DEFF Research Database (Denmark)

    Kaptoge, Stephen; Di Angelantonio, Emanuele; Pennells, Lisa

    2012-01-01

    There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events.......There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events....

  6. Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

    Directory of Open Access Journals (Sweden)

    Bradley Michael E

    2006-02-01

    Full Text Available Abstract Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1 multiple sequence alignments, 2 mapping of alignment sites to crystal structure sites, 3 phylogenetic trees, 4 inferred ancestral sequences at internal tree nodes, and 5 amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural

  7. Structural symmetry and protein function.

    Science.gov (United States)

    Goodsell, D S; Olson, A J

    2000-01-01

    The majority of soluble and membrane-bound proteins in modern cells are symmetrical oligomeric complexes with two or more subunits. The evolutionary selection of symmetrical oligomeric complexes is driven by functional, genetic, and physicochemical needs. Large proteins are selected for specific morphological functions, such as formation of rings, containers, and filaments, and for cooperative functions, such as allosteric regulation and multivalent binding. Large proteins are also more stable against denaturation and have a reduced surface area exposed to solvent when compared with many individual, smaller proteins. Large proteins are constructed as oligomers for reasons of error control in synthesis, coding efficiency, and regulation of assembly. Symmetrical oligomers are favored because of stability and finite control of assembly. Several functions limit symmetry, such as interaction with DNA or membranes, and directional motion. Symmetry is broken or modified in many forms: quasisymmetry, in which identical subunits adopt similar but different conformations; pleomorphism, in which identical subunits form different complexes; pseudosymmetry, in which different molecules form approximately symmetrical complexes; and symmetry mismatch, in which oligomers of different symmetries interact along their respective symmetry axes. Asymmetry is also observed at several levels. Nearly all complexes show local asymmetry at the level of side chain conformation. Several complexes have reciprocating mechanisms in which the complex is asymmetric, but, over time, all subunits cycle through the same set of conformations. Global asymmetry is only rarely observed. Evolution of oligomeric complexes may favor the formation of dimers over complexes with higher cyclic symmetry, through a mechanism of prepositioned pairs of interacting residues. However, examples have been found for all of the crystallographic point groups, demonstrating that functional need can drive the evolution of

  8. Protein structure: geometry, topology and classification

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, William R.; May, Alex C.W.; Brown, Nigel P.; Aszodi, Andras [Division of Mathematical Biology, National Institute for Medical Research, London (United Kingdom)

    2001-04-01

    The structural principals of proteins are reviewed and analysed from a geometric perspective with a view to revealing the underlying regularities in their construction. Computer methods for the automatic comparison and classification of these structures are then reviewed with an analysis of the statistical significance of comparing different shapes. Following an analysis of the current state of the classification of proteins, more abstract geometric and topological representations are explored, including the occurrence of knotted topologies. The review concludes with a consideration of the origin of higher-level symmetries in protein structure. (author)

  9. Taking advantage of local structure descriptors to analyze interresidue contacts in protein structures and protein complexes.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude

    2008-11-15

    Interresidue protein contacts in proteins structures and at protein-protein interface are classically described by the amino acid types of interacting residues and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contact using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows to describe a 3D structure as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that the similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.

  10. Prediction of the location and type of beta-turns in proteins using neural networks.

    OpenAIRE

    Shepherd, A. J.; Gorse, D.; Thornton, J. M.

    1999-01-01

    A neural network has been used to predict both the location and the type of beta-turns in a set of 300 nonhomologous protein domains. A substantial improvement in prediction accuracy compared with previous methods has been achieved by incorporating secondary structure information in the input data. The total percentage of residues correctly classified as beta-turn or not-beta-turn is around 75% with predicted secondary structure information. More significantly, the method gives a Matthews cor...

  11. Binding free energy analysis of protein-protein docking model structures by evERdock.

    Science.gov (United States)

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-14

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.

  12. A domain-based approach to predict protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Resat Haluk

    2007-06-01

    Full Text Available Abstract Background Knowing which proteins exist in a certain organism or cell type and how these proteins interact with each other are necessary for the understanding of biological processes at the whole cell level. The determination of the protein-protein interaction (PPI networks has been the subject of extensive research. Despite the development of reasonably successful methods, serious technical difficulties still exist. In this paper we present DomainGA, a quantitative computational approach that uses the information about the domain-domain interactions to predict the interactions between proteins. Results DomainGA is a multi-parameter optimization method in which the available PPI information is used to derive a quantitative scoring scheme for the domain-domain pairs. Obtained domain interaction scores are then used to predict whether a pair of proteins interacts. Using the yeast PPI data and a series of tests, we show the robustness and insensitivity of the DomainGA method to the selection of the parameter sets, score ranges, and detection rules. Our DomainGA method achieves very high explanation ratios for the positive and negative PPIs in yeast. Based on our cross-verification tests on human PPIs, comparison of the optimized scores with the structurally observed domain interactions obtained from the iPFAM database, and sensitivity and specificity analysis; we conclude that our DomainGA method shows great promise to be applicable across multiple organisms. Conclusion We envision the DomainGA as a first step of a multiple tier approach to constructing organism specific PPIs. As it is based on fundamental structural information, the DomainGA approach can be used to create potential PPIs and the accuracy of the constructed interaction template can be further improved using complementary methods. Explanation ratios obtained in the reported test case studies clearly show that the false prediction rates of the template networks constructed

  13. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Full Text Available Abstract Background Understanding protein function from its structure is a challenging problem. Sequence based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information of protein domains and their interactions provide a complementary view to structure function relationships to sequence information. We have developed a web site http://www.sblest.org/ and an API of web services that enables users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view to prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect both the hits found with PSI-BLAST, HMMer and with S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP ID's, Gene Ontology (GO ID's and EC Numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion With structural genomics initiatives determining structures with little, if any, functional characterization

  14. Characteristics and Prediction of RNA Structure

    Directory of Open Access Journals (Sweden)

    Hengwu Li

    2014-01-01

    Full Text Available RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are just the important golden sections of sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding.

  15. Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling : A CASP-CAPRI experiment

    NARCIS (Netherlands)

    Lensink, Marc F.; Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen You; Schneidman-Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez-Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A G; Bates, Paul A.; Ben-Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Garcia Lopes Maia Rodrigues, João; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S J; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M J J; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung Rae; Roy, Amit; Han, Xusi; Esquivel-Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero-Durana, Miguel; Jiménez-García, Brian; Moal, Iain H.; Férnandez-Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey; Wodak, Shoshana J.

    2016-01-01

    We present the results for CAPRI Round 30, the first joint CASP-CAPRI experiment, which brought together experts from the protein structure prediction and protein-protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014.

  16. Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: A CASP-CAPRI experiment

    NARCIS (Netherlands)

    Lensink, Marc F.; Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen You; Schneidman-Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez-Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Lee, Gyu Rie; Seok, Chaok; Qin, Sanbo; Zhou, Huan Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben-Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; Van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung Rae; Roy, Amit; Han, Xusi; Esquivel-Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero-Durana, Miguel; Jiménez-García, Brian; Moal, Iain H.; Férnandez-Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey; Wodak, Shoshana J.

    2016-01-01

    We present the results for CAPRI Round 30, the first joint CASP-CAPRI experiment, which brought together experts from the protein structure prediction and protein-protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014.

  17. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures.

    Science.gov (United States)

    Kountouris, Petros; Hirst, Jonathan D

    2010-07-31

    Beta-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. We have developed a novel method that predicts beta-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of beta-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of beta-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. We have created an accurate predictor of beta-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/.

  18. Simultaneous determination of protein structure and dynamics

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Best, Robert B.; DePristo, M. A.

    2005-01-01

    at the atomic level about the structural and dynamical features of proteins-with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout......We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy-for obtaining experimental information...... the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant...

  19. Predictable tuning of protein expression in bacteria

    DEFF Research Database (Denmark)

    Bonde, Mads; Pedersen, Margit; Klausen, Michael Schantz

    2016-01-01

    We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expressi...

  20. Signal peptides and protein localization prediction

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2005-01-01

    In 1999, the Nobel prize in Physiology or Medicine was awarded to Gunther Blobel “for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell”. Since the subcellular localization of a protein is an important clue to its function, the characteriz...

  1. Facilitating RNA structure prediction with microarrays.

    Science.gov (United States)

    Kierzek, Elzbieta; Kierzek, Ryszard; Turner, Douglas H; Catrina, Irina E

    2006-01-17

    Determining RNA secondary structure is important for understanding structure-function relationships and identifying potential drug targets. This paper reports the use of microarrays with heptamer 2'-O-methyl oligoribonucleotides to probe the secondary structure of an RNA and thereby improve the prediction of that secondary structure. When experimental constraints from hybridization results are added to a free-energy minimization algorithm, the prediction of the secondary structure of Escherichia coli 5S rRNA improves from 27 to 92% of the known canonical base pairs. Optimization of buffer conditions for hybridization and application of 2'-O-methyl-2-thiouridine to enhance binding and improve discrimination between AU and GU pairs are also described. The results suggest that probing RNA with oligonucleotide microarrays can facilitate determination of secondary structure.

  2. Protein Structure and the Sequential Structure of mRNA

    DEFF Research Database (Denmark)

    Brunak, Søren; Engelbrecht, Jacob

    1996-01-01

    entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment, By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets, These signals do not originate from......A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed, We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting...... protein, The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain, A complete search for GenBank nucleotide sequences coding for structural...

  3. The construction of an amino acid network for understanding protein structure and function.

    Science.gov (United States)

    Yan, Wenying; Zhou, Jianhong; Sun, Maomin; Chen, Jiajia; Hu, Guang; Shen, Bairong

    2014-06-01

    Amino acid networks (AANs) are undirected networks consisting of amino acid residues and their interactions in three-dimensional protein structures. The analysis of AANs provides novel insight into protein science, and several common amino acid network properties have revealed diverse classes of proteins. In this review, we first summarize methods for the construction and characterization of AANs. We then compare software tools for the construction and analysis of AANs. Finally, we review the application of AANs for understanding protein structure and function, including the identification of functional residues, the prediction of protein folding, analyzing protein stability and protein-protein interactions, and for understanding communication within and between proteins.

  4. Overcoming barriers to membrane protein structure determination

    NARCIS (Netherlands)

    Bill, Roslyn M.; Henderson, Peter J. F.; Iwata, So; Kunji, Edmund R. S.; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G.; Vogel, Horst

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new

  5. RNA folding: structure prediction, folding kinetics and ion electrostatics.

    Science.gov (United States)

    Tan, Zhijie; Zhang, Wenbing; Shi, Yazhou; Wang, Fenghua

    2015-01-01

    Beyond the "traditional" functions such as gene storage, transport and protein synthesis, recent discoveries reveal that RNAs have important "new" biological functions including the RNA silence and gene regulation of riboswitch. Such functions of noncoding RNAs are strongly coupled to the RNA structures and proper structure change, which naturally leads to the RNA folding problem including structure prediction and folding kinetics. Due to the polyanionic nature of RNAs, RNA folding structure, stability and kinetics are strongly coupled to the ion condition of solution. The main focus of this chapter is to review the recent progress in the three major aspects in RNA folding problem: structure prediction, folding kinetics and ion electrostatics. This chapter will introduce both the recent experimental and theoretical progress, while emphasize the theoretical modelling on the three aspects in RNA folding.

  6. The 82-plex plasma protein signature that predicts increasing inflammation

    DEFF Research Database (Denmark)

    Tepel, Martin; Beck, Hans C; Tan, Qihua

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney....... The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P signature (P ....001) was associated with observed increased next-day C-reactive protein. The 82-plex protein signature outperformed routine clinical procedures. The category-free net reclassification index improved with 82-plex plasma protein signature (total net reclassification index, 88.3%). Using the 82-plex plasma protein...

  7. A 'periodic table' for protein structures.

    Science.gov (United States)

    Taylor, William R

    2002-04-11

    Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.

  8. Structural analysis of recombinant human protein QM

    International Nuclear Information System (INIS)

    Gualberto, D.C.H.; Fernandes, J.L.; Silva, F.S.; Saraiva, K.W.; Affonso, R.; Pereira, L.M.; Silva, I.D.C.G.

    2012-01-01

    Full text: The ribosomal protein QM belongs to a family of ribosomal proteins, which is highly conserved from yeast to humans. The presence of the QM protein is necessary for joining the 60S and 40S subunits in a late step of the initiation of mRNA translation. Although the exact extra-ribosomal functions of QM are not yet fully understood, it has been identified as a putative tumor suppressor. This protein was reported to interact with the transcription factor c-Jun and thereby prevent c-Jun actives genes of the cellular growth. In this study, the human QM protein was expressed in bacterial system, in the soluble form and this structure was analyzed by Circular Dichroism and Fluorescence. The results of Circular Dichroism showed that this protein has less alpha helix than beta sheet, as described in the literature. QM protein does not contain a leucine zipper region; however the ion zinc is necessary for binding of QM to c-Jun. Then we analyzed the relationship between the removal of zinc ions and folding of protein. Preliminary results obtained by the technique Fluorescence showed a gradual increase in fluorescence with the addition of increasing concentration of EDTA. This suggests that the zinc is important in the tertiary structure of the protein. More studies are being made for better understand these results. (author)

  9. A novel representation for apoptosis protein subcellular localization prediction using support vector machine.

    Science.gov (United States)

    Zhang, Li; Liao, Bo; Li, Dachao; Zhu, Wen

    2009-07-21

    Apoptosis, or programmed cell death, plays an important role in development of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful to understand the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related with the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.

  10. Linking structural features of protein complexes and biological function.

    Science.gov (United States)

    Sowmya, Gopichandran; Breen, Edmond J; Ranganathan, Shoba

    2015-09-01

    Protein-protein interaction (PPI) establishes the central basis for complex cellular networks in a biological cell. Association of proteins with other proteins occurs at varying affinities, yet with a high degree of specificity. PPIs lead to diverse functionality such as catalysis, regulation, signaling, immunity, and inhibition, playing a crucial role in functional genomics. The molecular principle of such interactions is often elusive in nature. Therefore, a comprehensive analysis of known protein complexes from the Protein Data Bank (PDB) is essential for the characterization of structural interface features to determine structure-function relationship. Thus, we analyzed a nonredundant dataset of 278 heterodimer protein complexes, categorized into major functional classes, for distinguishing features. Interestingly, our analysis has identified five key features (interface area, interface polar residue abundance, hydrogen bonds, solvation free energy gain from interface formation, and binding energy) that are discriminatory among the functional classes using Kruskal-Wallis rank sum test. Significant correlations between these PPI interface features amongst functional categories are also documented. Salt bridges correlate with interface area in regulator-inhibitors (r = 0.75). These representative features have implications for the prediction of potential function of novel protein complexes. The results provide molecular insights for better understanding of PPIs and their relation to biological functions. © 2015 The Protein Society.

  11. Protein Structure Determination Using Chemical Shifts

    DEFF Research Database (Denmark)

    Christensen, Anders Steen

    is determined using only chemical shifts recorded and assigned through automated processes. The CARMSD to the experimental X-ray for this structure is 1.1. Å. Additionally, the method is combined with very sparse NOE-restraints and evolutionary distance restraints and tested on several protein structures >100...

  12. On characterization of anisotropic plant protein structures

    NARCIS (Netherlands)

    Krintiras, G.A.; Göbel, J.; Bouwman, W.G.; Goot, van der A.J.; Stefanidis, G.D.

    2014-01-01

    In this paper, a set of complementary techniques was used to characterize surface and bulk structures of an anisotropic Soy Protein Isolate (SPI)–vital wheat gluten blend after it was subjected to heat and simple shear flow in a Couette Cell. The structured biopolymer blend can form a basis for a

  13. Hidden Structural Codes in Protein Intrinsic Disorder.

    Science.gov (United States)

    Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

    2017-10-17

    Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovers the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could have not been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.

  14. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein Digestibility and Solubility

    Directory of Open Access Journals (Sweden)

    Mingmei Bai

    2016-08-01

    Full Text Available The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller’s dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area of feed proteins were correlated with their the in vitro digestibility and solubility (p≤0.003; moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004. On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001 and solubility (p = 0.002. These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  15. An Overview of the Prediction of Protein DNA-Binding Sites

    Directory of Open Access Journals (Sweden)

    Jingna Si

    2015-03-01

    Full Text Available Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.

  16. An overview of the prediction of protein DNA-binding sites.

    Science.gov (United States)

    Si, Jingna; Zhao, Rui; Wu, Rongling

    2015-03-06

    Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.

  17. Disorder Prediction Methods, Their Applicability to Different Protein Targets and Their Usefulness for Guiding Experimental Studies

    Directory of Open Access Journals (Sweden)

    Jennifer D. Atkins

    2015-08-01

    Full Text Available The role and function of a given protein is dependent on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder is hard to generate. As a result, predictive computational tools have been developed with the aim of predicting the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several good performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. The use of disorder prediction methods allows us to adopt a more targeted approach to experimental studies by accurately identifying the boundaries of ordered protein domains so that they may be investigated separately, thereby increasing the likelihood of their successful experimental solution.

  18. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    Science.gov (United States)

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restrains in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with stacked sparse autoencoder and a dropout approach.

  19. Toxicological relationships between proteins obtained from protein target predictions of large toxicity databases

    International Nuclear Information System (INIS)

    Nigsch, Florian; Mitchell, John B.O.

    2008-01-01

    The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of 'toxiclogical' profiles, i.e., to what extent are molecules of known toxicity predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we built a computational multiclass model using the Winnow algorithm based on a dataset of protein targets derived from the MDL Drug Data Report. A 15-fold Monte Carlo cross-validation using 50% of each class for training, and the remaining 50% for testing, provided an assessment of the accuracy of that model. We retained the 3 top-ranking predictions and found that in 82% of all cases the correct target was predicted within these three predictions. The first prediction was the correct one in almost 70% of cases. A model built on the whole protein target dataset was then used to predict the protein targets for 150 000 molecules from the MDL Toxicity Database. We analysed the frequency of the predictions across the panel of protein targets for experimentally determined toxicity classes of all molecules. This allowed us to identify clusters of proteins related by their toxicological profiles, as well as toxicities that are related. Literature-based evidence is provided for some specific clusters to show the relevance of the relationships identified

  20. Predicting beta-turns in proteins using support vector machines with fractional polynomials.

    Science.gov (United States)

    Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng

    2013-11-07

    β-turns are secondary structure type that have essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structures since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, which can provide valuable insights and inputs for the fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call (H-SVM-LR) to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among other β-turns prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthew's correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turns prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.

  1. Protein complex prediction based on k-connected subgraphs in protein interaction network

    Directory of Open Access Journals (Sweden)

    Habibi Mahnaz

    2010-09-01

    Full Text Available Abstract Background Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interactions data are incomplete and a protein complex does not have to be a complete or dense subgraph. Results We propose a more appropriate protein complex prediction method, CFA, that is based on connectivity number on subgraphs. We evaluate CFA using several protein interaction networks on reference protein complexes in two benchmark data sets (MIPS and Aloy, containing 1142 and 61 known complexes respectively. We compare CFA to some existing protein complex prediction methods (CMC, MCL, PCP and RNSC in terms of recall and precision. We show that CFA predicts more complexes correctly at a competitive level of precision. Conclusions Many real complexes with different connectivity level in protein interaction network can be predicted based on connectivity number. Our CFA program and results are freely available from http://www.bioinf.cs.ipm.ir/softwares/cfa/CFA.rar.

  2. PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs

    Directory of Open Access Journals (Sweden)

    Greenblatt Jack

    2006-07-01

    Full Text Available Abstract Background Identification of protein interaction networks has received considerable attention in the post-genomic era. The currently available biochemical approaches used to detect protein-protein interactions are all time and labour intensive. Consequently there is a growing need for the development of computational tools that are capable of effectively identifying such interactions. Results Here we explain the development and implementation of a novel Protein-Protein Interaction Prediction Engine termed PIPE. This tool is capable of predicting protein-protein interactions for any target pair of the yeast Saccharomyces cerevisiae proteins from their primary structure and without the need for any additional information or predictions about the proteins. PIPE showed a sensitivity of 61% for detecting any yeast protein interaction with 89% specificity and an overall accuracy of 75%. This rate of success is comparable to those associated with the most commonly used biochemical techniques. Using PIPE, we identified a novel interaction between YGL227W (vid30 and YMR135C (gid8 yeast proteins. This lead us to the identification of a novel yeast complex that here we term vid30 complex (vid30c. The observed interaction was confirmed by tandem affinity purification (TAP tag, verifying the ability of PIPE to predict novel protein-protein interactions. We then used PIPE analysis to investigate the internal architecture of vid30c. It appeared from PIPE analysis that vid30c may consist of a core and a secondary component. Generation of yeast gene deletion strains combined with TAP tagging analysis indicated that the deletion of a member of the core component interfered with the formation of vid30c, however, deletion of a member of the secondary component had little effect (if any on the formation of vid30c. Also, PIPE can be used to analyse yeast proteins for which TAP tagging fails, thereby allowing us to predict protein interactions that are not

  3. Models of protein-ligand crystal structures: trust, but verify.

    Science.gov (United States)

    Deller, Marc C; Rupp, Bernhard

    2015-09-01

    X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.

  4. Structural deformation upon protein-protein interaction: a structural alphabet approach.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude

    2008-02-28

    In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Our study provides qualitative information about induced fit. These results could be of help for flexible docking.

  5. Structural deformation upon protein-protein interaction: A structural alphabet approach

    Directory of Open Access Journals (Sweden)

    Lecornet Hélène

    2008-02-01

    Full Text Available Abstract Background In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%. This proportion is even greater in the interface regions (41%. Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion Our study provides qualitative information about induced fit. These results could be of help for flexible docking.

  6. Quality assessment of protein model-structures based on structural and functional similarities.

    Science.gov (United States)

    Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

    2012-09-21

    Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and

  7. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    Science.gov (United States)

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time consuming and labor costing. Thus, it is of importance to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognitions will highly benefit to decipher the interaction mechanisms between protein and RNA, as well as to improve the RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies of predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors of representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from structural complementary perspective.

  8. Automatic structure classification of small proteins using random forest

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2010-07-01

    Full Text Available Abstract Background Random forest, an ensemble based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs. An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of Matthew's correlation coefficient (MCC ranged from 0.61 to 0.83. As the number of constituent SSEs increases the MCC for classification to different structural levels decreases. Conclusions The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB, which are yet to be assigned to the SCOP classification hierarchy.

  9. Predicting protein-binding RNA nucleotides with consideration of binding partners.

    Science.gov (United States)

    Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook

    2015-06-01

    most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  10. RNA secondary structure prediction using soft computing.

    Science.gov (United States)

    Ray, Shubhra Sankar; Pal, Sankar K

    2013-01-01

    Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.

  11. Beta-structures in fibrous proteins.

    Science.gov (United States)

    Kajava, Andrey V; Squire, John M; Parry, David A D

    2006-01-01

    The beta-form of protein folding, one of the earliest protein structures to be defined, was originally observed in studies of silks. It was then seen in early studies of synthetic polypeptides and, of course, is now known to be present in a variety of guises as an essential component of globular protein structures. However, in the last decade or so it has become clear that the beta-conformation of chains is present not only in many of the amyloid structures associated with, for example, Alzheimer's Disease, but also in the prion structures associated with the spongiform encephalopathies. Furthermore, X-ray crystallography studies have revealed the high incidence of the beta-fibrous proteins among virulence factors of pathogenic bacteria and viruses. Here we describe the basic forms of the beta-fold, summarize the many different new forms of beta-structural fibrous arrangements that have been discovered, and review advances in structural studies of amyloid and prion fibrils. These and other issues are described in detail in later chapters.

  12. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  13. A large-scale evaluation of computational protein function prediction

    NARCIS (Netherlands)

    Radivojac, P.; Clark, W.T.; Oron, T.R.; Schnoes, A.M.; Wittkop, T.; Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Friedberg, I.

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be

  14. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Full Text Available Cliques (maximal complete subnets in protein-protein interaction (PPI network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  15. Prediction of protein conformational freedom from distance constraints

    NARCIS (Netherlands)

    de Groot, B.L.; van Aalten, D.M.F.; Scheek, R.M.; Amadei, A; Vriend, G.; Berendsen, H.J.C.

    1997-01-01

    A method is presented that generates random protein structures that fulfil a set of upper and lower interatomic distance limits, These limits depend on distances measured in experimental structures and the strength of the interatomic interaction, Structural differences between generated structures

  16. Prediction of protein-protein interactions between viruses and human by an SVM model

    Directory of Open Access Journals (Sweden)

    Cui Guangyu

    2012-05-01

    Full Text Available Abstract Background Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Results We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV and hepatitis C virus (HCV, our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO annotations of proteins, we predicted new interactions between virus proteins and human proteins. Conclusions Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1 it enables a prediction model to achieve a better performance than other representations, (2 it generates feature vectors of fixed length regardless of the sequence length, and (3 the same representation is applicable to different types of proteins.

  17. Fibrous Protein Structures: Hierarchy, History and Heroes.

    Science.gov (United States)

    Squire, John M; Parry, David A D

    2017-01-01

    During the 1930s and 1940s the technique of X-ray diffraction was applied widely by William Astbury and his colleagues to a number of naturally-occurring fibrous materials. On the basis of the diffraction patterns obtained, he observed that the structure of each of the fibres was dominated by one of a small number of different types of molecular conformation. One group of fibres, known as the k-m-e-f group of proteins (keratin - myosin - epidermin - fibrinogen), gave rise to diffraction characteristics that became known as the α-pattern. Others, such as those from a number of silks, gave rise to a different pattern - the β-pattern, while connective tissues yielded a third unique set of diffraction characteristics. At the time of Astbury's work, the structures of these materials were unknown, though the spacings of the main X-ray reflections gave an idea of the axial repeats and the lateral packing distances. In a breakthrough in the early 1950s, the basic structures of all of these fibrous proteins were determined. It was found that the long protein chains, composed of strings of amino acids, could be folded up in a systematic manner to generate a limited number of structures that were consistent with the X-ray data. The most important of these were known as the α-helix, the β-sheet, and the collagen triple helix. These studies provided information about the basic building blocks of all proteins, both fibrous and globular. They did not, however, provide detailed information about how these molecules packed together in three-dimensions to generate the fibres found in vivo. A number of possible packing arrangements were subsequently deduced from the X-ray diffraction and other data, but it is only in the last few years, through the continued improvements of electron microscopy, that the packing details within some fibrous proteins can now be seen directly. Here we outline briefly some of the milestones in fibrous protein structure determination, the role of the

  18. Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-10-01

    Full Text Available Abstract Background With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore predicting protein-protein interactions from the sequences is of great demand. The reason lies in the fact that identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for those organisms barely characterized. Although a few methods have been proposed, the converse problem, if the features used extract sufficient and unbiased information from protein sequences, is almost untouched. Results In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods exploit sufficiently the information of protein sequences and reduce artificial bias and computational cost. Thus, it significantly outperforms the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation and reaches ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae respectively. Our findings here hold important implication for other sequence-based prediction tasks because representation of biological sequence is always the first step in computational biology. Conclusions By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.

  19. Improving protein function prediction methods with integrated literature data

    Directory of Open Access Journals (Sweden)

    Gabow Aaron P

    2008-04-01

    Full Text Available Abstract Background Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder

  20. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Directory of Open Access Journals (Sweden)

    Ruan Jishou

    2007-04-01

    Full Text Available Abstract Background Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP; the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are

  1. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts

    Energy Technology Data Exchange (ETDEWEB)

    Shen Yang; Delaglio, Frank [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States); Cornilescu, Gabriel [National Magnetic Resonance Facility (United States); Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)], E-mail: bax@nih.gov

    2009-08-15

    NMR chemical shifts in proteins depend strongly on local structure. The program TALOS establishes an empirical relation between {sup 13}C, {sup 15}N and {sup 1}H chemical shifts and backbone torsion angles {phi} and {psi} (Cornilescu et al. J Biomol NMR 13 289-302, 1999). Extension of the original 20-protein database to 200 proteins increased the fraction of residues for which backbone angles could be predicted from 65 to 74%, while reducing the error rate from 3 to 2.5%. Addition of a two-layer neural network filter to the database fragment selection process forms the basis for a new program, TALOS+, which further enhances the prediction rate to 88.5%, without increasing the error rate. Excluding the 2.5% of residues for which TALOS+ makes predictions that strongly differ from those observed in the crystalline state, the accuracy of predicted {phi} and {psi} angles, equals {+-}13{sup o}. Large discrepancies between predictions and crystal structures are primarily limited to loop regions, and for the few cases where multiple X-ray structures are available such residues are often found in different states in the different structures. The TALOS+ output includes predictions for individual residues with missing chemical shifts, and the neural network component of the program also predicts secondary structure with good accuracy.

  2. Application of long-range order to predict unfolding rates of two-state proteins.

    Science.gov (United States)

    Harihar, B; Selvaraj, S

    2011-03-01

    Predicting the experimental unfolding rates of two-state proteins and models describing the unfolding rates of these proteins is quite limited because of the complexity present in the unfolding mechanism and the lack of experimental unfolding data compared with folding data. In this work, 25 two-state proteins characterized by Maxwell et al. (Protein Sci 2005;14:602–616) using a consensus set of experimental conditions were taken, and the parameter long-range order (LRO) derived from their three-dimensional structures were related with their experimental unfolding rates ln(k(u)). From the total data set of 30 proteins used by Maxwell et al. (Protein Sci 2005;14:602–616), five slow-unfolding proteins with very low unfolding rates were considered to be outliers and were not included in our data set. Except all beta structural class, LRO of both the all-alpha and mixed-class proteins showed a strong inverse correlation of r = -0.99 and -0.88, respectively, with experimental ln(k(u)). LRO shows a correlation of -0.62 with experimental ln(k(u)) for all-beta proteins. For predicting the unfolding rates, a simple statistical method has been used and linear regression equations were developed for individual structural classes of proteins using LRO, and the results obtained showed a better agreement with experimental results. Copyright © 2010 Wiley-Liss, Inc.

  3. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna

    2014-01-01

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful...... applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min...... on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together....

  4. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    KAUST Repository

    Marcatili, Paolo

    2014-11-06

    © 2014 Nature America, Inc. All rights reserved. Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (~10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  5. Structure of the ordered hydration of amino acids in proteins: analysis of crystal structures

    Energy Technology Data Exchange (ETDEWEB)

    Biedermannová, Lada, E-mail: lada.biedermannova@ibt.cas.cz; Schneider, Bohdan [Institute of Biotechnology CAS, Videnska 1083, 142 20 Prague (Czech Republic)

    2015-10-27

    The hydration of protein crystal structures was studied at the level of individual amino acids. The dependence of the number of water molecules and their preferred spatial localization on various parameters, such as solvent accessibility, secondary structure and side-chain conformation, was determined. Crystallography provides unique information about the arrangement of water molecules near protein surfaces. Using a nonredundant set of 2818 protein crystal structures with a resolution of better than 1.8 Å, the extent and structure of the hydration shell of all 20 standard amino-acid residues were analyzed as function of the residue conformation, secondary structure and solvent accessibility. The results show how hydration depends on the amino-acid conformation and the environment in which it occurs. After conformational clustering of individual residues, the density distribution of water molecules was compiled and the preferred hydration sites were determined as maxima in the pseudo-electron-density representation of water distributions. Many hydration sites interact with both main-chain and side-chain amino-acid atoms, and several occurrences of hydration sites with less canonical contacts, such as carbon–donor hydrogen bonds, OH–π interactions and off-plane interactions with aromatic heteroatoms, are also reported. Information about the location and relative importance of the empirically determined preferred hydration sites in proteins has applications in improving the current methods of hydration-site prediction in molecular replacement, ab initio protein structure prediction and the set-up of molecular-dynamics simulations.

  6. Rationally designed synthetic protein hydrogels with predictable mechanical properties.

    Science.gov (United States)

    Wu, Junhua; Li, Pengfei; Dong, Chenling; Jiang, Heting; Bin Xue; Gao, Xiang; Qin, Meng; Wang, Wei; Bin Chen; Cao, Yi

    2018-02-12

    Designing synthetic protein hydrogels with tailored mechanical properties similar to naturally occurring tissues is an eternal pursuit in tissue engineering and stem cell and cancer research. However, it remains challenging to correlate the mechanical properties of protein hydrogels with the nanomechanics of individual building blocks. Here we use single-molecule force spectroscopy, protein engineering and theoretical modeling to prove that the mechanical properties of protein hydrogels are predictable based on the mechanical hierarchy of the cross-linkers and the load-bearing modules at the molecular level. These findings provide a framework for rationally designing protein hydrogels with independently tunable elasticity, extensibility, toughness and self-healing. Using this principle, we demonstrate the engineering of self-healable muscle-mimicking hydrogels that can significantly dissipate energy through protein unfolding. We expect that this principle can be generalized for the construction of protein hydrogels with customized mechanical properties for biomedical applications.

  7. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be.

    KAUST Repository

    Schaefer, Christian

    2010-01-16

    MOTIVATION: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. RESULTS: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely.

  8. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be.

    KAUST Repository

    Schaefer, Christian; Schlessinger, Avner; Rost, Burkhard

    2010-01-01

    MOTIVATION: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. RESULTS: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely.

  9. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    Full Text Available The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org. This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  10. Roles for text mining in protein function prediction.

    Science.gov (United States)

    Verspoor, Karin M

    2014-01-01

    The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013).Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.

  11. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

    Science.gov (United States)

    Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD PHP, MySQL and Apache, with all major browsers supported.

  12. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    Science.gov (United States)

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows to quickly generate top five structural models for a protein sequence when its secondary structures and contacts predictions at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .

  13. Exploring the universe of protein structures beyond the Protein Data Bank.

    Science.gov (United States)

    Cossio, Pilar; Trovato, Antonio; Pietrucci, Fabio; Seno, Flavio; Maritan, Amos; Laio, Alessandro

    2010-11-04

    It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Beside their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.

  14. Exploring the universe of protein structures beyond the Protein Data Bank.

    Directory of Open Access Journals (Sweden)

    Pilar Cossio

    Full Text Available It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Beside their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.

  15. 3D bioprinting of structural proteins.

    Science.gov (United States)

    Włodarczyk-Biegun, Małgorzata K; Del Campo, Aránzazu

    2017-07-01

    3D bioprinting is a booming method to obtain scaffolds of different materials with predesigned and customized morphologies and geometries. In this review we focus on the experimental strategies and recent achievements in the bioprinting of major structural proteins (collagen, silk, fibrin), as a particularly interesting technology to reconstruct the biochemical and biophysical composition and hierarchical morphology of natural scaffolds. The flexibility in molecular design offered by structural proteins, combined with the flexibility in mixing, deposition, and mechanical processing inherent to bioprinting technologies, enables the fabrication of highly functional scaffolds and tissue mimics with a degree of complexity and organization which has only just started to be explored. Here we describe the printing parameters and physical (mechanical) properties of bioinks based on structural proteins, including the biological function of the printed scaffolds. We describe applied printing techniques and cross-linking methods, highlighting the modifications implemented to improve scaffold properties. The used cell types, cell viability, and possible construct applications are also reported. We envision that the application of printing technologies to structural proteins will enable unprecedented control over their supramolecular organization, conferring printed scaffolds biological properties and functions close to natural systems. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Using support vector machine to predict beta- and gamma-turns in proteins.

    Science.gov (United States)

    Hu, Xiuzhen; Li, Qianzhong

    2008-09-01

    By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.

  17. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.

    Science.gov (United States)

    Roche, Daniel B; Buenavista, Maria T; Tetchner, Stuart J; McGuffin, Liam J

    2011-07-01

    The IntFOLD server is a novel independent server that integrates several cutting edge methods for the prediction of structure and function from sequence. Our guiding principles behind the server development were as follows: (i) to provide a simple unified resource that makes our prediction software accessible to all and (ii) to produce integrated output for predictions that can be easily interpreted. The output for predictions is presented as a simple table that summarizes all results graphically via plots and annotated 3D models. The raw machine readable data files for each set of predictions are also provided for developers, which comply with the Critical Assessment of Methods for Protein Structure Prediction (CASP) data standards. The server comprises an integrated suite of five novel methods: nFOLD4, for tertiary structure prediction; ModFOLD 3.0, for model quality assessment; DISOclust 2.0, for disorder prediction; DomFOLD 2.0 for domain prediction; and FunFOLD 1.0, for ligand binding site prediction. Predictions from the IntFOLD server were found to be competitive in several categories in the recent CASP9 experiment. The IntFOLD server is available at the following web site: http://www.reading.ac.uk/bioinf/IntFOLD/.

  18. Structure prediction of AlnOm clusters

    International Nuclear Information System (INIS)

    Smok, P

    2011-01-01

    Genetic algorithm simulations, using Buckingham potential to represent the anion-anion and cation-anion short-range interactions, were performed in order to predict the equilibrium positions of the Al and O ions in Al n O m clusters. In order to find the equilibrium structures of compounds a self-organizing genetic algorithm were constructed. The calculation were carried out for several clusters Al n O m , with different numbers of aluminium and oxygen atoms.

  19. Functions and structures of eukaryotic recombination proteins

    International Nuclear Information System (INIS)

    Ogawa, Tomoko

    1994-01-01

    We have found that Rad51 and RecA Proteins form strikingly similar structures together with dsDNA and ATP. Their right handed helical nucleoprotein filaments extend the B-form DNA double helixes to 1.5 times in length and wind the helix. The similarity and uniqueness of their structures must reflect functional homologies between these proteins. Therefore, it is highly probable that similar recombination proteins are present in various organisms of different evolutional states. We have succeeded to clone RAD51 genes from human, mouse, chicken and fission yeast genes, and found that the homologues are widely distributed in eukaryotes. The HsRad51 and MmRad51 or ChRad51 proteins consist of 339 amino acids differing only by 4 or 12 amino acids, respectively, and highly homologous to both yeast proteins, but less so to Dmcl. All of these proteins are homologous to the region from residues 33 to 240 of RecA which was named ''homologous core. The homologous core is likely to be responsible for functions common for all of them, such as the formation of helical nucleoprotein filament that is considered to be involved in homologous pairing in the recombination reaction. The mouse gene is transcribed at a high level in thymus, spleen, testis, and ovary, at lower level in brain and at a further lower level in some other tissues. It is transcribed efficiently in recombination active tissues. A clear functional difference of Rad51 homologues from RecA was suggested by the failure of heterologous genes to complement the deficiency of Scrad51 mutants. This failure seems to reflect the absence of a compatible partner, such as ScRad52 protein in the case of ScRad51 protein, between different species. Thus, these discoveries play a role of the starting point to understand the fundamental gene targeting in mammalian cells and in gene therapy. (J.P.N.)

  20. Structure and Sequence Search on Aptamer-Protein Docking

    Science.gov (United States)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  1. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    Science.gov (United States)

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  2. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins

    Energy Technology Data Exchange (ETDEWEB)

    Karp, Jerome M.; Erylimaz, Ertan; Cowburn, David, E-mail: cowburn@cowburnlab.org, E-mail: David.cowburn@einstein.yu.edu [Albert Einstein College of Medicine of Yeshiva University, Department of Biochemistry (United States)

    2015-01-15

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.

  3. Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview

    OpenAIRE

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a “mini protein folding problem” under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increas...

  4. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method

    DEFF Research Database (Denmark)

    Valentin, Jan B.; Andreetta, Christian; Boomsma, Wouter

    2014-01-01

    We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length s....... The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications. © 2013 Wiley Periodicals, Inc....

  5. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    Full Text Available Abstract Background Many crucial cellular operations such as metabolism, signalling, and regulations are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, comes from huge false positive predictions in computational approaches. Reduction of false positive predictions and enhancing true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated. Results Gene Ontology (GO annotations were used to reduce false positive protein-protein interactions (PPI pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organisms, are 48.32% and 46.49% (by average of four datasets in yeast and worm, respectively. Based on eight top-ranking keywords and co-localization of interacting proteins a set of two knowledge rules were deduced and applied to remove false positive protein pairs. The 'strength', a measure of improvement provided by the rules was defined based on the signal-to-noise ratio and implemented to measure the applicability of knowledge rules applying to the predicted PPI datasets. Depending on the employed PPI-predicting methods, the strength varies between two and ten-fold of randomly removing protein pairs from the datasets. Conclusion Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially

  6. Prediction of human protein function from post-translational modifications and localization features

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Blom, Nikolaj

    2002-01-01

    a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects......We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign proteins of unknown function to functional classes, and enzyme categories for enzymes. We show that strategies for the elucidation of protein function may benefit from...

  7. Discrete Haar transform and protein structure.

    Science.gov (United States)

    Morosetti, S

    1997-12-01

    The discrete Haar transform of the sequence of the backbone dihedral angles (phi and psi) was performed over a set of X-ray protein structures of high resolution from the Brookhaven Protein Data Bank. Afterwards, the new dihedral angles were calculated by the inverse transform, using a growing number of Haar functions, from the lower to the higher degree. New structures were obtained using these dihedral angles, with standard values for bond lengths and angles, and with omega = 0 degree. The reconstructed structures were compared with the experimental ones, and analyzed by visual inspection and statistical analysis. When half of the Haar coefficients were used, all the reconstructed structures were not yet collapsed to a tertiary folding, but they showed yet realized most of the secondary motifs. These results indicate a substantial separation of structural information in the space of Haar transform, with the secondary structural information mainly present in the Haar coefficients of lower degrees, and the tertiary one present in the higher degree coefficients. Because of this separation, the representation of the folded structures in the space of Haar transform seems a promising candidate to encompass the problem of premature convergence in genetic algorithms.

  8. Protein subcellular localization prediction using artificial intelligence technology.

    Science.gov (United States)

    Nair, Rajesh; Rost, Burkhard

    2008-01-01

    Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. The accuracy of these methods is now on par with

  9. The Phyre2 web portal for protein modeling, prediction and analysis.

    Science.gov (United States)

    Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael J E

    2015-06-01

    Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

  10. Quantitative analysis and prediction of curvature in leucine-rich repeat proteins.

    Science.gov (United States)

    Hindle, K Lauren; Bella, Jordi; Lovell, Simon C

    2009-11-01

    Leucine-rich repeat (LRR) proteins form a large and diverse family. They have a wide range of functions most of which involve the formation of protein-protein interactions. All known LRR structures form curved solenoids, although there is large variation in their curvature. It is this curvature that determines the shape and dimensions of the inner space available for ligand binding. Unfortunately, large-scale parameters such as the overall curvature of a protein domain are extremely difficult to predict. Here, we present a quantitative analysis of determinants of curvature of this family. Individual repeats typically range in length between 20 and 30 residues and have a variety of secondary structures on their convex side. The observed curvature of the LRR domains correlates poorly with the lengths of their individual repeats. We have, therefore, developed a scoring function based on the secondary structure of the convex side of the protein that allows prediction of the overall curvature with a high degree of accuracy. We also demonstrate the effectiveness of this method in selecting a suitable template for comparative modeling. We have developed an automated, quantitative protocol that can be used to predict accurately the curvature of leucine-rich repeat proteins of unknown structure from sequence alone. This protocol is available as an online resource at http://www.bioinf.manchester.ac.uk/curlrr/.

  11. Automated Protein Structure Modeling with SWISS-MODEL Workspace and the Protein Model Portal

    OpenAIRE

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of appl...

  12. Preclinical models used for immunogenicity prediction of therapeutic proteins.

    Science.gov (United States)

    Brinks, Vera; Weinbuch, Daniel; Baker, Matthew; Dean, Yann; Stas, Philippe; Kostense, Stefan; Rup, Bonita; Jiskoot, Wim

    2013-07-01

    All therapeutic proteins are potentially immunogenic. Antibodies formed against these drugs can decrease efficacy, leading to drastically increased therapeutic costs and in rare cases to serious and sometimes life threatening side-effects. Many efforts are therefore undertaken to develop therapeutic proteins with minimal immunogenicity. For this, immunogenicity prediction of candidate drugs during early drug development is essential. Several in silico, in vitro and in vivo models are used to predict immunogenicity of drug leads, to modify potentially immunogenic properties and to continue development of drug candidates with expected low immunogenicity. Despite the extensive use of these predictive models, their actual predictive value varies. Important reasons for this uncertainty are the limited/insufficient knowledge on the immune mechanisms underlying immunogenicity of therapeutic proteins, the fact that different predictive models explore different components of the immune system and the lack of an integrated clinical validation. In this review, we discuss the predictive models in use, summarize aspects of immunogenicity that these models predict and explore the merits and the limitations of each of the models.

  13. Prediction of beta-turns in proteins using the first-order Markov models.

    Science.gov (United States)

    Lin, Thy-Hou; Wang, Ging-Ming; Wang, Yen-Tseng

    2002-01-01

    We present a method based on the first-order Markov models for predicting simple beta-turns and loops containing multiple turns in proteins. Sequences of 338 proteins in a database are divided using the published turn criteria into the following three regions, namely, the turn, the boundary, and the nonturn ones. A transition probability matrix is constructed for either the turn or the nonturn region using the weighted transition probabilities computed for dipeptides identified from each region. There are two such matrices constructed for the boundary region since the transition probabilities for dipeptides immediately preceding or following a turn are different. The window used for scanning a protein sequence from amino (N-) to carboxyl (C-) terminal is a hexapeptide since the transition probability computed for a turn tetrapeptide is capped at both the N- and C- termini with a boundary transition probability indexed respectively from the two boundary transition matrices. A sum of the averaged product of the transition probabilities of all the hexapeptides involving each residue is computed. This is then weighted with a probability computed from assuming that all the hexapeptides are from the nonturn region to give the final prediction quantity. Both simple beta-turns and loops containing multiple turns in a protein are then identified by the rising of the prediction quantity computed. The performance of the prediction scheme or the percentage (%) of correct prediction is evaluated through computation of Matthews correlation coefficients for each protein predicted. It is found that the prediction method is capable of giving prediction results with better correlation between the percent of correct prediction and the Matthews correlation coefficients for a group of test proteins as compared with those predicted using some secondary structural prediction methods. The prediction accuracy for about 40% of proteins in the database or 50% of proteins in the test set is

  14. Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework

    Science.gov (United States)

    2014-01-01

    Motivation Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. Results We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. PMID:24646119

  15. Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework.

    Science.gov (United States)

    Simha, Ramanuja; Shatkay, Hagit

    2014-03-19

    Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.

  16. Protein thermostability prediction within homologous families using temperature-dependent statistical potentials.

    Directory of Open Access Journals (Sweden)

    Fabrizio Pucci

    Full Text Available The ability to rationally modify targeted physical and biological features of a protein of interest holds promise in numerous academic and industrial applications and paves the way towards de novo protein design. In particular, bioprocesses that utilize the remarkable properties of enzymes would often benefit from mutants that remain active at temperatures that are either higher or lower than the physiological temperature, while maintaining the biological activity. Many in silico methods have been developed in recent years for predicting the thermodynamic stability of mutant proteins, but very few have focused on thermostability. To bridge this gap, we developed an algorithm for predicting the best descriptor of thermostability, namely the melting temperature Tm, from the protein's sequence and structure. Our method is applicable when the Tm of proteins homologous to the target protein are known. It is based on the design of several temperature-dependent statistical potentials, derived from datasets consisting of either mesostable or thermostable proteins. Linear combinations of these potentials have been shown to yield an estimation of the protein folding free energies at low and high temperatures, and the difference of these energies, a prediction of the melting temperature. This particular construction, that distinguishes between the interactions that contribute more than others to the stability at high temperatures and those that are more stabilizing at low T, gives better performances compared to the standard approach based on T-independent potentials which predict the thermal resistance from the thermodynamic stability. Our method has been tested on 45 proteins of known Tm that belong to 11 homologous families. The standard deviation between experimental and predicted Tm's is equal to 13.6°C in cross validation, and decreases to 8.3°C if the 6 worst predicted proteins are excluded. Possible extensions of our approach are discussed.

  17. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.

    Science.gov (United States)

    Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe

    2018-02-19

    Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) network. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on informations extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.

  18. Automatic selection of reference taxa for protein-protein interaction prediction with phylogenetic profiling

    DEFF Research Database (Denmark)

    Simonsen, Martin; Maetschke, S.R.; Ragan, M.A.

    2012-01-01

    Motivation: Phylogenetic profiling methods can achieve good accuracy in predicting protein–protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly......: We present three novel methods for automating the selection of RT, using machine learning based on known protein–protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting...... phylogenetic profiles often require very different RT sets to support high prediction accuracy....

  19. InterProSurf: a web server for predicting interacting sites on protein surfaces

    Science.gov (United States)

    Negi, Surendra S.; Schein, Catherine H.; Oezguen, Numan; Power, Trevor D.; Braun, Werner

    2009-01-01

    Summary A new web server, InterProSurf, predicts interacting amino acid residues in proteins that are most likely to interact with other proteins, given the 3D structures of subunits of a protein complex. The prediction method is based on solvent accessible surface area of residues in the isolated subunits, a propensity scale for interface residues and a clustering algorithm to identify surface regions with residues of high interface propensities. Here we illustrate the application of InterProSurf to determine which areas of Bacillus anthracis toxins and measles virus hemagglutinin protein interact with their respective cell surface receptors. The computationally predicted regions overlap with those regions previously identified as interface regions by sequence analysis and mutagenesis experiments. PMID:17933856

  20. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    Science.gov (United States)

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  1. Evolutionary Structure Prediction of Stoichiometric Compounds

    Science.gov (United States)

    Zhu, Qiang; Oganov, Artem

    2014-03-01

    In general, for a given ionic compound AmBn\\ at ambient pressure condition, its stoichiometry reflects the valence state ratio between per chemical specie (i.e., the charges for each anion and cation). However, compounds under high pressure exhibit significantly behavior, compared to those analogs at ambient condition. Here we developed a method to solve the crystal structure prediction problem based on the evolutionary algorithms, which can predict both the stable compounds and their crystal structures at arbitrary P,T-conditions, given just the set of chemical elements. By applying this method to a wide range of binary ionic systems (Na-Cl, Mg-O, Xe-O, Cs-F, etc), we discovered a lot of compounds with brand new stoichimetries which can become thermodynamically stable. Further electronic structure analysis on these novel compounds indicates that several factors can contribute to this extraordinary phenomenon: (1) polyatomic anions; (2) free electron localization; (3) emergence of new valence states; (4) metallization. In particular, part of the results have been confirmed by experiment, which warrants that this approach can play a crucial role in new materials design under extreme pressure conditions. This work is funded by DARPA (Grants No. W31P4Q1210008 and W31P4Q1310005), NSF (EAR-1114313 and DMR-1231586).

  2. Predicting Protein Function via Semantic Integration of Multiple Networks.

    Science.gov (United States)

    Yu, Guoxian; Fu, Guangyuan; Wang, Jun; Zhu, Hailong

    2016-01-01

    Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapidly accumulated large volumes of proteomic and genomic data drives to develop computational models for automatically predicting protein function in large scale. Recent approaches focus on integrating multiple heterogeneous data sources and they often get better results than methods that use single data source alone. In this paper, we investigate how to integrate multiple biological data sources with the biological knowledge, i.e., Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogenous data sources. SimNet firstly utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experiment results on heterogenous proteomic data sources of Yeast, Human, Mouse, and Fly show that, SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.

  3. Predicting highly-connected hubs in protein interaction networks by QSAR and biological data descriptors

    Science.gov (United States)

    Hsing, Michael; Byler, Kendall; Cherkasov, Artem

    2009-01-01

    Hub proteins (those engaged in most physical interactions in a protein interaction network (PIN) have recently gained much research interest due to their essential role in mediating cellular processes and their potential therapeutic value. It is straightforward to identify hubs if the underlying PIN is experimentally determined; however, theoretical hub prediction remains a very challenging task, as physicochemical properties that differentiate hubs from less connected proteins remain mostly uncharacterized. To adequately distinguish hubs from non-hub proteins we have utilized over 1300 protein descriptors, some of which represent QSAR (quantitative structure-activity relationship) parameters, and some reflect sequence-derived characteristics of proteins including domain composition and functional annotations. Those protein descriptors, together with available protein interaction data have been processed by a machine learning method (boosting trees) and resulted in the development of hub classifiers that are capable of predicting highly interacting proteins for four model organisms: Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens. More importantly, through the analyses of the most relevant protein descriptors, we are able to demonstrate that hub proteins not only share certain common physicochemical and structural characteristics that make them different from non-hub counterparts, but they also exhibit species-specific characteristics that should be taken into account when analyzing different PINs. The developed prediction models can be used for determining highly interacting proteins in the four studied species to assist future proteomics experiments and PIN analyses. Availability The source code and executable program of the hub classifier are available for download at: http://www.cnbi2.ca/hub-analysis/ PMID:20198194

  4. Analysis of deep learning methods for blind protein contact prediction in CASP12.

    Science.gov (United States)

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2018-03-01

    Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.

  5. PCNA Structure and Interactions with Partner Proteins

    KAUST Repository

    Oke, Muse; Zaher, Manal S.; Hamdan, Samir

    2018-01-01

    Proliferating cell nuclear antigen (PCNA) consists of three identical monomers that topologically encircle double-stranded DNA. PCNA stimulates the processivity of DNA polymerase δ and, to a less extent, the intrinsically highly processive DNA polymerase ε. It also functions as a platform that recruits and coordinates the activities of a large number of DNA processing proteins. Emerging structural and biochemical studies suggest that the nature of PCNA-partner proteins interactions is complex. A hydrophobic groove at the front side of PCNA serves as a primary docking site for the consensus PIP box motifs present in many PCNA-binding partners. Sequences that immediately flank the PIP box motif or regions that are distant from it could also interact with the hydrophobic groove and other regions of PCNA. Posttranslational modifications on the backside of PCNA could add another dimension to its interaction with partner proteins. An encounter of PCNA with different DNA structures might also be involved in coordinating its interactions. Finally, the ability of PCNA to bind up to three proteins while topologically linked to DNA suggests that it would be a versatile toolbox in many different DNA processing reactions.

  6. PCNA Structure and Interactions with Partner Proteins

    KAUST Repository

    Oke, Muse

    2018-01-29

    Proliferating cell nuclear antigen (PCNA) consists of three identical monomers that topologically encircle double-stranded DNA. PCNA stimulates the processivity of DNA polymerase δ and, to a less extent, the intrinsically highly processive DNA polymerase ε. It also functions as a platform that recruits and coordinates the activities of a large number of DNA processing proteins. Emerging structural and biochemical studies suggest that the nature of PCNA-partner proteins interactions is complex. A hydrophobic groove at the front side of PCNA serves as a primary docking site for the consensus PIP box motifs present in many PCNA-binding partners. Sequences that immediately flank the PIP box motif or regions that are distant from it could also interact with the hydrophobic groove and other regions of PCNA. Posttranslational modifications on the backside of PCNA could add another dimension to its interaction with partner proteins. An encounter of PCNA with different DNA structures might also be involved in coordinating its interactions. Finally, the ability of PCNA to bind up to three proteins while topologically linked to DNA suggests that it would be a versatile toolbox in many different DNA processing reactions.

  7. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    Science.gov (United States)

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.

  8. Protein-mediated surface structuring in biomembranes

    Directory of Open Access Journals (Sweden)

    Maggio B.

    2005-01-01

    Full Text Available The lipids and proteins of biomembranes exhibit highly dissimilar conformations, geometrical shapes, amphipathicity, and thermodynamic properties which constrain their two-dimensional molecular packing, electrostatics, and interaction preferences. This causes inevitable development of large local tensions that frequently relax into phase or compositional immiscibility along lateral and transverse planes of the membrane. On the other hand, these effects constitute the very codes that mediate molecular and structural changes determining and controlling the possibilities for enzymatic activity, apposition and recombination in biomembranes. The presence of proteins constitutes a major perturbing factor for the membrane sculpturing both in terms of its surface topography and dynamics. We will focus on some results from our group within this context and summarize some recent evidence for the active involvement of extrinsic (myelin basic protein, integral (Folch-Lees proteolipid protein and amphitropic (c-Fos and c-Jun proteins, as well as a membrane-active amphitropic phosphohydrolytic enzyme (neutral sphingomyelinase, in the process of lateral segregation and dynamics of phase domains, sculpturing of the surface topography, and the bi-directional modulation of the membrane biochemical reactivity.

  9. Prediction of heterodimer