WorldWideScience

Sample records for protein structure model

  1. Automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

    Science.gov (United States)

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of applications. Since the usefulness of a model for specific application is determined by its accuracy, model quality estimation is an essential component of protein structure prediction. Comparative protein modeling has become a routine approach in many areas of life science research since fully automated modeling systems allow also nonexperts to build reliable models. In this chapter, we describe practical approaches for automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

  2. Automated Protein Structure Modeling with SWISS-MODEL Workspace and the Protein Model Portal

    OpenAIRE

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of appl...

  3. The Protein Model Portal--a comprehensive resource for protein structure and model information.

    Science.gov (United States)

    Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

    2013-01-01

    The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org.

  4. The Protein Model Portal—a comprehensive resource for protein structure and model information

    Science.gov (United States)

    Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

    2013-01-01

    The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org PMID:23624946

  5. Quality assessment of protein model-structures based on structural and functional similarities.

    Science.gov (United States)

    Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

    2012-09-21

    Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. 
    GOBA also obtained the best result for two targets of CASP8, and...
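The evaluation metric quoted above, a mean Pearson correlation between predicted scores and observed model quality, can be sketched as follows. This is a minimal illustration, not the GOBA implementation: the function names and the per-target averaging scheme are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_per_target_correlation(targets):
    """Average the per-target correlations (assumed averaging scheme).

    targets: dict mapping a target id to a pair
             (predicted_scores, observed_quality) for that target's models.
    """
    values = [pearson(pred, obs) for pred, obs in targets.values()]
    return sum(values) / len(values)
```

Computing the correlation per target and then averaging keeps targets with many submitted models from dominating the summary value.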

  6. Modeling protein structures: construction and their applications.

    Science.gov (United States)

    Ring, C S; Cohen, F E

    1993-06-01

    Although no general solution to the protein folding problem exists, the three-dimensional structures of proteins are being successfully predicted when experimentally derived constraints are used in conjunction with heuristic methods. In the case of interleukin-4, mutagenesis data and CD spectroscopy were instrumental in the accurate assignment of secondary structure. In addition, the tertiary structure was highly constrained by six cysteines separated by many residues that formed three disulfide bridges. Although the correct structure was a member of a short list of plausible structures, the "best" structure was the topological enantiomer of the experimentally determined conformation. For many proteases, other experimentally derived structures can be used as templates to identify the secondary structure elements. In a procedure called modeling by homology, the structure of a known protein is used as a scaffold to predict the structure of another related protein. This method has been used to model a serine and a cysteine protease that are important in the schistosome and malarial life cycles, respectively. The model structures were then used to identify putative small molecule enzyme inhibitors computationally. Experiments confirm that some of these nonpeptidic compounds are active at concentrations of less than 10 microM.

  7. Fast loop modeling for protein structures

    Science.gov (United States)

    Zhang, Jiong; Nguyen, Son; Shang, Yi; Xu, Dong; Kosztin, Ioan

    2015-03-01

    X-ray crystallography is the main method for determining 3D protein structures. In many cases, however, flexible loop regions of proteins cannot be resolved by this approach. This leads to incomplete structures in the protein data bank, preventing further computational study and analysis of these proteins. For instance, all-atom molecular dynamics (MD) simulation studies of structure-function relationship require complete protein structures. To address this shortcoming, we have developed and implemented an efficient computational method for building missing protein loops. The method is database driven and uses deep learning and multi-dimensional scaling algorithms. We have implemented the method as a simple stand-alone program, which can also be used as a plugin in existing molecular modeling software, e.g., VMD. The quality and stability of the generated structures are assessed and tested via energy scoring functions and by equilibrium MD simulations. The proposed method can also be used in template-based protein structure prediction. Work supported by the National Institutes of Health [R01 GM100701]. Computer time was provided by the University of Missouri Bioinformatics Consortium.

  8. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    Science.gov (United States)

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining ¹Hᴺ, ¹³C, and ¹⁵N backbone and ¹³Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventionally determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  9. A hidden Markov model derived structural alphabet for proteins.

    Science.gov (United States)

    Camproux, A C; Gautier, R; Tufféry, P

    2004-06-04

    Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as a series of overlapping fragments (states) four residues in length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to capture some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture, in particular for protein structure prediction.

  10. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  11. A generative, probabilistic model of local protein structure

    DEFF Research Database (Denmark)

    Boomsma, Wouter; Mardia, Kanti V.; Taylor, Charles C.

    2008-01-01

    Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative...... conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state...

  12. Binding free energy analysis of protein-protein docking model structures by evERdock.

    Science.gov (United States)

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-14

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.

  13. Models of protein-ligand crystal structures: trust, but verify.

    Science.gov (United States)

    Deller, Marc C; Rupp, Bernhard

    2015-09-01

    X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.

  14. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    Science.gov (United States)

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 can quickly generate the top five structural models for a protein sequence when its secondary structure and contact predictions are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .
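The accuracy figures above are reported as TM-scores. As a rough sketch of how that measure is defined, the snippet below evaluates the standard TM-score sum for one fixed superposition, given per-residue distances between aligned Cα atoms. The real TM-score is the maximum of this quantity over superpositions, which is not attempted here; the function names and the lower bound `D0_FLOOR` on the distance scale are assumptions made for this example and are not taken from CONFOLD2.

```python
D0_FLOOR = 0.5  # assumed lower bound on d0 for very short targets


def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        # The cube-root term is not meaningful here; fall back to the floor.
        return D0_FLOOR
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, D0_FLOOR)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score sum for one given superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target (normalization length).
    Unaligned target residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A perfect match over the full target gives 1.0, and a pair separated by exactly d0 contributes half of a perfectly matched pair.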

  15. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z.; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
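The angular representation mentioned above rests on backbone dihedral angles (phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1)). The sketch below computes a dihedral angle from four 3D points and a wrapped difference between two angles, which illustrates why such data need circular rather than linear treatment. It is an illustration only and is unrelated to the trigonometric-spline density estimator of the paper; the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def circular_diff_deg(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0
```

For example, angles of 170 and -170 degrees are only 20 degrees apart on the circle, although their plain numerical difference is 340.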

  16. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza

    2017-02-08

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  17. A resource for benchmarking the usefulness of protein structure models.

    Science.gov (United States)

    Carbajo, Daniel; Tramontano, Anna

    2012-08-02

    Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by non-academics: No.

  18. A resource for benchmarking the usefulness of protein structure models

    Directory of Open Access Journals (Sweden)

    Carbajo Daniel

    2012-08-01

    Background: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. Results: This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. Conclusions: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. Implementation, availability and requirements: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any...

  19. A resource for benchmarking the usefulness of protein structure models.

    KAUST Repository

    Carbajo, Daniel

    2012-08-02

    BACKGROUND: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. RESULTS: This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. CONCLUSIONS: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by...

  20. A resource for benchmarking the usefulness of protein structure models.

    KAUST Repository

    Carbajo, Daniel; Tramontano, Anna

    2012-01-01

    BACKGROUND: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. RESULTS: This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. CONCLUSIONS: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by...

  1. Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview

    OpenAIRE

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a “mini protein folding problem” under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increas...

  2. Identify High-Quality Protein Structural Models by Enhanced K-Means.

    Science.gov (United States)

    Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
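The squared-distance seeding that K-means++ uses to pick initial centroids can be sketched as follows. This is a minimal illustration and not the authors' implementation: it treats each decoy as a plain coordinate tuple and uses squared Euclidean distance, whereas a decoy-clustering pipeline would more plausibly use a structural distance such as RMSD. The function names and the `rng` parameter are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids with K-means++ style seeding.

    The first centroid is drawn uniformly; each later one is drawn with
    probability proportional to its squared distance to the nearest
    centroid chosen so far.
    """
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest centroid.
        weights = [min(squared_distance(p, c) for c in centroids)
                   for p in points]
        total = sum(weights)
        if total == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        threshold = rng.random() * total
        cumulative = 0.0
        for point, weight in zip(points, weights):
            cumulative += weight
            # Skip zero-weight points so a centroid is never picked twice.
            if weight > 0 and cumulative >= threshold:
                centroids.append(point)
                break
    return centroids
```

The returned seeds would then be handed to an ordinary K-means loop; points far from every existing centroid are favored, which tends to spread the initial centroids across distinct decoy clusters.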

  3. Conformational sampling in template-free protein loop structure modeling: an overview.

    Science.gov (United States)

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a "mini protein folding problem" under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increasing number of known structures available in PDB. This mini review provides an overview on the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.

  4. CONFORMATIONAL SAMPLING IN TEMPLATE-FREE PROTEIN LOOP STRUCTURE MODELING: AN OVERVIEW

    Directory of Open Access Journals (Sweden)

    Yaohang Li

    2013-02-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a “mini protein folding problem” under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increasing number of known structures available in PDB. This mini review provides an overview on the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.

  5. Compare local pocket and global protein structure models by small structure patterns

    KAUST Repository

    Cui, Xuefeng

    2015-09-09

    Researchers have proposed several criteria to assess the quality of predicted protein structures because it is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria include root mean squared deviation (RMSD), MaxSub score, TM-score, GDT-TS and GDT-HA scores. All these criteria require calculation of rigid transformations to superimpose the predicted protein structure onto the native protein structure. Yet computing the optimal rigid transformations is either an open problem or has high time complexity, and, hence, heuristic algorithms have been proposed. In this work, we carefully design various small structure patterns, including the ones specifically tuned for local pockets. Such structure patterns are biologically meaningful, and address the reliance of existing methods on a sufficient number of backbone residue fragments. We sample the rigid transformations from these small structure patterns, and the optimal superpositions yielded by these small structures are refined and reported. As a result, among 11,669 pairs of predicted and native local protein pocket models from the CASP10 dataset, the GDT-TS scores calculated by our method are significantly higher than those calculated by LGA. Moreover, our program is computationally much more efficient. Source codes and executables are publicly available at http://www.cbrc.kaust.edu.sa/prosta/
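To make the role of the superposition concrete, the sketch below evaluates a GDT-style score for one fixed superposition. It assumes the model has already been transformed into the native frame and that residues are paired one to one. The actual GDT-TS is the maximum of this quantity over rigid transformations, which is exactly the search that the abstract describes as expensive. The function name and the default cutoffs (1, 2, 4 and 8 Å, the usual GDT-TS thresholds) are supplied here for illustration.

```python
import math


def gdt_for_superposition(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-style score (0 to 100) for an already superimposed model.

    model, native: equal-length lists of (x, y, z) coordinates, one per
    residue (for example C-alpha atoms), expressed in the same frame.
    """
    if len(model) != len(native) or not native:
        raise ValueError("model and native must be non-empty and equal length")
    distances = [math.dist(m, n) for m, n in zip(model, native)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Passing `cutoffs=(0.5, 1.0, 2.0, 4.0)` gives the stricter GDT-HA variant under the same assumptions. A superposition search would call this function once per candidate transformation and keep the best value.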

  6. Modeling complexes of modeled proteins.

    Science.gov (United States)

    Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A

    2017-03-01

    Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into the acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and that template-based docking is much less sensitive to inaccuracies of protein models than free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc.

  7. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    Science.gov (United States)

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.

  8. Modeling structure of G protein-coupled receptors in human genome

    KAUST Repository

    Zhang, Yang

    2016-01-26

    G protein-coupled receptors (or GPCRs) are integral transmembrane proteins responsible for various cellular signal transductions. Human GPCR proteins are encoded by 5% of human genes but account for the targets of 40% of the FDA approved drugs. Due to difficulties in crystallization, experimental structure determination remains extremely difficult for human GPCRs, which has been a major barrier in modern structure-based drug discovery. We proposed a new hybrid protocol, GPCR-I-TASSER, to construct GPCR structure models by integrating experimental mutagenesis data with ab initio transmembrane-helix assembly simulations, assisted by the predicted transmembrane-helix interaction networks. The method was tested in recent community-wide GPCRDock experiments and constructed models with a root mean square deviation of 1.26 Å for Dopamine-3 and 2.08 Å for Chemokine-4 receptors in the transmembrane domain regions, which were significantly closer to the native than the best templates available in the PDB. GPCR-I-TASSER has been applied to model all 1,026 putative GPCRs in the human genome, where 923 are found to have correct folds based on the confidence score analysis and mutagenesis data comparison. The successfully modeled GPCRs contain many pharmaceutically important families that do not have previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. All the human GPCR models have been made publicly available through the GPCR-HGmod database at http://zhanglab.ccmb.med.umich.edu/GPCR-HGmod/. The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins, which should have a useful impact on GPCR-targeted drug discovery efforts.

  9. Protein structure analysis using the resonant recognition model and wavelet transforms

    International Nuclear Information System (INIS)

    Fang, Q.; Cosic, I.

    1998-01-01

    An approach based on the resonant recognition model and the discrete wavelet transform is introduced here for characterising proteins' biological function. The protein sequence is converted into a numerical series by assigning the electron-ion interaction potential to each amino acid from N-terminal to C-terminal. A set of peaks is found after performing a wavelet transform on a numerical series representing a group of homologous proteins. These peaks are related to protein structural and functional properties and are named the characteristic vector of that protein group. Furthermore, the amino acids contributing most to a protein's biological functions, the so-called 'hot spot' amino acids, are predicted by the continuous wavelet transform. It is found that the hot spots are clustered around the protein's cleft structure. The wavelet approach provides a novel method for amino acid sequence analysis as well as an expansion of the newly established macromolecular interaction model: the resonant recognition model. Copyright (1998) Australasian Physical and Engineering Sciences in Medicine

  10. Building alternate protein structures using the elastic network model.

    Science.gov (United States)

    Yang, Qingyi; Sharp, Kim A

    2009-02-15

    We describe a method for efficiently generating ensembles of alternate, all-atom protein structures that (a) differ significantly from the starting structure, (b) have good stereochemistry (bonded geometry), and (c) have good steric properties (absence of atomic overlap). The method uses reconstruction from a series of backbone framework structures that are obtained from a modified elastic network model (ENM) by perturbation along low-frequency normal modes. To ensure good quality backbone frameworks, the single force parameter ENM is modified by introducing two more force parameters to characterize the interaction between the consecutive carbon alphas and those within the same secondary structure domain. The relative stiffness of the three parameters is parameterized to reproduce B-factors, while maintaining good bonded geometry. After parameterization, violations of experimental Calpha-Calpha distances and Calpha-Calpha-Calpha pseudo angles along the backbone are reduced to less than 1%. Simultaneously, the average B-factor correlation coefficient improves to R = 0.77. Two applications illustrate the potential of the approach. (1) 102,051 protein backbones spanning a conformational space of 15 Å root mean square deviation were generated from 148 nonredundant proteins in the PDB database, and all-atom models with minimal bonded and nonbonded violations were produced from this ensemble of backbone structures using the SCWRL side chain building program. (2) Improved backbone templates for homology modeling. Fifteen query sequences were each modeled on two targets. For each of the 30 target frameworks, dozens of improved templates could be produced. In all cases, improved full atom homology models resulted, of which 50% could be identified blind using the D-Fire statistical potential. (c) 2008 Wiley-Liss, Inc.

  11. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....

  12. Mixing Energy Models in Genetic Algorithms for On-Lattice Protein Structure Prediction

    Directory of Open Access Journals (Sweden)

    Mahmood A. Rashid

    2013-01-01

    Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20×20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic polar. However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significantly lower energy structures compared to the state-of-the-art results.
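The contrast between the two energy models can be illustrated with a small lattice-energy evaluator. The sketch below is written under stated assumptions and is not the paper's code: it uses an integer grid where lattice neighbours are sites at Manhattan distance 1 (the abstract does not say which lattice the authors use), and the names `hp_pair_energy` and `lattice_energy` are hypothetical. In the hydrophobic-polar (HP) model, every contact between two H residues that are not adjacent in sequence contributes -1. A 20×20 model would instead look up the contribution in a residue-pair matrix.

```python
def hp_pair_energy(a, b):
    """HP model: only hydrophobic-hydrophobic contacts are rewarded."""
    return -1 if a == "H" and b == "H" else 0


def lattice_energy(sequence, coords, pair_energy=hp_pair_energy):
    """Sum pair energies over non-consecutive residues in lattice contact.

    sequence: string of residue types, one character per residue.
    coords: list of integer tuples giving each residue's lattice site.
    pair_energy: function mapping two residue types to a contact energy;
        swap in a 20x20 matrix lookup for a high resolution model.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(coords)):
        # Start at i + 2 so chain neighbours are never counted as contacts.
        for j in range(i + 2, len(coords)):
            manhattan = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if manhattan == 1:
                energy += pair_energy(sequence[i], sequence[j])
    return energy
```

A search procedure such as a genetic algorithm could evaluate candidates with a detailed `pair_energy` while using `hp_pair_energy` as a cheap guide towards compact hydrophobic cores, which is the kind of mixing the abstract describes.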

  13. The Protein Model Portal.

    Science.gov (United States)

    Arnold, Konstantin; Kiefer, Florian; Kopp, Jürgen; Battey, James N D; Podvinec, Michael; Westbrook, John D; Berman, Helen M; Bordoli, Lorenza; Schwede, Torsten

    2009-03-01

    Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionary related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

  14. Insulin as a model to teach three-dimensional structure of proteins

    Directory of Open Access Journals (Sweden)

    João Batista Teixeira da Rocha

    2018-02-01

    Proteins are the most ubiquitous macromolecules found in living cells and have innumerable physiological functions. Therefore, it is fundamental to build a solid knowledge of the three-dimensional structure of proteins to better understand the living state. The hierarchical structure of proteins is usually studied in the undergraduate discipline of Biochemistry. Here we describe pedagogical interventions designed to increase the knowledge of preservice chemistry teachers about protein structure. The activities were made using alternative and cheap materials to encourage the application of these simple methodologies by the future teachers in secondary school. Starting from the primary structure of the insulin chains, students had to construct a three-dimensional structure of insulin. After the activities, the students reported an improvement in their previous knowledge of protein structure. The construction of a three-dimensional model together with other activities seems to be an efficient way to promote learning about the structure of proteins among undergraduate students. The methodology used was inexpensive and simple, and it can be used both in the university and in high school.

  15. Mass Spectrometry Coupled Experiments and Protein Structure Modeling Methods

    Directory of Open Access Journals (Sweden)

    Lee Sael

    2013-10-01

    With the accumulation of next generation sequencing data, there is increasing interest in the study of intra-species difference in molecular biology, especially in relation to disease analysis. Furthermore, the dynamics of the protein is being identified as a critical factor in its function. Although accuracy of protein structure prediction methods is high, provided there are structural templates, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address the sensitivity and the dynamics by computational structure predictions alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of the protein structures. For this purpose, this article focuses on reviewing two aspects: the types of mass spectrometry coupled experiments and structural data that are obtainable through those experiments; and the structure prediction methods that can utilize these data as constraints. Also, a short review of current efforts in integrating experimental data in the structural modeling is provided.

  16. The Protein Model Portal

    OpenAIRE

    Arnold, Konstantin; Kiefer, Florian; Kopp, Jürgen; Battey, James N. D.; Podvinec, Michael; Westbrook, John D.; Berman, Helen M.; Bordoli, Lorenza; Schwede, Torsten

    2008-01-01

    Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionary related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploratio...

  17. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  18. Electronic transport on the spatial structure of the protein: Three-dimensional lattice model

    International Nuclear Information System (INIS)

    Sarmento, R.G.; Frazão, N.F.; Macedo-Filho, A.

    2017-01-01

    Highlights: • The electronic transport on the structure of the three-dimensional lattice model of the protein is studied. • The signing of the current–voltage is directly affected by permutations of the weak bonds in the structure. • The semiconductor behavior of the proteins suggests a potential application in the development of novel biosensors. - Abstract: We report a numerical analysis of the electronic transport in a protein chain consisting of thirty-six standard amino acids. The protein chains studied have a three-dimensional structure, which can present itself in three distinct conformations; the difference consists in the presence or absence of thirteen hydrogen bonds. Our theoretical method uses an electronic tight-binding Hamiltonian model, appropriate to describe the protein segments modeled by the amino acid chain. We note that the presence and the permutations between weak bonds in the structure of proteins are directly related to the signing of the current–voltage. Furthermore, the electronic transport depends on the effect of temperature. In addition, we have found semiconductor behavior in the models investigated, which suggests a potential application in the development of novel biosensors for molecular diagnostics.

  19. Electronic transport on the spatial structure of the protein: Three-dimensional lattice model

    Energy Technology Data Exchange (ETDEWEB)

    Sarmento, R.G. [Departamento de Ciências Biológicas, Universidade Federal do Piauí, 64800-000 Floriano, PI (Brazil); Frazão, N.F. [Centro de Educação e Saúde, Universidade Federal de Campina Grande, 581750-000 Cuité, PB (Brazil); Macedo-Filho, A., E-mail: amfilho@gmail.com [Campus Prof. Antonio Geovanne Alves de Sousa, Universidade Estadual do Piauí, 64260-000 Piripiri, PI (Brazil)

    2017-01-30

    Highlights: • The electronic transport on the structure of the three-dimensional lattice model of the protein is studied. • The signing of the current–voltage is directly affected by permutations of the weak bonds in the structure. • The semiconductor behavior of the proteins suggests a potential application in the development of novel biosensors. - Abstract: We report a numerical analysis of the electronic transport in a protein chain consisting of thirty-six standard amino acids. The protein chains studied have a three-dimensional structure, which can present itself in three distinct conformations; the difference consists in the presence or absence of thirteen hydrogen bonds. Our theoretical method uses an electronic tight-binding Hamiltonian model, appropriate to describe the protein segments modeled by the amino acid chain. We note that the presence and the permutations between weak bonds in the structure of proteins are directly related to the signing of the current–voltage. Furthermore, the electronic transport depends on the effect of temperature. In addition, we have found semiconductor behavior in the models investigated, which suggests a potential application in the development of novel biosensors for molecular diagnostics.

  20. Protein structure modelling and evaluation based on a 4-distance description of side-chain interactions

    Directory of Open Access Journals (Sweden)

    Inbar Yuval

    2010-07-01

    Background: Accurate evaluation and modelling of residue-residue interactions within and between proteins is a key aspect of computational structure prediction including homology modelling, protein-protein docking, refinement of low-resolution structures, and computational protein design. Results: Here we introduce a method for accurate protein structure modelling and evaluation based on a novel 4-distance description of residue-residue interaction geometry. Statistical 4-distance preferences were extracted from high-resolution protein structures and were used as a basis for a knowledge-based potential, called Hunter. We demonstrate that the 4-distance description of side chain interactions can be used reliably to discriminate the native structure from a set of decoys. Hunter ranked the native structure as the top one in 217 out of 220 high-resolution decoy sets, in 25 out of 28 "Decoys 'R' Us" decoy sets and in 24 out of 27 high-resolution CASP7/8 decoy sets. The same concept was applied to side chain modelling in protein structures. On a set of very high-resolution protein structures the average RMSD was 1.47 Å for all residues and 0.73 Å for buried residues, which is in the range of attainable accuracy for a model. Finally, we show that Hunter performs as well as or better than other top methods in homology modelling based on results from the CASP7 experiment. The supporting web site http://bioinfo.weizmann.ac.il/hunter/ was developed to enable the use of Hunter and for visualization and interactive exploration of 4-distance distributions. Conclusions: Our results suggest that Hunter can be used as a tool for evaluation and for accurate modelling of residue-residue interactions in protein structures. The same methodology is applicable to other areas involving high-resolution modelling of biomolecules.
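The abstract does not give Hunter's functional form, so the following is only a generic sketch of how statistical preferences are commonly turned into a knowledge-based score: a negative log-odds (inverse Boltzmann style) comparison of observed geometry bins against a reference distribution. The bin labels, the pseudocount and the function name are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_scores(observed_bins, reference_bins, pseudocount=1.0):
    """Score each geometry bin as -ln(P_observed / P_reference).

    observed_bins: bin labels seen for a residue pair in real structures.
    reference_bins: bin labels drawn from a background (reference) state.
    Bins favored in real structures receive negative (favorable) scores.
    """
    observed = Counter(observed_bins)
    reference = Counter(reference_bins)
    bins = set(observed) | set(reference)
    # Pseudocounts keep unseen bins from producing log(0).
    n_obs = sum(observed.values()) + pseudocount * len(bins)
    n_ref = sum(reference.values()) + pseudocount * len(bins)
    return {
        b: -math.log(((observed[b] + pseudocount) / n_obs)
                     / ((reference[b] + pseudocount) / n_ref))
        for b in bins
    }
```

Scoring a candidate structure would then amount to summing the entries for the bins its residue pairs fall into, with lower totals indicating more native-like geometry under these assumptions.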

  1. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

    Directory of Open Access Journals (Sweden)

    Hahnbeom Park

    Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

  2. Mapping monomeric threading to protein-protein structure prediction.

    Science.gov (United States)

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
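The TM score threshold of 0.5 quoted above refers to the template modeling score. As a rough sketch, the function below evaluates the standard TM-score sum for one fixed superposition and a one-to-one residue pairing. The published score is the maximum over superpositions, and how SPRING normalizes it for complexes is not stated in the abstract. The function names are hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption made here to keep the scale positive.

```python
import math


def tm_d0(length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if length <= 21:
        return 0.5  # assumed floor for short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(model, native):
    """TM-score sum for an already superimposed, residue-paired model.

    model, native: equal-length lists of (x, y, z) coordinates in the
    same frame; the score is normalized by the native length.
    """
    if len(model) != len(native) or not native:
        raise ValueError("model and native must be non-empty and equal length")
    d0 = tm_d0(len(native))
    total = sum(1.0 / (1.0 + (math.dist(m, n) / d0) ** 2)
                for m, n in zip(model, native))
    return total / len(native)
```

Because each residue contributes at most 1 and the contribution decays smoothly with distance, a few badly placed residues lower the score only slightly, unlike RMSD, which large deviations dominate.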

  3. Validation-driven protein-structure improvement

    NARCIS (Netherlands)

    Touw, W.G.

    2016-01-01

    High-quality protein structure models are essential for many Life Science applications, such as protein engineering, molecular dynamics, drug design, and homology modelling. The WHAT_CHECK model validation project and the PDB_REDO model optimisation project have shown that many structure models in

  4. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method.

    Science.gov (United States)

    Valentin, Jan B; Andreetta, Christian; Boomsma, Wouter; Bottaro, Sandro; Ferkinghoff-Borg, Jesper; Frellsen, Jes; Mardia, Kanti V; Tian, Pengfei; Hamelryck, Thomas

    2014-02-01

    We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications. Copyright © 2013 Wiley Periodicals, Inc.

  5. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method

    DEFF Research Database (Denmark)

    Valentin, Jan B.; Andreetta, Christian; Boomsma, Wouter

    2014-01-01

    We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length s....... The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications. © 2013 Wiley Periodicals, Inc....

  6. Structural characterisation of medically relevant protein assemblies by integrating mass spectrometry with computational modelling.

    Science.gov (United States)

    Politis, Argyris; Schmidt, Carla

    2018-03-20

    Structural mass spectrometry with its various techniques is a powerful tool for the structural elucidation of medically relevant protein assemblies. It delivers information on the composition, stoichiometries, interactions and topologies of these assemblies. Most importantly it can deal with heterogeneous mixtures and assemblies which makes it universal among the conventional structural techniques. In this review we summarise recent advances and challenges in structural mass spectrometric techniques. We describe how the combination of the different mass spectrometry-based methods with computational strategies enable structural models at molecular levels of resolution. These models hold significant potential for helping us in characterizing the function of protein assemblies related to human health and disease. In this review we summarise the techniques of structural mass spectrometry often applied when studying protein-ligand complexes. We exemplify these techniques through recent examples from literature that helped in the understanding of medically relevant protein assemblies. We further provide a detailed introduction into various computational approaches that can be integrated with these mass spectrometric techniques. Last but not least we discuss case studies that integrated mass spectrometry and computational modelling approaches and yielded models of medically important protein assembly states such as fibrils and amyloids. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.

  7. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models

    Directory of Open Access Journals (Sweden)

    Borodovsky Mark

    2006-03-01

    Background: The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: single-sequence prediction algorithms assume that information about other (homologous) proteins is not available, while algorithms of the second type assume that information about homologous proteins is available and use it intensively. The single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs; however, the accuracy of protein secondary structure prediction from a single sequence is not as high as when the additional evolutionary information is present. Results: In this paper, we further refine and extend the hidden semi-Markov model (HSMM) initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize on different sections of the dependency structure and incorporate them into the HSMM. In addition, we implement an iterative training method to refine estimates of HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable to or better than those for BSPSS as well as for PSIPRED, tested under the single-sequence condition. Conclusions: We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset with no pair of sequences having significant sequence similarity. As new sequences are added to the database it is possible to augment the dependency structure and obtain even higher accuracy.
Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable

  8. Classification of proteins: available structural space for molecular modeling.

    Science.gov (United States)

    Andreeva, Antonina

    2012-01-01

    The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas other utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.

  9. The interface of protein structure, protein biophysics, and molecular evolution

    Science.gov (United States)

    Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon

    2012-01-01

    The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID:22528593

  10. eMatchSite: sequence order-independent structure alignments of ligand binding pockets in protein models.

    Directory of Open Access Journals (Sweden)

    Michal Brylinski

    2014-09-01

    Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques; however, to achieve a high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome scale applications, where only various quality structure models are available for the majority of gene products. To improve the state-of-the-art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4-9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning.
eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite.

  11. SNP2Structure: A Public and Versatile Resource for Mapping and Three-Dimensional Modeling of Missense SNPs on Human Protein Structures

    Directory of Open Access Journals (Sweden)

    Difei Wang

    2015-01-01

    One of the long-standing challenges in biology is to understand how non-synonymous single nucleotide polymorphisms (nsSNPs) change protein structure and further affect their function. While it is impractical to solve all the mutated protein structures experimentally, it is quite feasible to model the mutated structures in silico. Toward this goal, we built a publicly available structure database resource (SNP2Structure, https://apps.icbi.georgetown.edu/snp2structure) focusing on missense mutations, msSNP. Compared with web portals with similar aims, SNP2Structure has the following major advantages. First, our portal offers direct comparison of two related 3D structures. Second, the protein models include all interacting molecules in the original PDB structures, so users are able to determine regions of potential interaction changes when a protein mutation occurs. Third, the mutated structures are available to download locally for further structural and functional analysis. Fourth, we used the JSmol package to display the protein structures, which avoids system compatibility issues. SNP2Structure provides reliable, high quality mapping of nsSNPs to 3D protein structures enabling researchers to explore the likely functional impact of human disease-causing mutations.

  12. Structure and non-structure of centrosomal proteins.

    Science.gov (United States)

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: Only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  13. Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction

    Directory of Open Access Journals (Sweden)

    Seong-Gon Kim

    2011-06-01

    Several techniques such as Neural Networks, Genetic Algorithms, Decision Trees and other statistical or heuristic methods have been used in the past to approach the complex non-linear task of predicting the Alpha-helices, Beta-sheets and Turns of a protein's secondary structure. This project introduces a new machine learning method that uses offline-trained Multilayered Perceptrons (MLP) as the likelihood models within a Bayesian Inference framework to predict the secondary structures of proteins. Varying window sizes are used to extract neighboring amino acid information, which is passed back and forth between the Neural Net models and the Bayesian Inference process until the posterior secondary structure probability converges.

  14. Modeling of the structure of ribosomal protein L1 from the archaeon Haloarcula marismortui

    Science.gov (United States)

    Nevskaya, N. A.; Kljashtorny, V. G.; Vakhrusheva, A. V.; Garber, M. B.; Nikonov, S. V.

    2017-07-01

    The halophilic archaeon Haloarcula marismortui proliferates in the Dead Sea at extremely high salt concentrations (higher than 3 M). This is the only archaeon, for which the crystal structure of the ribosomal 50S subunit was determined. However, the structure of the functionally important side protuberance containing the abnormally negatively charged protein L1 (HmaL1) was not visualized. Attempts to crystallize HmaL1 in the isolated state or as its complex with RNA using normal salt concentrations (≤500 mM) failed. A theoretical model of HmaL1 was built based on the structural data for homologs of the protein L1 from other organisms, and this model was refined by molecular dynamics methods. Analysis of this model showed that the protein HmaL1 can undergo aggregation due to the presence of a cluster of positive charges unique for proteins L1. This cluster is located at the RNA-protein interface, which interferes with the crystallization of HmaL1 and the binding of the latter to RNA.

  15. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Science.gov (United States)

    Zhang, Yang; Devries, Mark E; Skolnick, Jeffrey

    2006-02-01

    G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha) root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. 
These results demonstrate the usefulness and robustness

  16. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Directory of Open Access Journals (Sweden)

    Yang Zhang

    2006-02-01

    G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha) root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse.
These results demonstrate the usefulness

  17. Protein structural model selection by combining consensus and single scoring methods.

    Directory of Open Access Journals (Sweden)

    Zhiquan He

    Quality assessment (QA) for predicted protein structural models is an important and challenging research problem in protein structure prediction. Consensus Global Distance Test (CGDT) methods assess each decoy (predicted structural model) based on its structural similarity to all others in a decoy set and have been shown to work well when good decoys are in a majority cluster. Scoring functions evaluate each single decoy based on its structural properties. Both methods have their merits and limitations. In this paper, we present a novel method called PWCom, which uses two neural networks in sequence to combine CGDT and single model scoring methods such as RW, DDFire and OPUS-Ca. Specifically, for every pair of decoys, the difference of the corresponding feature vectors is input to the first neural network, which predicts whether the two decoys differ significantly in terms of their GDT scores to the native. If yes, the second neural network is used to decide which one of the two is closer to the native structure. The quality score for each decoy in the pool is based on the number of times it wins during the pairwise comparisons. Test results on three benchmark datasets from different model generation methods showed that PWCom significantly improves over consensus GDT and single scoring methods. The QA server (MUFOLD-Server) applying this method in the CASP 10 QA category was ranked in second place in terms of Pearson and Spearman correlation performance.
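
    The pairwise win-counting step described in this abstract can be sketched as follows. This is only an illustration of the ranking logic: the two neural networks are replaced by a hypothetical toy comparator built from made-up quality values, and all names (pairwise_win_counts, make_toy_comparator, toy_quality) are assumptions rather than parts of PWCom.

```python
from itertools import combinations
from typing import Callable, List, Optional

# A comparator returns the index of the better decoy, or None when the pair
# is judged "not significantly different" (the role of the first network).
Comparator = Callable[[int, int], Optional[int]]


def pairwise_win_counts(n_decoys: int, compare: Comparator) -> List[int]:
    """Score each decoy by the number of pairwise comparisons it wins."""
    wins = [0] * n_decoys
    for i, j in combinations(range(n_decoys), 2):
        winner = compare(i, j)
        if winner is None:
            continue  # pair judged indistinguishable: nobody scores
        if winner not in (i, j):
            raise ValueError("comparator must return i, j or None")
        wins[winner] += 1
    return wins


def make_toy_comparator(quality: List[float], min_gap: float) -> Comparator:
    """Hypothetical stand-in for the two networks, driven by known qualities."""
    def compare(i: int, j: int) -> Optional[int]:
        if abs(quality[i] - quality[j]) < min_gap:
            return None  # stage 1: difference not significant
        return i if quality[i] > quality[j] else j  # stage 2: pick the better
    return compare


# Made-up quality values for four decoys, for illustration only.
toy_quality = [0.82, 0.40, 0.79, 0.55]
toy_wins = pairwise_win_counts(4, make_toy_comparator(toy_quality, min_gap=0.05))
print(toy_wins)
```

    Decoys 0 and 2 differ by less than min_gap, so that comparison is skipped and the two end up with the same win count.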

  18. In Silico Characterization and Structural Modeling of Dermacentor andersoni p36 Immunosuppressive Protein

    Directory of Open Access Journals (Sweden)

    Martin Omulindi Oyugi

    2018-01-01

    Ticks cause approximately $17–19 billion in economic losses to the livestock industry globally. Development of a recombinant antitick vaccine is greatly hindered by insufficient knowledge and understanding of the proteins expressed by ticks. Ticks secrete immunosuppressant proteins that modulate the host’s immune system during blood feeding; these molecules could be a target for antivector vaccine development. Recombinant p36, a 36 kDa immunosuppressor from the saliva of female Dermacentor andersoni, suppresses T-lymphocyte proliferation in vitro. To identify potential unique structural and dynamic properties responsible for the immunosuppressive function of p36 proteins, this study used bioinformatic tools to characterize and model the structure of the D. andersoni p36 protein. Evaluation of the p36 protein family as suitable vaccine antigens predicted a p36 homolog in Rhipicephalus appendiculatus, the tick vector of East Coast fever, with an antigenicity score of 0.7701 that compares well with that of Bm86 (0.7681), the protein antigen that constitutes the commercial tick vaccine Tickgard™. Ab initio modeling of the D. andersoni p36 protein yielded a 3D structure with a predicted conserved antigenic region that has the potential to bind immunomodulating ligands, including glycerol and lactose, and is located within an exposed loop, suggesting a likely role in the immunosuppressive function of tick p36 proteins. Laboratory confirmation of these preliminary results is necessary in future studies.

  19. Prediction of protein–protein interactions: unifying evolution and structure at protein interfaces

    International Nuclear Information System (INIS)

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-01-01

    The vast majority of the chores in the living cell involve protein–protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein–protein interaction prediction approaches. As a sample approach for modeling protein interactions, we detail PRISM, which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations.

  20. Protein structure modeling for CASP10 by multiple layers of global optimization.

    Science.gov (United States)

    Joo, Keehyoung; Lee, Juyong; Sim, Sangjin; Lee, Sun Young; Lee, Kiho; Heo, Seungryong; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung

    2014-02-01

    In the template-based modeling (TBM) category of CASP10 experiment, we introduced a new protocol called protein modeling system (PMS) to generate accurate protein structures in terms of side-chains as well as backbone trace. In the new protocol, a global optimization algorithm, called conformational space annealing (CSA), is applied to the three layers of TBM procedure: multiple sequence-structure alignment, 3D chain building, and side-chain re-modeling. For 3D chain building, we developed a new energy function which includes new distance restraint terms of Lorentzian type (derived from multiple templates), and new energy terms that combine (physical) energy terms such as dynamic fragment assembly (DFA) energy, DFIRE statistical potential energy, hydrogen bonding term, etc. These physical energy terms are expected to guide the structure modeling especially for loop regions where no template structures are available. In addition, we developed a new quality assessment method based on random forest machine learning algorithm to screen templates, multiple alignments, and final models. For TBM targets of CASP10, we find that, due to the combination of three stages of CSA global optimizations and quality assessment, the modeling accuracy of PMS improves at each additional stage of the protocol. It is especially noteworthy that the side-chains of the final PMS models are far more accurate than the models in the intermediate steps. Copyright © 2013 Wiley Periodicals, Inc.

  1. MyPMFs: a simple tool for creating statistical potentials to assess protein structural models.

    Science.gov (United States)

    Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk

    2018-05-29

    Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs. Copyright © 2018. Published by Elsevier B.V.
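
    As a rough illustration of what a statistical potential of mean force is, the sketch below applies the generic inverse Boltzmann relation E(r) = −kT ln(p_obs(r) / p_ref(r)) to binned distance counts. It is not the MyPMFs implementation: the reference state, the pseudocount and the function name inverse_boltzmann are assumptions made for this example, and the counts are made up.

```python
import math
from typing import List, Sequence


def inverse_boltzmann(
    observed_counts: Sequence[float],
    reference_counts: Sequence[float],
    kT: float = 1.0,
    pseudocount: float = 1.0,
) -> List[float]:
    """Per-bin energy E = -kT * ln(p_obs / p_ref) from distance histograms.

    observed_counts: counts per distance bin for one residue pair type.
    reference_counts: counts per bin for the chosen reference state.
    pseudocount: added to every bin so that empty bins stay finite
                 (an assumption of this sketch).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Made-up histograms: the first bin is over-represented in the observed data.
toy_energies = inverse_boltzmann([30, 10, 10], [10, 10, 10])
print([round(e, 3) for e in toy_energies])
```

    Bins that are more populated than the reference receive negative (favourable) energies, and under-populated bins receive positive ones.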

  2. Citrate synthase proteins in extremophilic organisms: Studies within a structure-based model

    International Nuclear Information System (INIS)

    Różycki, Bartosz; Cieplak, Marek

    2014-01-01

    We study four citrate synthase homodimeric proteins within a structure-based coarse-grained model. Two of these proteins come from thermophilic bacteria, one from a cryophilic bacterium and one from a mesophilic organism; three are in the closed and two in the open conformations. Even though the proteins belong to the same fold, the model distinguishes the properties of these proteins in a way which is consistent with experiments. For instance, the thermophilic proteins are more stable thermodynamically than their mesophilic and cryophilic homologues, which we observe both in the magnitude of thermal fluctuations near the native state and in the kinetics of thermal unfolding. The level of stability correlates with the average coordination number for amino acid contacts and with the degree of structural compactness. The pattern of positional fluctuations along the sequence in the closed conformation is different than in the open conformation, including within the active site. The modes of correlated and anticorrelated movements of pairs of amino acids forming the active site are very different in the open and closed conformations. Taken together, our results show that the precise location of amino acid contacts in the native structure appears to be a critical element in explaining the similarities and differences in the thermodynamic properties, local flexibility, and collective motions of the different forms of the enzyme

  3. NAPS: Network Analysis of Protein Structures

    Science.gov (United States)

    Chakrabarty, Broto; Parekh, Nita

    2016-01-01

    Traditionally, protein structures have been analysed by the secondary structure architecture and fold arrangement. An alternative approach that has shown promise is modelling proteins as a network of non-covalent interactions between amino acid residues. The network representation of proteins provides a systems approach to topological analysis of complex three-dimensional structures irrespective of secondary structure and fold type and provides insights into the structure-function relationship. We have developed a web server for network-based analysis of protein structures, NAPS, that facilitates quantitative and qualitative (visual) analysis of residue–residue interactions in single chains, protein complexes, modelled protein structures and trajectories (e.g. from molecular dynamics simulations). The user can specify the atom type for network construction, the distance range (in Å) and the minimal amino acid separation along the sequence. NAPS lets users select nodes and their neighbourhoods based on centrality measures, physicochemical properties of amino acids or clusters of well-connected residues (k-cliques) for further analysis. Visual analysis of interacting domains and protein chains, and shortest path lengths between pairs of residues, are additional features that aid in functional analysis. NAPS supports various analyses and visualization views for identifying functional residues, and provides insight into mechanisms of protein folding, domain-domain and protein–protein interactions for understanding communication within and between proteins. URL: http://bioinf.iiit.ac.in/NAPS/. PMID:27151201
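
    The network construction described above (one node per residue, an edge when two residues fall within a distance range and are far enough apart in sequence) can be sketched as follows. This is a minimal illustration, not the NAPS implementation; the function names, the 7 Å default cutoff and the use of one representative atom per residue are assumptions.

```python
import math
from itertools import combinations


def build_contact_network(coords, lower=0.0, upper=7.0, min_separation=1):
    """Build a residue contact network as an adjacency dictionary.

    coords: list of (x, y, z) tuples, one representative atom per residue
            (for example C-alpha), in sequence order.
    lower, upper: distance range in angstroms (assumed defaults).
    min_separation: minimal index difference along the sequence.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_separation:
            continue
        distance = math.dist(coords[i], coords[j])
        if lower <= distance <= upper:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other residues that each residue contacts."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


if __name__ == "__main__":
    # Four toy residues; residue 1 sits close to all the others.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
    network = build_contact_network(toy, upper=5.0)
    print(degree_centrality(network))
```

    Raising `min_separation` removes trivial contacts between sequence neighbours, so the remaining edges emphasise long-range interactions.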

  4. Structural characterization of a recombinant fusion protein by instrumental analysis and molecular modeling.

    Directory of Open Access Journals (Sweden)

    Zhigang Wu

    Conbercept is a genetically engineered homodimeric protein for the treatment of wet age-related macular degeneration (wet AMD) that functions by blocking VEGF-family proteins. Its huge, highly variable architecture makes characterization and development of a functional assay difficult. In this study, the primary structure, number of disulfide linkages and glycosylation state of conbercept were characterized by high-performance liquid chromatography, mass spectrometry, and capillary electrophoresis. Molecular modeling was then applied to obtain the spatial structural model of the conbercept-VEGF-A complex, and to study its inter-atomic interactions and dynamic behavior. This work was incorporated into a platform useful for studying the structure of conbercept and its ligand binding functions.

  5. Computational methods for constructing protein structure models from 3D electron microscopy maps.

    Science.gov (United States)

    Esquivel-Rodríguez, Juan; Kihara, Daisuke

    2013-10-01

    Protein structure determination by cryo-electron microscopy (EM) has made significant progress in the past decades. Resolutions of EM maps have been improving as evidenced by recently reported structures that are solved at high resolutions close to 3Å. Computational methods play a key role in interpreting EM data. Among many computational procedures applied to an EM map to obtain protein structure information, in this article we focus on reviewing computational methods that model protein three-dimensional (3D) structures from a 3D EM density map that is constructed from two-dimensional (2D) maps. The computational methods we discuss range from de novo methods, which identify structural elements in an EM map, to structure fitting methods, where known high resolution structures are fit into a low-resolution EM map. A list of available computational tools is also provided. Copyright © 2013 Elsevier Inc. All rights reserved.

  6. Compare local pocket and global protein structure models by small structure patterns

    KAUST Repository

    Cui, Xuefeng; Kuwahara, Hiroyuki; Li, Shuai Cheng; Gao, Xin

    2015-01-01

    Researchers proposed several criteria to assess the quality of predicted protein structures because it is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria

  7. An evolutionary model for protein-coding regions with conserved RNA structure

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Forsberg, Roald; Meyer, Irmtraud Margret

    2004-01-01

    Here we present a model of nucleotide substitution in protein-coding regions that also encode the formation of conserved RNA structures. In such regions, apparent evolutionary context dependencies exist, both between nucleotides occupying the same codon and between nucleotides forming a base pair in the RNA structure. The overlap of these fundamental dependencies is sufficient to cause "contagious" context dependencies which cascade across many nucleotide sites. Such large-scale dependencies challenge the use of traditional phylogenetic models in evolutionary inference because they explicitly assume...... components of traditional phylogenetic models. We applied this to a data set of full-genome sequences from the hepatitis C virus where five RNA structures are mapped within the coding region. This allowed us to partition the effects of selection on different structural elements and to test various hypotheses......

  8. A Self-Assisting Protein Folding Model for Teaching Structural Molecular Biology.

    Science.gov (United States)

    Davenport, Jodi; Pique, Michael; Getzoff, Elizabeth; Huntoon, Jon; Gardner, Adam; Olson, Arthur

    2017-04-04

    Structural molecular biology is now becoming part of the high school science curriculum, thus posing a challenge for teachers who need to convey three-dimensional (3D) structures with conventional text and pictures. In many cases even interactive computer graphics does not go far enough to address these challenges. We have developed a flexible model of the polypeptide backbone using 3D printing technology. With this model we have produced a polypeptide assembly kit to create an idealized model of the Triosephosphate isomerase mutase enzyme (TIM), which forms a structure known as the TIM barrel. This kit has been used in a laboratory practical where students perform a step-by-step investigation into the nature of protein folding, starting with the handedness of amino acids and progressing to the formation of secondary and tertiary structure. Based on the classroom evidence we collected, we conclude that these models are a valuable and inexpensive resource for teaching structural molecular biology. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Modeling Protein Structures in Feed and Seed Tissues Using Novel Synchrotron-Based Analytical Technique

    International Nuclear Information System (INIS)

    Yu, P.

    2008-01-01

    Traditional 'wet' chemical analyses usually look for a specific known component (such as protein) through homogenization and separation of the components of interest from the complex tissue matrix. Traditional 'wet' chemical analyses rely heavily on the use of harsh chemicals and derivatization, therefore altering the native feed protein structures and possibly generating artifacts. The objective of this study was to introduce a novel and non-destructive method to estimate protein structures in feed and seeds within intact tissues using advanced synchrotron-based infrared microspectroscopy (SFTIRM). The experiments were performed at the National Synchrotron Light Source in Brookhaven National Laboratory (US Dept. of Energy, NY). The results show that with synchrotron-based SFTIRM, we are able to localize relatively 'pure' protein without destruction of the feed and seed tissues and qualify protein internal structures in terms of the proportions and ratios of α-helix, β-sheet, random coil and β-turns on a relative basis using multi-peak modeling procedures. These protein structure profiles (α-helix, β-sheet, etc.) may influence protein quality and availability in animals. Several examples of feed and seeds were provided. The implications of this study are that we can use this new method to compare internal protein structures between feeds and between seed varieties. We can also use this method to detect heat-induced structural changes of protein in feeds.

  10. SDSL-ESR-based protein structure characterization.

    Science.gov (United States)

    Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A

    2010-03-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.

  11. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools which can predict carbohydrate-binding sites for proteins in the absence of a structure with an experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  12. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

    Science.gov (United States)

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-06-15

    Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/. Contact: bab@mit.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by

  13. Exploring structural variability in X-ray crystallographic models using protein local optimization by torsion-angle sampling

    International Nuclear Information System (INIS)

    Knight, Jennifer L.; Zhou, Zhiyong; Gallicchio, Emilio; Himmel, Daniel M.; Friesner, Richard A.; Arnold, Eddy; Levy, Ronald M.

    2008-01-01

    Torsion-angle sampling, as implemented in the Protein Local Optimization Program (PLOP), is used to generate multiple structurally variable single-conformer models which are in good agreement with X-ray data. An ensemble-refinement approach to differentiate between positional uncertainty and conformational heterogeneity is proposed. Modeling structural variability is critical for understanding protein function and for modeling reliable targets for in silico docking experiments. Because of the time-intensive nature of manual X-ray crystallographic refinement, automated refinement methods that thoroughly explore conformational space are essential for the systematic construction of structurally variable models. Using five proteins spanning resolutions of 1.0–2.8 Å, it is demonstrated how torsion-angle sampling of backbone and side-chain libraries with filtering against both the chemical energy, using a modern effective potential, and the electron density, coupled with minimization of a reciprocal-space X-ray target function, can generate multiple structurally variable models which fit the X-ray data well. Torsion-angle sampling as implemented in the Protein Local Optimization Program (PLOP) has been used in this work. Models with the lowest R free values are obtained when electrostatic and implicit solvation terms are included in the effective potential. HIV-1 protease, calmodulin and SUMO-conjugating enzyme illustrate how variability in the ensemble of structures captures structural variability that is observed across multiple crystal structures and is linked to functional flexibility at hinge regions and binding interfaces. An ensemble-refinement procedure is proposed to differentiate between variability that is a consequence of physical conformational heterogeneity and that which reflects uncertainty in the atomic coordinates

  14. Association of protein structure, protein and carbohydrate subfractions with bioenergy profiles and biodegradation functions in modeled forage

    Science.gov (United States)

    Ji, Cuiying; Zhang, Xuewei; Yu, Peiqiang

    2016-03-01

    The objectives of this study were to detect unique aspects and association of forage protein inherent structure, biological compounds, protein and carbohydrate subfractions, bioenergy profiles, and biodegradation features. In this study, common available alfalfa hay from two different sourced-origins (FSO vs. CSO) was used as a modeled forage for inherent structure profile, bioenergy, biodegradation and their association between their structure and bio-functions. The molecular spectral profiles were determined using non-invasive molecular spectroscopy. The parameters included: protein structure amide I group, amide II group and their ratios; protein subfractions (PA1, PA2, PB1, PB2, PC); carbohydrate fractions (CA1, CA2, CA3, CA4, CB1, CB2, CC); biodegradable and undegradable fractions of protein (RDPA2, RDPB1, RDPB2, RDP; RUPA2, RUPB1, RUPB2, RUPC, RUP); biodegradable and undegradable fractions of carbohydrate (RDCA4, RDCB1, RDCB2, RDCB3, RDCHO; RUCA4, RUCB1, RUCB2, RUCB3, RUCC, RUCHO) and bioenergy profiles (tdNDF, tdFA, tdCP, tdNFC, TDN1×, DE3×, ME3×, NEL3×; NEm, NEg). The results show differences in protein and carbohydrate (CHO) subfractions in the moderately degradable true protein fraction (PB1: 502 vs. 420 g/kg CP, P = 0.09), slowly degraded true protein fraction (PB2: 45 vs. 96 g/kg CP, P = 0.02), moderately degradable CHO fraction (CB2: 283 vs. 223 g/kg CHO, P = 0.06) and slowly degraded CHO fraction (CB3: 369 vs. 408 g/kg CHO) between the two sourced origins. As to biodegradable (RD) fractions of protein and CHO in rumen, there were differences in RD of PB1 (417 vs. 349 g/kg CP, P = 0.09), RD of PB2 (29 vs. 62 g/kg CP, P = 0.02), RD of CB2 (251 vs. 198 g/kg DM, P = 0.06), RD of CB3 (236 vs. 261 g/kg CHO, P = 0.08). As to bioenergy profile, there were differences in total digestible nutrient (TDN: 551 vs. 537 g/kg DM, P = 0.06), and metabolic bioenergy (P = 0.095). As to protein molecular structure, there were differences in protein structure 1st

  15. A 3D model of the membrane protein complex formed by the white spot syndrome virus structural proteins.

    Directory of Open Access Journals (Sweden)

    Yun-Shiang Chang

    BACKGROUND: Outbreaks of white spot disease have had a large negative economic impact on cultured shrimp worldwide. However, the pathogenesis of the causative virus, WSSV (white spot syndrome virus), is not yet well understood. WSSV is a large enveloped virus. The WSSV virion has three structural layers surrounding its core DNA: an outer envelope, a tegument and a nucleocapsid. In this study, we investigated the protein-protein interactions of the major WSSV structural proteins, including several envelope and tegument proteins that are known to be involved in the infection process. PRINCIPAL FINDINGS: In the present report, we used coimmunoprecipitation and yeast two-hybrid assays to elucidate and/or confirm all the interactions that occur among the WSSV structural (envelope and tegument) proteins VP51A, VP19, VP24, VP26 and VP28. We found that VP51A interacted directly not only with VP26 but also with VP19 and VP24. VP51A, VP19 and VP24 were also shown to have an affinity for self-interaction. Chemical cross-linking assays showed that these three self-interacting proteins could occur as dimers. CONCLUSIONS: From our present results in conjunction with other previously established interactions we construct a 3D model in which VP24 acts as a core protein that directly associates with VP26, VP28, VP38A, VP51A and WSV010 to form a membrane-associated protein complex. VP19 and VP37 are attached to this complex via association with VP51A and VP28, respectively. Through the VP26-VP51C interaction this envelope complex is anchored to the nucleocapsid, which is made of layers of rings formed by VP664. A 3D model of the nucleocapsid and the surrounding outer membrane is presented.

  16. Modeling the Structure of SARS 3a Transmembrane Protein Using a ...

    Indian Academy of Sciences (India)

    Modeling the structure of SARS 3a Transmembrane protein using a ... for the implicit membrane molecular dynamics (MD) simulations. ... The coordinates during the simulation were saved every 500 steps, and were used for analysis. ... the pair list for calculation of nonbonded interactions being updated after every 10 steps.

  17. MOLECULAR MODELING INDICATES THAT HOMOCYSTEINE INDUCES CONFORMATIONAL CHANGES IN THE STRUCTURE OF PUTATIVE TARGET PROTEINS

    Directory of Open Access Journals (Sweden)

    Yumnam Silla

    2015-09-01

    An elevated level of homocysteine, a reactive thiol-containing amino acid, is associated with a multitude of complex diseases. A majority (>80%) of homocysteine in circulation is bound to protein cysteine residues. Although to date only 21 proteins have been experimentally shown to bind homocysteine, using an in silico approach we had earlier identified several potential target proteins that could bind homocysteine. S-homocysteinylation of proteins could potentially alter the structure and/or function of the protein. Earlier studies have shown that binding of homocysteine to a protein alters its function. However, the effect of homocysteine on the target protein structure has not yet been documented. In the present work, we assess conformational or structural changes, if any, due to protein homocysteinylation using two proteins, granzyme B (GRAB) and junctional adhesion molecule 1 (JAM1), which could potentially bind to homocysteine. We, for the first time, constructed computational models of homocysteine bound to target proteins and monitored their structural changes using explicit solvent molecular dynamics (MD) simulation. Analysis of homocysteine-bound trajectories revealed higher flexibility of the active site residues and local structural perturbations compared to the unbound native structure’s simulation, which could affect the stability of the protein. In addition, secondary structure analysis of homocysteine-bound trajectories also revealed disappearance of â-helix within the G-helix and linker region that connects the domain regions (as defined in the crystal structure). Our study thus captures the conformational transitions induced by homocysteine and we suggest these structural alterations might have implications for hyperhomocysteinemia-induced pathologies.

  18. Chemical cross-linking and mass spectrometry for protein structural modeling

    NARCIS (Netherlands)

    Back, Jaap Willem; de Jong, Luitzen; Muijsers, Anton O.; de Koster, Chris G.

    2003-01-01

    The growth of gene and protein sequence information is currently so rapid that three-dimensional structural information is lacking for the overwhelming majority of known proteins. In this review, efforts towards rapid and sensitive methods for protein structural characterization are described,

  19. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Background: The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results: Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. Conclusions: The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  20. Constraint Logic Programming approach to protein structure prediction.

    Science.gov (United States)

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.
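
    The face-centered cubic lattice used as the search space above can be made concrete with a small sketch. On this lattice each site has 12 nearest neighbours, reached by steps with exactly two non-zero components of ±1. The code below only checks the basic constraints a candidate fold must satisfy (chain connectivity and self-avoidance) and counts non-bonded contacts. It is plain Python rather than Constraint Logic Programming, and it is not the authors' model; all names are hypothetical.

```python
from itertools import product

# The 12 nearest-neighbour steps on the FCC lattice: two components are +/-1
# and the third is 0.
FCC_STEPS = [v for v in product((-1, 0, 1), repeat=3)
             if sum(abs(c) for c in v) == 2]


def is_valid_fold(points):
    """True if points form a self-avoiding chain of FCC nearest neighbours."""
    if len(set(points)) != len(points):
        return False  # two residues occupy the same lattice site
    for a, b in zip(points, points[1:]):
        step = tuple(q - p for p, q in zip(a, b))
        if step not in FCC_STEPS:
            return False  # consecutive residues are not lattice neighbours
    return True


def count_contacts(points):
    """Count pairs of non-consecutive residues on neighbouring sites."""
    contacts = 0
    for i in range(len(points)):
        for j in range(i + 2, len(points)):
            step = tuple(q - p for p, q in zip(points[i], points[j]))
            if step in FCC_STEPS:
                contacts += 1
    return contacts


if __name__ == "__main__":
    fold = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (1, 0, 1)]
    print(len(FCC_STEPS), is_valid_fold(fold), count_contacts(fold))
```

    In a constraint-based formulation these checks would be stated as constraints over the coordinate variables, so that propagation can prune invalid placements before they are enumerated.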

  1. Constructing a folding model for protein S6 guided by native fluctuations deduced from NMR structures

    International Nuclear Information System (INIS)

    Lammert, Heiko; Noel, Jeffrey K.; Haglund, Ellinor; Onuchic, José N.; Schug, Alexander

    2015-01-01

    The diversity in a set of protein nuclear magnetic resonance (NMR) structures provides an estimate of native state fluctuations that can be used to refine and enrich structure-based protein models (SBMs). Dynamics are an essential part of a protein’s functional native state. The dynamics in the native state are controlled by the same funneled energy landscape that guides the entire folding process. SBMs apply the principle of minimal frustration, drawn from energy landscape theory, to construct a funneled folding landscape for a given protein using only information from the native structure. On an energy landscape smoothed by evolution towards minimal frustration, geometrical constraints, imposed by the native structure, control the folding mechanism and shape the native dynamics revealed by the model. Native-state fluctuations can alternatively be estimated directly from the diversity in the set of NMR structures for a protein. Based on this information, we identify a highly flexible loop in the ribosomal protein S6 and modify the contact map in a SBM to accommodate the inferred dynamics. By taking into account the probable native state dynamics, the experimental transition state is recovered in the model, and the correct order of folding events is restored. Our study highlights how the shared energy landscape connects folding and function by showing that a better description of the native basin improves the prediction of the folding mechanism

  2. Structure of liposome encapsulating proteins characterized by X-ray scattering and shell-modeling

    International Nuclear Information System (INIS)

    Hirai, Mitsuhiro; Kimura, Ryota; Takeuchi, Kazuki; Hagiwara, Yoshihiko; Kawai-Hirai, Rika; Ohta, Noboru; Igarashi, Noriyuki; Shimuzu, Nobutaka

    2013-01-01

    Wide-angle X-ray scattering data using a third-generation synchrotron radiation source are presented. Lipid liposomes are promising drug delivery systems because they have superior curative effects owing to their high adaptability to a living body. Lipid liposomes encapsulating proteins were constructed and the structures examined using synchrotron radiation small- and wide-angle X-ray scattering (SR-SWAXS). The liposomes were prepared by a sequential combination of natural swelling, ultrasonic dispersion, freeze-thaw, extrusion and spin-filtration. The liposomes were composed of acidic glycosphingolipid (ganglioside), cholesterol and phospholipids. By using shell-modeling methods, the asymmetric bilayer structure of the liposome and the encapsulation efficiency of proteins were determined. Together with other analytical techniques, SR-SWAXS and shell-modeling methods are shown to be a powerful tool for characterizing in situ structures of lipid liposomes as an important candidate for drug delivery systems.

  3. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    Science.gov (United States)

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  4. Bayesian comparison of protein structures using partial Procrustes distance.

    Science.gov (United States)

    Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi

    2017-09-26

    An important topic in bioinformatics is protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated into the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.

  5. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    2009-07-01

    Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform

  6. Structural deformation upon protein-protein interaction: a structural alphabet approach.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude

    2008-02-28

    In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
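
    The two quantities reported above, the fraction of structural letters modified upon binding and the raw counts behind a substitution matrix, can be sketched as follows. This is only an illustration under stated assumptions: the letter strings are made up, the function names are hypothetical, and the bound and unbound encodings are assumed to be aligned residue by residue.

```python
from collections import Counter

def fraction_modified(unbound, bound):
    """Fraction of positions whose structural letter differs between the two forms."""
    if len(unbound) != len(bound) or not unbound:
        raise ValueError("encodings must be non-empty and of equal length")
    changed = sum(1 for u, b in zip(unbound, bound) if u != b)
    return changed / len(unbound)

def substitution_counts(pairs):
    """Count (unbound_letter, bound_letter) occurrences over many aligned pairs.

    These raw counts are the usual starting point for a substitution matrix.
    """
    counts = Counter()
    for unbound, bound in pairs:
        counts.update(zip(unbound, bound))
    return counts

# Hypothetical structural-letter strings for one protein; not real data.
unbound = "AABCDDEA"
bound = "AABCEDEB"
print(fraction_modified(unbound, bound))
print(substitution_counts([(unbound, bound)])[("D", "E")])
```

    Restricting the positions to interface residues before calling `fraction_modified` would give the interface-specific proportion.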

  7. Structural deformation upon protein-protein interaction: A structural alphabet approach

    Directory of Open Access Journals (Sweden)

    Lecornet Hélène

    2008-02-01

    Background: In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results: In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion: Our study provides qualitative information about induced fit. These results could be of help for flexible docking.

  8. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    Science.gov (United States)

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  9. Comparative sequence and structural analyses of G-protein-coupled receptor crystal structures and implications for molecular models.

    Directory of Open Access Journals (Sweden)

    Catherine L Worth

    BACKGROUND: Up until recently the only available experimental (high resolution) structure of a G-protein-coupled receptor (GPCR) was that of bovine rhodopsin. In the past few years the determination of GPCR structures has accelerated with three new receptors, as well as squid rhodopsin, being successfully crystallized. All share a common molecular architecture of seven transmembrane helices and can therefore serve as templates for building molecular models of homologous GPCRs. However, despite the common general architecture of these structures, key differences do exist between them. The choice of which experimental GPCR structure(s) to use for building a comparative model of a particular GPCR is unclear and, without detailed structural and sequence analyses, could be arbitrary. The aim of this study is therefore to perform a systematic and detailed analysis of sequence-structure relationships of known GPCR structures. METHODOLOGY: We analyzed in detail conserved and unique sequence motifs and structural features in experimentally-determined GPCR structures. Deeper insight into specific and important structural features of GPCRs as well as valuable information for template selection has been gained. Using key features, a workflow has been formulated for identifying the most appropriate template(s) for building homology models of GPCRs of unknown structure. This workflow was applied to a set of 14 human family A GPCRs, suggesting for each the most appropriate template(s) for building a comparative molecular model. CONCLUSIONS: The available crystal structures represent only a subset of all possible structural variation in family A GPCRs. Some GPCRs have structural features that are distributed over different crystal structures or which are not present in the templates, suggesting that homology models should be built using multiple templates. This study provides a systematic analysis of GPCR crystal structures and a consistent method for identifying

  10. Comparative sequence and structural analyses of G-protein-coupled receptor crystal structures and implications for molecular models.

    Science.gov (United States)

    Worth, Catherine L; Kleinau, Gunnar; Krause, Gerd

    2009-09-16

    Up until recently the only available experimental (high resolution) structure of a G-protein-coupled receptor (GPCR) was that of bovine rhodopsin. In the past few years the determination of GPCR structures has accelerated with three new receptors, as well as squid rhodopsin, being successfully crystallized. All share a common molecular architecture of seven transmembrane helices and can therefore serve as templates for building molecular models of homologous GPCRs. However, despite the common general architecture of these structures key differences do exist between them. The choice of which experimental GPCR structure(s) to use for building a comparative model of a particular GPCR is unclear and without detailed structural and sequence analyses, could be arbitrary. The aim of this study is therefore to perform a systematic and detailed analysis of sequence-structure relationships of known GPCR structures. We analyzed in detail conserved and unique sequence motifs and structural features in experimentally-determined GPCR structures. Deeper insight into specific and important structural features of GPCRs as well as valuable information for template selection has been gained. Using key features a workflow has been formulated for identifying the most appropriate template(s) for building homology models of GPCRs of unknown structure. This workflow was applied to a set of 14 human family A GPCRs suggesting for each the most appropriate template(s) for building a comparative molecular model. The available crystal structures represent only a subset of all possible structural variation in family A GPCRs. Some GPCRs have structural features that are distributed over different crystal structures or which are not present in the templates suggesting that homology models should be built using multiple templates. This study provides a systematic analysis of GPCR crystal structures and a consistent method for identifying suitable templates for GPCR homology modelling that will

  11. Amino acid code of protein secondary structure.

    Science.gov (United States)

    Shestopalov, B V

    2003-01-01

    The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence: the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The bases of the theory are: 1) the name secondary structure is assigned to the conformation stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i + 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlappings of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required; the codon selection is hierarchic; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data. Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three

  12. De novo protein structure determination using sparse NMR data

    International Nuclear Information System (INIS)

    Bowers, Peter M.; Strauss, Charlie E.M.; Baker, David

    2000-01-01

    We describe a method for generating moderate to high-resolution protein structures using limited NMR data combined with the ab initio protein structure prediction method Rosetta. Peptide fragments are selected from proteins of known structure based on sequence similarity and consistency with chemical shift and NOE data. Models are built from these fragments by minimizing an energy function that favors hydrophobic burial, strand pairing, and satisfaction of NOE constraints. Models generated using this procedure with ∼1 NOE constraint per residue are in some cases closer to the corresponding X-ray structures than the published NMR solution structures. The method requires only the sparse constraints available during initial stages of NMR structure determination, and thus holds promise for increasing the speed with which protein solution structures can be determined

  13. Protein Molecular Structures, Protein SubFractions, and Protein Availability Affected by Heat Processing: A Review

    International Nuclear Information System (INIS)

    Yu, P.

    2007-01-01

    The utilization and availability of protein depended on the types of protein and their specific susceptibility to enzymatic hydrolysis (inhibitory activities) in the gastrointestinal tract and was highly associated with protein molecular structures. Studying internal protein structure and protein subfraction profiles led to an understanding of the components that make up a whole protein. An understanding of the molecular structure of the whole protein was often vital to understanding its digestive behavior and nutritive value in animals. In this review, recently obtained information on protein molecular structural effects of heat processing was reviewed, in relation to protein characteristics affecting digestive behavior and nutrient utilization and availability. The emphasis of this review was on (1) using the newly advanced synchrotron technology (S-FTIR) as a novel approach to reveal protein molecular chemistry affected by heat processing within intact plant tissues; (2) revealing the effects of heat processing on the profile changes of protein subfractions associated with digestive behaviors and kinetics manipulated by heat processing; (3) prediction of the changes of protein availability and supply after heat processing, using the advanced DVE/OEB and NRC-2001 models; and (4) obtaining information on optimal processing conditions of protein as intestinal protein source to achieve target values for potential high net absorbable protein in the small intestine. The information described in this article may give better insight into the mechanisms involved and the intrinsic protein molecular structural changes occurring upon processing.

  14. Structure of human Rad51 protein filament from molecular modeling and site-specific linear dichroism spectroscopy

    KAUST Repository

    Reymer, A.

    2009-07-08

    To get mechanistic insight into the DNA strand-exchange reaction of homologous recombination, we solved a filament structure of a human Rad51 protein, combining molecular modeling with experimental data. We build our structure on reported structures for central and N-terminal parts of pure (uncomplexed) Rad51 protein with the aid of linear dichroism spectroscopy, providing angular orientations of substituted tyrosine residues of Rad51-dsDNA filaments in solution. The structure, validated by comparison with an electron microscopy density map and results from mutation analysis, is proposed to represent an active solution structure of the nucleo-protein complex. An inhomogeneously stretched double-stranded DNA fitted into the filament emphasizes the strategic positioning of 2 putative DNA-binding loops in a way that allows us to speculate about their possibly distinct roles in nucleo-protein filament assembly and DNA strand-exchange reaction. The model suggests that the extension of a single-stranded DNA molecule upon binding of Rad51 is ensured by intercalation of Tyr-232 of the L1 loop, which might act as a docking tool, aligning protein monomers along the DNA strand upon filament assembly. Arg-235, also sitting on L1, is in the right position to make electrostatic contact with the phosphate backbone of the other DNA strand. The L2 loop position and its more ordered compact conformation lead us to propose that this loop has another role, as a binding site for an incoming double-stranded DNA. Our filament structure and spectroscopic approach open the possibility of analyzing details along the multistep path of the strand-exchange reaction.

  15. Structural studies of human glioma pathogenesis-related protein 1

    Energy Technology Data Exchange (ETDEWEB)

    Asojo, Oluwatoyin A., E-mail: oasojo@unmc.edu [College of Medicine, Nebraska Medical Center, Omaha, NE 68198-6495 (United States); Koski, Raymond A.; Bonafé, Nathalie [L2 Diagnostics LLC, 300 George Street, New Haven, CT 06511 (United States); College of Medicine, Nebraska Medical Center, Omaha, NE 68198-6495 (United States)

    2011-10-01

    Structural analysis of a truncated soluble domain of human glioma pathogenesis-related protein 1, a membrane protein implicated in the proliferation of aggressive brain cancer, is presented. Human glioma pathogenesis-related protein 1 (GLIPR1) is a membrane protein that is highly upregulated in brain cancers but is barely detectable in normal brain tissue. GLIPR1 is composed of a signal peptide that directs its secretion, a conserved cysteine-rich CAP (cysteine-rich secretory proteins, antigen 5 and pathogenesis-related 1 proteins) domain and a transmembrane domain. GLIPR1 is currently being investigated as a candidate for prostate cancer gene therapy and for glioblastoma targeted therapy. Crystal structures of a truncated soluble domain of the human GLIPR1 protein (sGLIPR1) solved by molecular replacement using a truncated polyalanine search model of the CAP domain of stecrisp, a snake-venom cysteine-rich secretory protein (CRISP), are presented. The correct molecular-replacement solution could only be obtained by removing all loops from the search model. The native structure was refined to 1.85 Å resolution and that of a Zn(2+) complex was refined to 2.2 Å resolution. The latter structure revealed that the putative binding cavity coordinates Zn(2+) similarly to snake-venom CRISPs, which are involved in Zn(2+)-dependent mechanisms of inflammatory modulation. Both sGLIPR1 structures have extensive flexible loop/turn regions and unique charge distributions that were not observed in any of the previously reported CAP protein structures. A model is also proposed for the structure of full-length membrane-bound GLIPR1.

  16. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    Directory of Open Access Journals (Sweden)

    Lei Jia

    Thermostability issues of protein point mutations are a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding the decision-making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG) and melting temperature change (dTm), were obtained from this database. Folding free energy change calculation from Rosetta, structural information of the point mutations as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor) and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. Rosetta calculated folding free energy change ranked as the most influential feature in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.

  17. Phylogenetic analysis and protein structure modelling identifies distinct Ca(2+)/Cation antiporters and conservation of gene family structure within Arabidopsis and rice species.

    Science.gov (United States)

    Pittman, Jon K; Hirschi, Kendal D

    2016-12-01

    The Ca(2+)/Cation Antiporter (CaCA) superfamily is an ancient and widespread family of ion-coupled cation transporters found in nearly all kingdoms of life. In animals, K(+)-dependent and K(+)-independent Na(+)/Ca(2+) exchangers (NCKX and NCX) are important CaCA members. Recently it was proposed that all rice and Arabidopsis CaCA proteins should be classified as NCX proteins. Here we performed phylogenetic analysis of CaCA genes and protein structure homology modelling to further characterise members of this transporter superfamily. Phylogenetic analysis of rice and Arabidopsis CaCAs in comparison with selected CaCA members from non-plant species demonstrated that these genes form clearly distinct families, with the H(+)/Cation exchanger (CAX) and cation/Ca(2+) exchanger (CCX) families dominant in higher plants but the NCKX and NCX families absent. NCX-related Mg(2+)/H(+) exchanger (MHX) and CAX-related Na(+)/Ca(2+) exchanger-like (NCL) proteins are instead present. Analysis of genomes of ten closely-related rice species and four Arabidopsis-related species found that CaCA gene family structures are highly conserved within related plants, apart from minor variation. Protein structures were modelled for OsCAX1a and OsMHX1. Despite exhibiting broad structural conservation, there are clear structural differences observed between the different CaCA types. Members of the CaCA superfamily form clearly distinct families with different phylogenetic, structural and functional characteristics, and therefore should not be simply classified as NCX proteins, which should remain as a separate gene family.

  18. VoroMQA: Assessment of protein structure quality using interatomic contact areas.

    Science.gov (United States)

    Olechnovič, Kliment; Venclovas, Česlovas

    2017-06-01

    In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc.

  19. In silico modeling of the yeast protein and protein family interaction network

    Science.gov (United States)

    Goh, K.-I.; Kahng, B.; Kim, D.

    2004-03-01

    Understanding how protein interaction networks of living organisms have evolved or are organized can be the first stepping stone in unveiling how life works on a fundamental ground. Here we introduce an in silico "coevolutionary" model for the protein interaction network and the protein family network. The essential ingredient of the model includes the protein family identity and its robustness under evolution, as well as the three previously proposed: gene duplication, divergence, and mutation. This model produces a prototypical feature of complex networks in a wide range of parameter space, following the generalized Pareto distribution in connectivity. Moreover, we investigate other structural properties of our model in detail with some specific values of parameters relevant to the yeast Saccharomyces cerevisiae, showing excellent agreement with the empirical data. Our model indicates that the physical constraints encoded via the domain structure of proteins play a crucial role in protein interactions.
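
    The three previously proposed ingredients named above (gene duplication, divergence and mutation) can be sketched as a simple network growth rule. This is a generic illustration, not the authors' coevolutionary model: it omits the protein family identity that the abstract calls essential, and the function name, the parameters `p_keep` and `p_new`, and their default values are all assumptions.

```python
import random
from collections import Counter

def grow_network(n_nodes, p_keep=0.5, p_new=0.1, seed=0):
    """Grow an undirected network by duplication, divergence and mutation.

    Returns a dict mapping each node to the set of its neighbours.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: two linked proteins
    while len(adj) < n_nodes:
        new = len(adj)
        parent = rng.randrange(new)
        # Duplication then divergence: the copy keeps each parental link
        # with probability p_keep. sorted() keeps the run reproducible.
        neighbours = {v for v in sorted(adj[parent]) if rng.random() < p_keep}
        # Mutation: occasionally gain a link to a random existing node.
        if rng.random() < p_new:
            neighbours.add(rng.randrange(new))
        adj[new] = neighbours
        for v in neighbours:
            adj[v].add(new)
    return adj

network = grow_network(200)
degree_histogram = Counter(len(nbrs) for nbrs in network.values())
for degree in sorted(degree_histogram):
    print(degree, degree_histogram[degree])
```

    The printed histogram is the connectivity distribution of one simulated network; comparing it with a fitted distribution would need many runs and larger networks.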

  20. Protein structure recognition: From eigenvector analysis to structural threading method

    Science.gov (United States)

    Cao, Haibo

    In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Some applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups is discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
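
    The dominant eigenvector of a contact matrix, which the threading method above relates to the amino acid sequence, can be computed with a short power iteration. The sketch below is an assumption-laden illustration rather than the dissertation's code: the distance cutoff, the function names and the four-point test chain are hypothetical, and the iteration is applied to the matrix plus the identity (same eigenvectors) so that it also converges on chain-like contact maps.

```python
import math

def contact_matrix(coords, cutoff):
    """Binary contact matrix: 1.0 where two distinct points lie within `cutoff`."""
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def dominant_eigenvector(matrix, iterations=500):
    """Unit-length dominant eigenvector of a symmetric non-negative matrix.

    Iterates with (matrix + identity), which shares the matrix's eigenvectors
    but avoids the oscillation that plain power iteration shows on
    bipartite contact graphs such as a simple chain.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) + vec[i]
               for i, row in enumerate(matrix)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical four-residue chain on a line with unit spacing; not real data.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
vector = dominant_eigenvector(contact_matrix(coords, cutoff=1.5))
print([round(x, 4) for x in vector])
```

    Residues with many contacts receive larger components, which is the property a sequence-derived hydrophobicity profile would be compared against.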

  1. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    Energy Technology Data Exchange (ETDEWEB)

    Cao, Haibo [Iowa State Univ., Ames, IA (United States)

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Some applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups is discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of the protein contact matrix. They believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and the dominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.

  2. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    International Nuclear Information System (INIS)

    Haibo Cao

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups is discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of the protein contact matrix. They believe the correlation is universal, so it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.

  3. Optimal neural networks for protein-structure prediction

    International Nuclear Information System (INIS)

    Head-Gordon, T.; Stillinger, F.H.

    1993-01-01

    The successful application of neural-network algorithms for prediction of protein structure is stymied by three problem areas: the sparsity of the database of known protein structures, poorly devised network architectures which make the input-output mapping opaque, and a global optimization problem in the multiple-minima space of the network variables. We present a simplified polypeptide model residing in two dimensions with only two amino-acid types, A and B, which allows the determination of the global energy structure for all possible sequences of pentamer, hexamer, and heptamer lengths. This model simplicity allows us to compile a complete structural database and to devise neural networks that reproduce the tertiary structure of all sequences with absolute accuracy and with the smallest number of network variables. These optimal networks reveal that the three problem areas are convoluted, but that thoughtful network designs can actually deconvolute these detrimental traits to provide network algorithms that genuinely impact on the ability of the network to generalize or learn the desired mappings. Furthermore, the two-dimensional polypeptide model shows sufficient chemical complexity so that transfer of neural-network technology to more realistic three-dimensional proteins is evident.

  4. Protein structure modeling and refinement by global optimization in CASP12.

    Science.gov (United States)

    Hong, Seung Hwan; Joung, InSuk; Flores-Canales, Jose C; Manavalan, Balachandran; Cheng, Qianyi; Heo, Seungryong; Kim, Jong Yun; Lee, Sun Young; Nam, Mikyung; Joo, Keehyoung; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung

    2018-03-01

    For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded. © 2017 Wiley Periodicals, Inc.
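
    The closing remark about bounded penalties can be made concrete with a small sketch comparing a harmonic restraint with a Lorentzian-shaped one. The functional form, the `weight` and `width` parameters, and the function names are assumptions for illustration; the abstract does not give the exact expression used in the CASP12 protocol.

```python
def harmonic_penalty(distance, target, weight=1.0):
    """Quadratic restraint: grows without bound as the deviation increases."""
    return weight * (distance - target) ** 2

def lorentzian_penalty(distance, target, weight=1.0, width=1.0):
    """Lorentzian-shaped restraint (assumed form): saturates at `weight`.

    A wrongly predicted contact can therefore add at most `weight`
    to the total energy, however far apart the two residues are.
    """
    deviation_sq = (distance - target) ** 2
    return weight * deviation_sq / (deviation_sq + width ** 2)

# A restraint asking for 8 angstroms, evaluated on a pair that is 28 angstroms apart.
harmonic_cost = harmonic_penalty(28.0, 8.0)      # 400.0
lorentzian_cost = lorentzian_penalty(28.0, 8.0)  # stays below 1.0
```

    With the harmonic form, one false contact can dominate the energy and distort the model; with the saturating form, its influence is capped.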

  5. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    Science.gov (United States)

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.

  6. The utility of comparative models and the local model quality for protein crystal structure determination by Molecular Replacement

    Directory of Open Access Journals (Sweden)

    Pawlowski Marcin

    2012-11-01

    Background: Computational models of protein structures have proved useful as search models in Molecular Replacement (MR), a common method to solve the phase problem faced by macromolecular crystallography. The success of MR depends on the accuracy of a search model. Unfortunately, this parameter remains unknown until the final structure of the target protein is determined. During the last few years, several Model Quality Assessment Programs (MQAPs) that predict the local accuracy of theoretical models have been developed. In this article, we analyze whether the application of MQAPs improves the utility of theoretical models in MR. Results: For our dataset of 615 search models, the real local accuracy of a model increases the MR success ratio by 101% compared to corresponding polyalanine templates. In contrast, when local model quality is not utilized in MR, the computational models solved only 4.5% more MR searches than polyalanine templates. For the same dataset of 615 models, a workflow combining MR with predicted local accuracy of a model found 45% more correct solutions than polyalanine templates. To predict such accuracy, MetaMQAPclust, a “clustering MQAP”, was used. Conclusions: Using comparative models only marginally increases the MR success ratio in comparison to polyalanine structures of templates. However, the situation changes dramatically once comparative models are used together with their predicted local accuracy. A new functionality was added to the GeneSilico Fold Prediction Metaserver in order to build models that are more useful for MR searches. Additionally, we have developed a simple method, AmIgoMR (Am I good for MR?), to predict if an MR search with a template-based model for a given template is likely to find the correct solution.

  7. Structural model for the interaction of a designed Ankyrin Repeat Protein with the human epidermal growth factor receptor 2.

    Directory of Open Access Journals (Sweden)

    V Chandana Epa

    Designed Ankyrin Repeat Proteins (DARPins) are a class of novel binding proteins that can be selected and evolved to bind to targets with high affinity and specificity. We are interested in the DARPin H10-2-G3, which has been evolved to bind with very high affinity to the human epidermal growth factor receptor 2 (HER2). HER2 is found to be over-expressed in 30% of breast cancers, and is the target for the FDA-approved therapeutic monoclonal antibodies trastuzumab and pertuzumab and small molecule tyrosine kinase inhibitors. Here, we use computational macromolecular docking, coupled with several interface metrics such as shape complementarity, interaction energy, and electrostatic complementarity, to model the structure of the complex between the DARPin H10-2-G3 and HER2. We analyzed the interface between the two proteins and then validated the structural model by showing that selected HER2 point mutations at the putative interface with H10-2-G3 reduce the affinity of binding up to 100-fold without affecting the binding of trastuzumab. Comparisons made with a subsequently solved X-ray crystal structure of the complex yielded a backbone atom root mean square deviation of 0.84-1.14 Ångstroms. The study presented here demonstrates the capability of the computational techniques of structural bioinformatics in generating useful structural models of protein-protein interactions.

  8. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    Science.gov (United States)

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm, the hill-climbing mechanism and the diversification strategy, are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
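
    For readers unfamiliar with the HP model named in this abstract, the following minimal sketch scores a two-dimensional lattice conformation using the commonly used convention of -1 per pair of hydrophobic (H) residues that are lattice neighbors but not adjacent in the chain. The move encoding (`U`, `D`, `L`, `R`) and the function names are illustrative assumptions, and the sketch covers only the energy function, not the authors' evolutionary or hill-climbing operators.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold_to_coords(directions):
    """Turn a string of lattice moves into the list of residue positions."""
    x, y = 0, 0
    coords = [(x, y)]
    for step in directions:
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def hp_energy(sequence, directions):
    """HP-model energy of a 2D fold, or None if the walk is not self-avoiding."""
    coords = fold_to_coords(directions)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        return None  # two residues occupy the same lattice site
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every neighbor pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

compact_energy = hp_energy("HPPH", "RUL")   # chain ends meet: one H-H contact
extended_energy = hp_energy("HPPH", "RRR")  # straight chain: no contacts
```

    A search procedure such as hill climbing would repeatedly modify the move string and keep changes that lower this energy.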

  9. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm, the hill-climbing mechanism and the diversification strategy, are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  10. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    Science.gov (United States)

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has a particularly high propensity to be found at interaction interfaces in general, indicating that any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  11. Protein folding simulations: from coarse-grained model to all-atom model.

    Science.gov (United States)

    Zhang, Jian; Li, Wenfei; Wang, Jun; Qin, Meng; Wu, Lei; Yan, Zhiqiang; Xu, Weixin; Zuo, Guanghong; Wang, Wei

    2009-06-01

    Protein folding is an important and challenging problem in molecular biology. During the last two decades, molecular dynamics (MD) simulation has proved to be a paramount tool and has been widely used to study protein structures, folding kinetics and thermodynamics, and structure-stability-function relationships. It has also been used to help engineer and design new proteins, and to answer even more general questions such as the minimal number of amino acids or the evolution principle of protein families. MD simulation is still undergoing rapid development. The first trend is toward developing new coarse-grained models and studying larger and more complex molecular systems such as protein-protein complexes and their assembly processes, amyloid-related aggregation, and the structure and motion of chaperones, motors, channels and virus capsids; the second trend is toward building high-resolution models and exploring more detailed and accurate pictures of protein folding and the associated processes, such as folding involving coordination bonds or disulfide bonds, the polarization, charge transfer and protonation/deprotonation processes involved in metal-coupled folding, and ion permeation and its coupling with the kinetics of channels. In these new territories, MD simulations have given many promising results and will continue to offer exciting views. Here, we review several new subjects investigated by using MD simulations as well as the corresponding developments of appropriate protein models. These include but are not limited to the attempt to go beyond the topology-based Gō-like model and characterize the energetic factors in protein structures and dynamics, the study of the thermodynamics and kinetics of protein folding involving disulfide bonds, the modeling of the interactions between chaperonin and the encapsulated protein and the protein folding under this circumstance, the effort to clarify the important yet still elusive folding mechanism of protein BBL

  12. Protein structure database search and evolutionary classification.

    Science.gov (United States)

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  13. Topological properties of complex networks in protein structures

    Science.gov (United States)

    Kim, Kyungsik; Jung, Jae-Won; Min, Seungsik

    2014-03-01

    We study topological properties of networks in structural classification of proteins. We model the native-state protein structure as a network made of its constituent amino-acids and their interactions. We treat four structural classes of proteins composed predominantly of α helices and β sheets and consider several proteins from each of these classes whose sizes range from amino acids of the Protein Data Bank. Particularly, we simulate and analyze the network metrics such as the mean degree, the probability distribution of degree, the clustering coefficient, the characteristic path length, the local efficiency, and the cost. This work was supported by the KMAR and DP under Grant WISE project (153-3100-3133-302-350).
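
    Two of the network metrics listed in this abstract, the mean degree and the clustering coefficient, can be sketched on a toy residue-interaction network as follows. The adjacency-dictionary representation, the function names, and the four-node example are illustrative assumptions and are not taken from the study.

```python
def mean_degree(adjacency):
    """Average number of interaction partners per residue (node)."""
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)

def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = list(adjacency[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in adjacency[neighbors[a]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(adjacency):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adjacency, n) for n in adjacency) / len(adjacency)

# Toy network: residues 0, 1, 2 form a triangle; residue 3 touches only residue 2.
toy_network = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_mean_degree = mean_degree(toy_network)
toy_clustering = average_clustering(toy_network)
```

    In a real analysis the adjacency dictionary would be derived from residue contacts in a structure file rather than written by hand.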

  14. Investigating energy-based pool structure selection in the structure ensemble modeling with experimental distance constraints: The example from a multidomain protein Pub1.

    Science.gov (United States)

    Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan

    2018-05-01

    The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.

  15. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    Science.gov (United States)

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  16. Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts.

    Science.gov (United States)

    Adhikari, Badri; Cheng, Jianlin

    2017-08-29

    Residue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures. Using a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts - adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that by simply using low ranked predicted contacts as non-contacts and adding them as additional restraints, can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score. Our findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.
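
    Since this abstract reports accuracy in terms of TM-score, a minimal sketch of the score for one fixed superposition may help. It uses the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names are illustrative, and the sketch assumes the per-residue distances between model and native structure have already been computed; a full TM-score calculation additionally searches for the superposition that maximizes the value, which is not done here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale used by TM-score (in angstroms)."""
    if target_length <= 15:
        raise ValueError("d0 is not defined for chains of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("chain too short for a positive d0")
    return d0

def tm_score_fixed_superposition(distances, target_length):
    """TM-score for one given superposition (no optimization over rotations).

    `distances` holds one model-to-native distance per aligned residue;
    residues missing from the model simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

perfect = tm_score_fixed_superposition([0.0] * 100, 100)       # identical structures
half_covered = tm_score_fixed_superposition([0.0] * 50, 100)   # half the residues modeled
```

    Because the sum is normalized by the native length, a model covering only part of the chain cannot score 1.0 even if the covered part is exact.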

  17. MolTalk--a programming library for protein structures and structure analysis.

    Science.gov (United States)

    Diemand, Alexander V; Scheib, Holger

    2004-04-19

    Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page http://www.moltalk.org following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To quickly retrieve information for (a limited number of

  18. A Mesoscopic Model for Protein-Protein Interactions in Solution

    OpenAIRE

    Lund, Mikael; Jönsson, Bo

    2003-01-01

    Protein self-association may be detrimental in biological systems, but can be utilized in a controlled fashion for protein crystallization. It is hence of considerable interest to understand how factors like solution conditions prevent or promote aggregation. Here we present a computational model describing interactions between protein molecules in solution. The calculations are based on a molecular description capturing the detailed structure of the protein molecule using x-ray or nuclear ma...

  19. A probabilistic fragment-based protein structure prediction algorithm.

    Directory of Open Access Journals (Sweden)

    David Simoncini

    Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].

  20. In silico modelling and validation of differential expressed proteins in lung cancer

    Directory of Open Access Journals (Sweden)

    Bhagavathi S

    2012-05-01

    Full Text Available Objective: The present study aims to predict the three-dimensional structure of three major proteins responsible for causing lung cancer. Methods: These are the proteins differentially expressed in a lung cancer dataset. Initially, a structural template for each protein was identified from a structural database by homology search, and homology modelling was then used to predict its native 3D structure. Each three-dimensional model obtained was validated using Ramachandran plot analysis to assess its reliability. Results: Four proteins were differentially expressed and were significant proteins in causing lung cancer. Among the four proteins, matrix metalloproteinase (P39900) had a known 3D structure and hence was not considered for modelling. The remaining proteins, Polo-like kinase I (Q58A51), Trophinin (B1AKF1) and Thrombomodulin (P07204), were modelled and validated. Conclusions: The three-dimensional structure of proteins provides insights about the functional aspect and regulatory aspect of the protein. Thus, this study will be a breakthrough for further lung cancer related studies.

  1. Hidden Structural Codes in Protein Intrinsic Disorder.

    Science.gov (United States)

    Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

    2017-10-17

    Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, uncovering the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.

  2. Critical assessment of methods of protein structure prediction (CASP)-round IX

    KAUST Repository

    Moult, John; Fidelis, Krzysztof; Kryshtafovych, Andriy; Tramontano, Anna

    2011-01-01

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the ninth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Methods for modeling protein structure continue to advance, although at a more modest pace than in the early CASP experiments. CASP developments of note are indications of improvement in model accuracy for some classes of target, an improved ability to choose the most accurate of a set of generated models, and evidence of improvement in accuracy for short "new fold" models. In addition, a new analysis of regions of models not derivable from the most obvious template structure has revealed better performance than expected.

  3. Structural and Function Prediction of Musa acuminata subsp. Malaccensis Protein

    Directory of Open Access Journals (Sweden)

    Anum Munir

    2016-03-01

    Full Text Available Hypothetical proteins (HPs) are proteins whose presence has been anticipated, yet whose in vivo function has not been established. Illustrating the structural and functional insights of these HPs might likewise prompt a better comprehension of the protein-protein associations or networks in diverse types of life. Bananas (Musa acuminata spp.), including sweet and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include oats. Bananas are crucial for food security in numerous tropical and subtropical nations and are the most popular fruit in industrialized nations. In the present study, a hypothetical protein of M. acuminata (banana) was chosen for analysis and modeling with various bioinformatics tools and databases. According to primary and secondary structure analysis, XP_009393594.1 is a stable hydrophobic protein containing a notable proportion of α-helices. Homology modeling was done using the SWISS-MODEL server, where the templates' identity with the XP_009393594.1 protein was low, which demonstrated the novelty of the protein. An ab initio strategy was used to produce its 3D structure. Several quality assessment and validation parameters determined the generated protein model to be stable and of reasonably good quality. Functional analysis, completed with ProtFun 2.2 and KEGG (KAAS), suggested that the hypothetical protein is a transcription factor with a cytoplasmic domain as zinc finger. The protein was observed to be vital for the translation process and involved in metabolism, signaling and cellular processes, genetic information processing and zinc ion binding. It is suggested that further experimental validation would help to anticipate the structures and functions of other uncharacterized proteins of different plants and living beings.

  4. Global Structural Flexibility of Metalloproteins Regulates Reactivity of Transition Metal Ion in the Protein Core: An Experimental Study Using Thiol-subtilisin as a Model Protein.

    Science.gov (United States)

    Matsuo, Takashi; Kono, Takamasa; Shobu, Isamu; Ishida, Masaya; Gonda, Katsuya; Hirota, Shun

    2018-02-21

    The functions of metal-containing proteins (metalloproteins) are determined by the reactivities of transition metal ions at their active sites. Because protein macromolecular structures have several molecular degrees of freedom, global structural flexibility may also regulate the properties of metalloproteins. However, the influence of this factor has not been fully delineated in mechanistic studies of metalloproteins. Accordingly, we have investigated the relationship between global protein flexibility and the characteristics of a transition metal ion in the protein core using thiol-subtilisin (tSTL) with a Cys-coordinated Cu2+ ion as a model system. Although tSTL has two Ca2+-binding sites, the Ca2+-binding status hardly affects its secondary structure. Nevertheless, guanidinium-induced denaturation and amide H/D exchange indicated the increase in the structural flexibility of tSTL by the removal of bound Ca2+ ions. Electron paramagnetic resonance and absorption spectral changes have revealed that the protein flexibility determines the characteristics of a Cu2+ ion in tSTL. Therefore, global protein flexibility should be recognized as an important factor that regulates the properties of metalloproteins. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Molecular modelling of the Norrie disease protein predicts a cystine knot growth factor tertiary structure.

    Science.gov (United States)

    Meitinger, T; Meindl, A; Bork, P; Rost, B; Sander, C; Haasemann, M; Murken, J

    1993-12-01

    The X-linked gene for Norrie disease, which is characterized by blindness, deafness and mental retardation, has been cloned recently. This gene has been thought to code for a putative extracellular factor; its predicted amino acid sequence is homologous to the C-terminal domain of diverse extracellular proteins. Sequence pattern searches and three-dimensional modelling now suggest that the Norrie disease protein (NDP) has a tertiary structure similar to that of transforming growth factor beta (TGF beta). Our model identifies NDP as a member of an emerging family of growth factors containing a cystine knot motif, with direct implications for the physiological role of NDP. The model also sheds light on sequence-related domains such as the C-terminal domain of mucins and of von Willebrand factor.

  6. Oligomeric protein structure networks: insights into protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Brinda KV

    2005-12-01

    Full Text Available Abstract Background: Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes, with edges based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues), with special emphasis on protein interfaces. Results: A variety of interactions such as hydrogen bonds, salt bridges, aromatic and hydrophobic interactions, which occur at the interfaces, are identified in a consolidated manner as amino acid clusters at the interface. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot
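    The network view described in this record (residues as nodes, edges drawn when a non-covalent interaction is strong enough, hubs as highly connected residues) can be sketched as follows. The interaction-strength values, both thresholds, and the `"chain:residue"` labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def build_graph(strengths, min_strength):
    """Build an undirected residue graph.

    strengths: dict mapping (residue_a, residue_b) to an interaction
               strength; an edge is kept when strength >= min_strength.
    """
    graph = defaultdict(set)
    for (res_a, res_b), strength in strengths.items():
        if strength >= min_strength:
            graph[res_a].add(res_b)
            graph[res_b].add(res_a)
    return graph


def find_hubs(graph, min_degree):
    """Return residues with at least `min_degree` neighbours."""
    return sorted(res for res, nbrs in graph.items() if len(nbrs) >= min_degree)


def interface_edges(graph, chain_of):
    """Return edges whose two residues belong to different chains."""
    return sorted(
        (a, b)
        for a in graph
        for b in graph[a]
        if a < b and chain_of[a] != chain_of[b]
    )
```

    Raising `min_strength` keeps only the strongest contacts, which is one simple way to compare strong and weak interfaces on the same structure.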

  7. Scoring predictive models using a reduced representation of proteins: model and energy definition.

    Science.gov (United States)

    Fogolari, Federico; Pieri, Lidia; Dovier, Agostino; Bortolussi, Luca; Giugliarelli, Gilberto; Corazza, Alessandra; Esposito, Gennaro; Viglino, Paolo

    2007-03-23

    Reduced representations of proteins have been playing a key role in the study of protein folding. Many such models are available, with different levels of representation detail. Although the usefulness of many such models for structural bioinformatics applications has been demonstrated in recent years, there are few intermediate resolution models endowed with an energy model capable, for instance, of detecting native or native-like structures among decoy sets. The aim of the present work is to provide a discrete empirical potential for a reduced protein model termed here PC2CA, because it employs a PseudoCovalent structure with only 2 Centers of interactions per Amino acid, suitable for protein model quality assessment. All protein structures in the set top500H have been converted to reduced form. The distributions of pseudobonds, pseudoangles, pseudodihedrals and distances between centers of interactions have been converted into potentials of mean force. A suitable reference distribution has been defined for non-bonded interactions which takes into account excluded volume effects and protein finite size. The correlation between adjacent main chain pseudodihedrals has been converted into an additional energetic term which is able to account for cooperative effects in secondary structure elements. Local energy surface exploration is performed in order to increase the robustness of the energy function. The model and the energy definition proposed have been tested on all the multiple decoys' sets in the Decoys'R'us database. The energetic model is able to recognize, for almost all sets, native-like structures (RMSD less than 2.0 Å). These results and those obtained in the blind CASP7 quality assessment experiment suggest that the model compares well with scoring potentials with finer granularity and could be useful for fast exploration of conformational space. Parameters are available at the url: http://www.dstb.uniud.it/~ffogolari/download/.
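    Converting an observed distribution into a potential of mean force is commonly done by Boltzmann inversion, E = -kT ln(P_obs / P_ref). The sketch below shows that step for binned counts. It is an illustration under assumptions, not the PC2CA parameterization: the pseudocount, the unit `kT`, and the externally supplied reference counts are choices made here, and the excluded-volume and finite-size corrections mentioned in the abstract are not reproduced.

```python
import math


def potential_of_mean_force(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Return one energy per bin via Boltzmann inversion.

    observed_counts:  histogram of a quantity (e.g. a distance) in known
                      structures.
    reference_counts: histogram of the same bins in the reference state.
    pseudocount:      added to every bin so that empty bins stay finite
                      (an assumption of this sketch).
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        # Bins seen more often than in the reference get negative energy.
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

    Summing such per-bin energies over all pseudobonds, pseudoangles, pseudodihedrals and pairwise distances of a model gives a score that can be compared across decoys.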

  8. Modularity in protein structures: study on all-alpha proteins.

    Science.gov (United States)

    Khan, Taushif; Ghosh, Indira

    2015-01-01

    Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.

  9. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan; Barbato, Alessandro; Tramontano, Anna

    2013-01-01

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  10. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships.

    KAUST Repository

    Kosinski, Jan

    2013-02-08

    SUMMARY: MODexplorer is an integrated tool aimed at exploring the sequence, structural and functional diversity in protein families useful in homology modeling and in analyzing protein families in general. It takes as input either the sequence or the structure of a protein and provides alignments with its homologs along with a variety of structural and functional annotations through an interactive interface. The annotations include sequence conservation, similarity scores, ligand-, DNA- and RNA-binding sites, secondary structure, disorder, crystallographic structure resolution and quality scores of models implied by the alignments to the homologs of known structure. MODexplorer can be used to analyze sequence and structural conservation among the structures of similar proteins, to find structures of homologs solved in different conformational state or with different ligands and to transfer functional annotations. Furthermore, if the structure of the query is not known, MODexplorer can be used to select the modeling templates taking all this information into account and to build a comparative model. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://modorama.biocomputing.it/modexplorer. Website implemented in HTML and JavaScript with all major browsers supported. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  11. Fragger: a protein fragment picker for structural queries.

    Science.gov (United States)

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
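    The query described in this record (a distance threshold, ranking by distance to the query, and a regular expression over one-letter amino acid codes) can be sketched in a few lines. This is not Fragger's code or interface: the in-memory `database` layout and the function names are hypothetical, and the RMSD here assumes the fragments are already superposed, whereas a real fragment picker would typically superpose them first.

```python
import math
import re


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def query_fragments(database, query_coords, max_rmsd, sequence_pattern=".*"):
    """Return (distance, name) hits sorted by increasing distance.

    database: iterable of (name, sequence, coords) tuples, where sequence
              uses one-letter amino acid codes (hypothetical layout).
    """
    pattern = re.compile(sequence_pattern)
    hits = []
    for name, sequence, coords in database:
        if len(coords) != len(query_coords):
            continue  # only fragments of the query's length are comparable
        if not pattern.fullmatch(sequence):
            continue  # sequence constraint not satisfied
        distance = rmsd(query_coords, coords)
        if distance <= max_rmsd:
            hits.append((distance, name))
    return sorted(hits)
```

    A pattern such as `"G[AP]"` restricts hits to two-residue fragments starting with glycine followed by alanine or proline.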

  12. Pushing the frontiers of atomic models for protein tertiary structure ...

    Indian Academy of Sciences (India)

    as an NP complete or NP hard problem [4,5]. This notwithstanding, the dire need for tertiary structures of proteins in drug discovery and other areas [6–8] has propelled the development of a multitude of computational recipes. In this article, we focus on ab initio/de novo strategies, Bhageerath in particular, for protein tertiary ...

  13. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  14. Accuracy issues involved in modeling in vivo protein structures using PM7.

    Science.gov (United States)

    Martin, Benjamin P; Brandon, Christopher J; Stewart, James J P; Braun-Sand, Sonja B

    2015-08-01

    Using the semiempirical method PM7, an attempt has been made to quantify the error in prediction of the in vivo structure of proteins relative to X-ray structures. Three important contributory factors are the experimental limitations of X-ray structures, the difference between the crystal and solution environments, and the errors due to PM7. The geometries of 19 proteins from the Protein Data Bank that had small R values, that is, high accuracy structures, were optimized and the resulting drop in heat of formation was calculated. Analysis of the changes showed that about 10% of this decrease in heat of formation was caused by faults in PM7, the balance being attributable to the X-ray structure and the difference between the crystal and solution environments. A previously unknown fault in PM7 was revealed during tests to validate the geometries generated using PM7. Clashscores generated by the Molprobity molecular mechanics structure validation program showed that PM7 was predicting unrealistically close contacts between nonbonding atoms in regions where the local geometry is dominated by very weak noncovalent interactions. The origin of this fault was traced to an underestimation of the core-core repulsion between atoms at distances smaller than the equilibrium distance. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published By Wiley Periodicals, Inc.

  15. Structure of synaptophysin: a hexameric MARVEL-domain channel protein.

    Science.gov (United States)

    Arthur, Christopher P; Stowell, Michael H B

    2007-06-01

    Synaptophysin I (SypI) is an archetypal member of the MARVEL-domain family of integral membrane proteins and one of the first synaptic vesicle proteins to be identified and cloned. Almost all MARVEL-domain proteins are involved in membrane apposition and vesicle-trafficking events, but their precise role in these processes is unclear. We have purified mammalian SypI and determined its three-dimensional (3D) structure by using electron microscopy and single-particle 3D reconstruction. The hexameric structure resembles an open basket with a large pore and tenuous interactions within the cytosolic domain. The structure suggests a model for Synaptophysin's role in fusion and recycling that is regulated by known interactions with the SNARE machinery. This 3D structure of a MARVEL-domain protein provides a structural foundation for understanding the role of these important proteins in a variety of biological processes.

  16. MolTalk – a programming library for protein structures and structure analysis

    Science.gov (United States)

    Diemand, Alexander V; Scheib, Holger

    2004-01-01

    Background Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. Results We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. Conclusion MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To quickly retrieve information for (a limited
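    Extracting each chain's sequence from the coordinate section of a PDB file, as PDBChainSaw is described as doing, can be illustrated in Python. This is not MolTalk code (MolTalk is Objective-C and Smalltalk based); it is a minimal sketch that reads the fixed-column ATOM records of the legacy PDB format, uses only the first model, and maps unknown residue names to `X`. The function name is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} from C-alpha ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model files
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue  # one C-alpha atom per residue is enough
        chain_id = line[21]
        # Residue number plus insertion code identifies a residue in a chain.
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(residue_key)
        letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences[chain_id] = sequences.get(chain_id, "") + letter
    return sequences
```

    A sequence derived this way reflects only residues with coordinates, so it can be shorter than the sequence given in the file header when parts of the chain are unresolved.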

  17. MolTalk – a programming library for protein structures and structure analysis

    Directory of Open Access Journals (Sweden)

    Diemand Alexander V

    2004-04-01

    Full Text Available Abstract Background: Two of the mostly unsolved but increasingly urgent problems for modern biologists are a) to quickly and easily analyse protein structures and b) to comprehensively mine the wealth of information, which is distributed along with the 3D co-ordinates by the Protein Data Bank (PDB). Tools which address this issue need to be highly flexible and powerful but at the same time must be freely available and easy to learn. Results: We present MolTalk, an elaborate programming language, which consists of the programming library libmoltalk implemented in Objective-C and the Smalltalk-based interpreter MolTalk. MolTalk combines the advantages of an easy to learn and programmable procedural scripting with the flexibility and power of a full programming language. An overview of currently available applications of MolTalk is given and with PDBChainSaw one such application is described in more detail. PDBChainSaw is a MolTalk-based parser and information extraction utility of PDB files. Weekly updates of the PDB are synchronised with PDBChainSaw and are available for free download from the MolTalk project page http://www.moltalk.org following the link to PDBChainSaw. For each chain in a protein structure, PDBChainSaw extracts the sequence from its co-ordinates and provides additional information from the PDB-file header section, such as scientific organism, compound name, and EC code. Conclusion: MolTalk provides a rich set of methods to analyse and even modify experimentally determined or modelled protein structures. These methods vary in complexity and are thus suitable for beginners and advanced programmers alike. We envision MolTalk to be most valuable in the following applications: 1) To analyse protein structures repetitively in large-scale, i.e. to benchmark protein structure prediction methods or to evaluate structural models. The quality of the resulting 3D-models can be assessed by e.g. calculating a Ramachandran-Sasisekharan plot. 2) To

  18. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    Full Text Available The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  19. New tips for structure prediction by comparative modeling

    Science.gov (United States)

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to an experimentally determined protein structure (template). The use of this technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions, as commonly accepted, depends on the sequence identity of the target protein to the template. To assess the relationship between sequence identity and model quality, we carried out an analysis of a set of 4753 sequence and structure alignments. Throughout this research, the model accuracy was measured by root mean square deviations of Cα atoms of the target-template structures. Surprisingly, the results show that sequence identity of the target protein to the template is not a good descriptor to predict the accuracy of the 3-D structure model. In fact, in a large number of cases, comparative modelling with lower sequence identity of target to template proteins led to a more accurate 3-D structure model. As a consequence of this study, we suggest new tips for improving the quality of comparative models, particularly for models whose target-template sequence identity is below 50%. PMID:19255646
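    Target-template sequence identity, the descriptor examined in this record, is computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over columns where neither sequence has a gap. The choice of denominator is an assumption made here; the abstract does not state which definition the study used.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns without gaps.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length; `gap` marks alignment gaps.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        return 0.0
    return 100.0 * matches / aligned
```

    Other definitions divide by the length of the shorter sequence or by the full alignment length, which can give noticeably different values for the same alignment.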

  20. The Puf family of RNA-binding proteins in plants: phylogeny, structural modeling, activity and subcellular localization

    Directory of Open Access Journals (Sweden)

    Tam Michael WC

    2010-03-01

    Full Text Available Abstract Background Puf proteins have important roles in controlling gene expression at the post-transcriptional level by promoting RNA decay and repressing translation. The Pumilio homology domain (PUM-HD) is a conserved region within Puf proteins that binds to RNA with sequence specificity. Although Puf proteins have been well characterized in animal and fungal systems, little is known about the structural and functional characteristics of Puf-like proteins in plants. Results The Arabidopsis and rice genomes code for 26 and 19 Puf-like proteins, respectively, each possessing eight or fewer Puf repeats in their PUM-HD. Key amino acids in the PUM-HD of several of these proteins are conserved with those of animal and fungal homologs, whereas other plant Puf proteins demonstrate extensive variability in these amino acids. Three-dimensional modeling revealed that the predicted structure of this domain in plant Puf proteins provides a suitable surface for binding RNA. Electrophoretic gel mobility shift experiments showed that the Arabidopsis AtPum2 PUM-HD binds with high affinity to BoxB of the Drosophila Nanos Response Element I (NRE1) RNA, whereas a point mutation in the core of the NRE1 resulted in a significant reduction in binding affinity. Transient expression of several of the Arabidopsis Puf proteins as fluorescent protein fusions revealed a dynamic, punctate cytoplasmic pattern of localization for most of these proteins. The presence of predicted nuclear export signals and accumulation of AtPuf proteins in the nucleus after treatment of cells with leptomycin B demonstrated that shuttling of these proteins between the cytosol and nucleus is common among these proteins. In addition to the cytoplasmically enriched AtPum proteins, two AtPum proteins showed nuclear targeting with enrichment in the nucleolus.
Conclusions The Puf family of RNA-binding proteins in plants consists of a greater number of members than any other model species studied to

  1. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    Science.gov (United States)

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Full Text Available Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
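
    The MapReduce programming model mentioned in this abstract can be illustrated with a small in-memory sketch. It is not the authors' Hadoop service: `toy_score` is a hypothetical stand-in for a real structure alignment score, and the map, shuffle and reduce steps run in a single process purely to show the data flow.

```python
from collections import defaultdict
from itertools import combinations

def toy_score(a, b):
    """Hypothetical stand-in for an alignment score: the fraction of
    matching positions in two secondary-structure strings."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def map_phase(pair):
    (id_a, a), (id_b, b) = pair
    score = toy_score(a, b)
    # Emit one record per structure so each key sees all of its partners.
    return [(id_a, (id_b, score)), (id_b, (id_a, score))]

def reduce_phase(key, values):
    # Keep the best-scoring partner for this structure.
    return key, max(values, key=lambda v: v[1])

def run(structures):
    grouped = defaultdict(list)
    for pair in combinations(structures.items(), 2):   # map
        for key, value in map_phase(pair):
            grouped[key].append(value)                 # shuffle
    return dict(reduce_phase(k, v) for k, v in grouped.items())  # reduce

structures = {"s1": "HHHHCCEE", "s2": "HHHHCCEC", "s3": "EEEECCHH"}
best = run(structures)
print(best["s1"])  # ('s2', 0.875)
```

    Because each pairwise comparison is independent, the map step is the part that a framework such as Hadoop would distribute across workers.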

  3. Protein single-model quality assessment by feature-based probability density functions.

    Science.gov (United States)

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob was blindly tested in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob contributes to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  4. Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

    Directory of Open Access Journals (Sweden)

    Bradley Michael E

    2006-02-01

    Full Text Available Abstract Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include (1) multiple sequence alignments, (2) mapping of alignment sites to crystal structure sites, (3) phylogenetic trees, (4) inferred ancestral sequences at internal tree nodes, and (5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural

  5. Modeling membrane protein structure through site-directed ESR spectroscopy

    NARCIS (Netherlands)

    Kavalenka, A.A.

    2009-01-01

    Site-directed spin labeling (SDSL) electron spin resonance (ESR) spectroscopy is a
    relatively new biophysical tool for obtaining structural information about proteins. This
    thesis presents a novel approach, based on powerful spectral analysis techniques (multicomponent
    spectral

  6. Hydrogen atoms in protein structures: high-resolution X-ray diffraction structure of the DFPase

    Science.gov (United States)

    2013-01-01

    Background Hydrogen atoms represent about half of the total number of atoms in proteins and are often involved in substrate recognition and catalysis. Unfortunately, X-ray protein crystallography at typical resolutions cannot locate them directly, mainly because light atoms contribute only weakly to diffraction. However, sub-Ångstrom diffraction data, careful modeling and a proper refinement strategy can allow a significant fraction of the hydrogen atoms to be positioned. Results A comprehensive study on the X-ray structure of the diisopropyl-fluorophosphatase (DFPase) was performed, and the hydrogen atoms were modeled, including those of solvent molecules. This model was compared to the available neutron structure of DFPase, and differences in the protein and the active site solvation were noticed. Conclusions A further examination of the DFPase X-ray structure provides substantial evidence for the presence of an activated water molecule, which may be informative with regard to the enzymatic hydrolysis mechanism. PMID:23915572

  7. Docking-based modeling of protein-protein interfaces for extensive structural and functional characterization of missense mutations.

    Science.gov (United States)

    Barradas-Bautista, Didier; Fernández-Recio, Juan

    2017-01-01

    Next-generation sequencing (NGS) technologies are providing genomic information for an increasing number of healthy individuals and patient populations. Given the large amount of genomic data being generated, understanding the effect of disease-related mutations at the molecular level can help to close the gap between genotype and phenotype and thus improve prevention, diagnosis or treatment of a pathological condition. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. Computational methods, such as protein docking, are currently used to complement experimental efforts and could help to build the human structural interactome. Here we have extended the original pyDockNIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces, when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify edgetic effects in many nsSNPs that were previously unknown. This can help to annotate and interpret genomic data from large-scale population studies, and to achieve a better understanding of disease at the molecular level.

  8. Docking-based modeling of protein-protein interfaces for extensive structural and functional characterization of missense mutations.

    Directory of Open Access Journals (Sweden)

    Didier Barradas-Bautista

    Full Text Available Next-generation sequencing (NGS) technologies are providing genomic information for an increasing number of healthy individuals and patient populations. Given the large amount of genomic data being generated, understanding the effect of disease-related mutations at the molecular level can help to close the gap between genotype and phenotype and thus improve prevention, diagnosis or treatment of a pathological condition. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. Computational methods, such as protein docking, are currently used to complement experimental efforts and could help to build the human structural interactome. Here we have extended the original pyDockNIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces, when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify edgetic effects in many nsSNPs that were previously unknown. This can help to annotate and interpret genomic data from large-scale population studies, and to achieve a better understanding of disease at the molecular level.

  9. Protein enriched pasta: structure and digestibility of its protein network.

    Science.gov (United States)

    Laleg, Karima; Barron, Cécile; Santé-Lhoutellier, Véronique; Walrand, Stéphane; Micard, Valérie

    2016-02-01

    Wheat (W) pasta was enriched in 6% gluten (G), 35% faba (F) or 5% egg (E) to increase its protein content (13% to 17%). The impact of the enrichment on the multiscale structure of the pasta and on in vitro protein digestibility was studied. Increasing the protein content (W- vs. G-pasta) strengthened pasta structure at molecular and macroscopic scales but reduced its protein digestibility by 3% by forming a higher covalently linked protein network. Greater changes in the macroscopic and molecular structure of the pasta were obtained by varying the nature of protein used for enrichment. Proteins in G- and E-pasta were highly covalently linked (28-32%) resulting in a strong pasta structure. Conversely, F-protein (98% SDS-soluble) altered the pasta structure by diluting gluten and formed a weak protein network (18% covalent link). As a result, protein digestibility in F-pasta was significantly higher (46%) than in E- (44%) and G-pasta (39%). The effect of low (55 °C, LT) vs. very high temperature (90 °C, VHT) drying on the protein network structure and digestibility was shown to cause greater molecular changes than pasta formulation. Whatever the pasta, a general strengthening of its structure, a 33% to 47% increase in covalently linked proteins and a higher β-sheet structure were observed. However, these structural differences were evened out after the pasta was cooked, resulting in identical protein digestibility in LT and VHT pasta. Even after VHT drying, F-pasta had the best amino acid profile with the highest protein digestibility, proof of its nutritional interest.

  10. Building a better fragment library for de novo protein structure prediction.

    Directory of Open Access Journals (Sweden)

    Saulo H P de Oliveira

    Full Text Available Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources.

  11. Building a Better Fragment Library for De Novo Protein Structure Prediction

    Science.gov (United States)

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595
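
    The two library quality measures named in this abstract, precision and coverage, can be sketched as below. The definitions are assumptions made for illustration and are not taken from the abstract: a fragment counts as "good" when its RMSD to the native structure is below a hypothetical 1.5 Å cutoff, precision is the fraction of good fragments in the library, and coverage is the fraction of target residues spanned by at least one good fragment.

```python
def library_precision_coverage(fragments, target_length, good_rmsd=1.5):
    """fragments: iterable of (start, length, rmsd_to_native) tuples,
    with 0-based start positions on the target sequence.

    good_rmsd is an assumed cutoff for this sketch, in Å.
    """
    fragments = list(fragments)
    good = [(start, n) for start, n, rmsd in fragments if rmsd < good_rmsd]
    precision = len(good) / len(fragments) if fragments else 0.0
    covered = set()
    for start, n in good:
        # Record every target residue spanned by a good fragment.
        covered.update(range(start, min(start + n, target_length)))
    coverage = len(covered) / target_length
    return precision, coverage

frags = [(0, 3, 0.8), (0, 3, 2.4), (2, 3, 1.1), (6, 3, 3.0)]
precision, coverage = library_precision_coverage(frags, target_length=10)
print(precision, coverage)  # 0.5 0.5
```

    Reporting both numbers matters because a library can score well on one and poorly on the other: a few excellent fragments give high precision but leave most of the target uncovered.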

  12. Connecting Protein Structure to Intermolecular Interactions: A Computer Modeling Laboratory

    Science.gov (United States)

    Abualia, Mohammed; Schroeder, Lianne; Garcia, Megan; Daubenmire, Patrick L.; Wink, Donald J.; Clark, Ginevra A.

    2016-01-01

    An understanding of protein folding relies on a solid foundation of a number of critical chemical concepts, such as molecular structure, intra-/intermolecular interactions, and relating structure to function. Recent reports show that students struggle on all levels to achieve these understandings and use them in meaningful ways. Further, several…

  13. Structure-based barcoding of proteins.

    Science.gov (United States)

    Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin

    2014-01-01

    A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from a 3D coordinate file. The molecular structure in a protein coordinate file from the Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to the same fold family, and across different folds. In addition to this, we have attempted to illustrate (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) modifications in the overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.

  14. Taking advantage of local structure descriptors to analyze interresidue contacts in protein structures and protein complexes.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude

    2008-11-15

    Interresidue contacts in protein structures and at protein-protein interfaces are classically described by the amino acid types of the interacting residues, and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contacts using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters into decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.

  15. MMM: A toolbox for integrative structure modeling.

    Science.gov (United States)

    Jeschke, Gunnar

    2018-01-01

    Structural characterization of proteins and their complexes may require integration of restraints from various experimental techniques. MMM (Multiscale Modeling of Macromolecules) is a Matlab-based open-source modeling toolbox for this purpose with a particular emphasis on distance distribution restraints obtained from electron paramagnetic resonance experiments on spin-labelled proteins and nucleic acids and their combination with atomistic structures of domains or whole protomers, small-angle scattering data, secondary structure information, homology information, and elastic network models. MMM does not only integrate various types of restraints, but also various existing modeling tools by providing a common graphical user interface to them. The types of restraints that can support such modeling and the available model types are illustrated by recent application examples. © 2017 The Protein Society.

  16. Structural Elements Regulating AAA+ Protein Quality Control Machines.

    Science.gov (United States)

    Chang, Chiung-Wen; Lee, Sukyeong; Tsai, Francis T F

    2017-01-01

    Members of the ATPases Associated with various cellular Activities (AAA+) superfamily participate in essential and diverse cellular pathways in all kingdoms of life by harnessing the energy of ATP binding and hydrolysis to drive their biological functions. Although most AAA+ proteins share a ring-shaped architecture, AAA+ proteins have evolved distinct structural elements that are fine-tuned to their specific functions. A central question in the field is how ATP binding and hydrolysis are coupled to substrate translocation through the central channel of ring-forming AAA+ proteins. In this mini-review, we will discuss structural elements present in AAA+ proteins involved in protein quality control, drawing similarities to their known role in substrate interaction by AAA+ proteins involved in DNA translocation. Elements to be discussed include the pore loop-1, the Inter-Subunit Signaling (ISS) motif, and the Pre-Sensor I insert (PS-I) motif. Lastly, we will summarize our current understanding on the inter-relationship of those structural elements and propose a model how ATP binding and hydrolysis might be coupled to polypeptide translocation in protein quality control machines.

  17. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  18. I-TASSER server for protein 3D structure prediction

    Directory of Open Access Journals (Sweden)

    Zhang Yang

    2008-01-01

    Full Text Available Abstract Background Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. Results An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. Conclusion The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations.
The I-TASSER server is freely available
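
    The TM-score referred to in this abstract can be sketched from its commonly used definition, which is not spelled out in the abstract itself: the sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names are hypothetical, the 0.5 Å lower bound on d0 is an assumption of this sketch for short chains, and the distances are assumed to come from an already superposed alignment. The -1.5 C-score cutoff is the one quoted in the abstract.

```python
C_SCORE_CUTOFF = -1.5  # cutoff quoted in the abstract for correct topology

def tm_d0(target_length):
    """Length-dependent distance scale d0 in Å.

    The lower bound of 0.5 Å is an assumption of this sketch so that
    very short chains do not produce a non-positive scale.
    """
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)

def tm_score(distances, target_length):
    """distances: Å distances between aligned residue pairs after
    superposition; unaligned target residues simply contribute nothing."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def passes_c_score_cutoff(c_score):
    return c_score > C_SCORE_CUTOFF

perfect = tm_score([0.0] * 100, 100)
print(perfect)  # 1.0
```

    Because d0 grows with chain length, the same per-residue deviation is penalized less in a long protein than in a short one, which is what makes the score comparable across targets of different sizes.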

  19. Blind Test of Physics-Based Prediction of Protein Structures

    Science.gov (United States)

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  20. PROGRAM SYSTEM AND INFORMATION METADATA BANK OF TERTIARY PROTEIN STRUCTURES

    Directory of Open Access Journals (Sweden)

    T. A. Nikitin

    2013-01-01

    Full Text Available The article deals with the architecture of a metadata storage model for the results of checks on three-dimensional protein structures. A conceptual database model was built. The service and the procedure for database updates, as well as data transformation algorithms for protein structures and their quality, are presented. The most important information about entries and the forms in which they are submitted for storage, access, and delivery to users is highlighted. A software suite was developed to implement the functional tasks using the Java programming language in the NetBeans v.7.0 environment and JQL to query and interact with the JavaDB database. The service was tested, and the results showed that the system is effective at filtering protein structures.

  1. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a subtask in determining the protein tertiary structure and function. In this paper, a novel algorithm for protein secondary structure prediction (PSSP), KB-PROSSP-NN, is proposed; it combines a knowledge base with neural network modeling of the exceptions in that knowledge base. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between 5-residue words and the corresponding secondary structure. The results predicted using the knowledge base are refined with a backpropagation neural network algorithm. The neural network models the exceptions of the knowledge base. Q3 accuracies of 90% and 82% are achieved on the RS126 and CB396 test sets, respectively, which suggests an improvement over existing state-of-the-art methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
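
    The knowledge-base lookup and the Q3 measure described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the KB-PROSSP-NN algorithm: each 5-residue word is assumed to vote for the state of its central residue, unseen words fall back to coil ("C"), and the neural network refinement step is omitted. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_knowledge_base(sequences, structures, k=5):
    """Count which state (H, E or C) the central residue of each
    k-residue word takes in the training data."""
    kb = defaultdict(Counter)
    half = k // 2
    for seq, ss in zip(sequences, structures):
        for i in range(len(seq) - k + 1):
            kb[seq[i:i + k]][ss[i + half]] += 1
    return kb

def predict(seq, kb, k=5, default="C"):
    """Label each residue with the most frequent state of its word."""
    half = k // 2
    pred = [default] * len(seq)
    for i in range(len(seq) - k + 1):
        counts = kb.get(seq[i:i + k])
        if counts:
            pred[i + half] = counts.most_common(1)[0][0]
    return "".join(pred)

def q3(predicted, observed):
    """Percentage of residues whose three-state label is correct."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

kb = build_knowledge_base(["ACDEFGHIK"], ["CCHHHHECC"])
predicted = predict("ACDEFGHIK", kb)
print(predicted, q3(predicted, "CCHHHHECC"))  # CCHHHHECC 100.0
```

    Predicting the training sequence itself trivially gives 100%; a meaningful Q3 value requires test proteins that are absent from the knowledge base.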

  2. EDM-DEDM and protein crystal structure solution.

    Science.gov (United States)

    Caliandro, Rocco; Carrozzini, Benedetta; Cascarano, Giovanni Luca; Giacovazzo, Carmelo; Mazzone, Anna Maria; Siliqi, Dritan

    2009-05-01

    Electron-density modification (EDM) procedures are the classical tool for driving model phases closer to those of the target structure. They are often combined with automated model-building programs to provide a correct protein model. The task is not always performed, mostly because of the large initial phase error. A recently proposed procedure combined EDM with DEDM (difference electron-density modification); the method was applied to the refinement of phases obtained by molecular replacement, ab initio or SAD phasing [Caliandro, Carrozzini, Cascarano, Giacovazzo, Mazzone & Siliqi (2009), Acta Cryst. D65, 249-256] and was more effective in improving phases than EDM alone. In this paper, a novel fully automated protocol for protein structure refinement based on the iterative application of automated model-building programs combined with the additional power derived from the EDM-DEDM algorithm is presented. The cyclic procedure was successfully tested on challenging cases for which all other approaches had failed.

  3. Clustering structures of large proteins using multifractal analyses based on a 6-letter model and hydrophobicity scale of amino acids

    International Nuclear Information System (INIS)

    Yang Jianyi; Yu Zuguo; Anh, Vo

    2009-01-01

    The Schneider and Wrede hydrophobicity scale of amino acids and the 6-letter model of protein are proposed to study the relationship between the primary structure and the secondary structural classification of proteins. Two kinds of multifractal analyses are performed on the two measures obtained from these two kinds of data on large proteins. Nine parameters from the multifractal analyses are considered to construct the parameter spaces. Each protein is represented by one point in these spaces. A procedure is proposed to separate large proteins in the α, β, α + β and α/β structural classes in these parameter spaces. Fisher's linear discriminant algorithm is used to assess our clustering accuracy on the 49 selected large proteins. Numerical results indicate that the discriminant accuracies are satisfactory. In particular, they reach 100.00% and 84.21% in separating the α proteins from the {β, α + β, α/β} proteins in a parameter space; 92.86% and 86.96% in separating the β proteins from the {α + β, α/β} proteins in another parameter space; 91.67% and 83.33% in separating the α/β proteins from the α + β proteins in the last parameter space.

  4. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Background: Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results: Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions: Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  5. SA-Search: a web tool for protein structure mining based on a Structural Alphabet

    OpenAIRE

    Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre

    2004-01-01

    SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of f...

  6. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.
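
    The idea of greedy fragment assembly under an exact structure-based objective can be sketched in a few lines of Python. This is a toy illustration, not the study's protocol: the model is reduced to one backbone angle per residue, angle periodicity is ignored, and the fragment library, starting model and function names are hypothetical.

```python
def error(model, native):
    """Exact structure-based objective: Euclidean distance between model and native angles."""
    return sum((m - n) ** 2 for m, n in zip(model, native)) ** 0.5


def greedy_assemble(native, library, start):
    """Insert fragments wherever they strictly lower the error; stop when no insertion helps."""
    model = list(start)
    improved = True
    while improved:
        improved = False
        for fragment in library:
            for pos in range(len(model) - len(fragment) + 1):
                trial = model[:pos] + list(fragment) + model[pos + len(fragment):]
                if error(trial, native) < error(model, native):
                    model = trial
                    improved = True
    return model


# Hypothetical target: a helix-like stretch followed by a strand-like stretch.
native = [-60, -60, -60, 120, 120, 120]
library = [[-60, -60, -60], [120, 120, 120]]
start = [0, 0, 0, 0, 0, 0]
assembled = greedy_assemble(native, library, start)
```

    Because every accepted insertion strictly lowers the error and the number of reachable models is finite, the loop always terminates. With an approximate energy function in place of the exact objective, a greedy search of this kind could instead stall in a local minimum.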

  7. Combining NMR ensembles and molecular dynamics simulations provides more realistic models of protein structures in solution and leads to better chemical shift prediction

    International Nuclear Information System (INIS)

    Lehtivarjo, Juuso; Tuppurainen, Kari; Hassinen, Tommi; Laatikainen, Reino; Peräkylä, Mikael

    2012-01-01

    While chemical shifts are invaluable for obtaining structural information from proteins, they also offer one of the rare ways to obtain information about protein dynamics. A necessary tool in transforming chemical shifts into structural and dynamic information is chemical shift prediction. In our previous work we developed a method for 4D prediction of protein ¹H chemical shifts in which molecular motions, the 4th dimension, were modeled using molecular dynamics (MD) simulations. Although the approach clearly improved the prediction, the X-ray structures and single NMR conformers used in the model cannot be considered fully realistic models of protein in solution. In this work, NMR ensembles (NMRE) were used to expand the conformational space of proteins (e.g. side chains, flexible loops, termini), followed by MD simulations for each conformer to map the local fluctuations. Compared with the non-dynamic model, the NMRE+MD model gave 6–17% lower root-mean-square (RMS) errors for different backbone nuclei. The improved prediction indicates that NMR ensembles with MD simulations can be used to obtain a more realistic picture of protein structures in solutions and moreover underlines the importance of short and long time-scale dynamics for the prediction. The RMS errors of the NMRE+MD model were 0.24, 0.43, 0.98, 1.03, 1.16 and 2.39 ppm for ¹Hα, ¹HN, ¹³Cα, ¹³Cβ, ¹³CO and backbone ¹⁵N chemical shifts, respectively. The model is implemented in the prediction program 4DSPOT, available at http://www.uef.fi/4dspot.
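
    The RMS errors quoted above compare predicted with observed shifts. A minimal Python sketch of that comparison follows; it assumes, for illustration only, that the shift predicted for each nucleus is the plain average over all conformers or MD snapshots. The function names and numbers are hypothetical and do not reproduce 4DSPOT.

```python
import math


def ensemble_average(per_conformer_shifts):
    """Average the predicted shift of each nucleus over all conformers (rows)."""
    n = len(per_conformer_shifts)
    return [sum(column) / n for column in zip(*per_conformer_shifts)]


def rms_error(predicted, observed):
    """Root-mean-square difference between predicted and observed shifts, in ppm."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: two conformers, two nuclei.
per_conformer = [[4.0, 5.0],
                 [5.0, 3.0]]
observed = [4.5, 5.0]
averaged = ensemble_average(per_conformer)
rms = rms_error(averaged, observed)
```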

  8. Combining NMR ensembles and molecular dynamics simulations provides more realistic models of protein structures in solution and leads to better chemical shift prediction

    Energy Technology Data Exchange (ETDEWEB)

    Lehtivarjo, Juuso, E-mail: juuso.lehtivarjo@uef.fi; Tuppurainen, Kari; Hassinen, Tommi; Laatikainen, Reino [University of Eastern Finland, School of Pharmacy (Finland); Peraekylae, Mikael [University of Eastern Finland, Institute of Biomedicine (Finland)

    2012-03-15

    While chemical shifts are invaluable for obtaining structural information from proteins, they also offer one of the rare ways to obtain information about protein dynamics. A necessary tool in transforming chemical shifts into structural and dynamic information is chemical shift prediction. In our previous work we developed a method for 4D prediction of protein ¹H chemical shifts in which molecular motions, the 4th dimension, were modeled using molecular dynamics (MD) simulations. Although the approach clearly improved the prediction, the X-ray structures and single NMR conformers used in the model cannot be considered fully realistic models of protein in solution. In this work, NMR ensembles (NMRE) were used to expand the conformational space of proteins (e.g. side chains, flexible loops, termini), followed by MD simulations for each conformer to map the local fluctuations. Compared with the non-dynamic model, the NMRE+MD model gave 6-17% lower root-mean-square (RMS) errors for different backbone nuclei. The improved prediction indicates that NMR ensembles with MD simulations can be used to obtain a more realistic picture of protein structures in solutions and moreover underlines the importance of short and long time-scale dynamics for the prediction. The RMS errors of the NMRE+MD model were 0.24, 0.43, 0.98, 1.03, 1.16 and 2.39 ppm for ¹Hα, ¹HN, ¹³Cα, ¹³Cβ, ¹³CO and backbone ¹⁵N chemical shifts, respectively. The model is implemented in the prediction program 4DSPOT, available at http://www.uef.fi/4dspot.

  9. Roles of beta-turns in protein folding: from peptide models to protein engineering.

    Science.gov (United States)

    Marcelino, Anna Marie C; Gierasch, Lila M

    2008-05-01

    Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein's structure.

  10. An Integrated Framework Advancing Membrane Protein Modeling and Design.

    Directory of Open Access Journals (Sweden)

    Rebecca F Alford

    2015-09-01

    Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design.

  11. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    Science.gov (United States)

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  12. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  13. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
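
    Two quantities mentioned in this abstract are easy to make concrete: the F-measure used for benchmarking, and a logistic combination of per-source confidence scores. The Python sketch below is an illustration under stated assumptions, not the MetaGO code. The term identifiers, weights and bias are hypothetical placeholders, and real weights would be fitted by logistic regression on training data.

```python
import math


def f_measure(predicted_terms, true_terms):
    """Harmonic mean of precision and recall over two sets of annotation terms."""
    true_positives = len(predicted_terms & true_terms)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_terms)
    recall = true_positives / len(true_terms)
    return 2 * precision * recall / (precision + recall)


def combine_scores(scores, weights, bias):
    """Logistic combination of confidence scores from several annotation sources."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical term sets (placeholders, not real annotations).
predicted = {"term_a", "term_b", "term_c"}
truth = {"term_b", "term_c", "term_d", "term_e"}
score = f_measure(predicted, truth)

# Hypothetical scores from sequence, structure and network sources.
confidence = combine_scores([0.2, 0.7, 0.9], weights=[1.0, 1.5, 2.0], bias=-2.0)
```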

  14. Protein Loop Structure Prediction Using Conformational Space Annealing.

    Science.gov (United States)

    Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung

    2017-05-22

    We have developed a protein loop structure prediction method by combining a new energy function, which we call E_PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E_PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E_PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E_PLM outperformed E_DFIRE for both decoy sets. This new approach equipped with E_PLM and CSA can serve as the state-of-the-art de novo loop modeling method.

  15. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Aboul-Magd Mohammed O

    2009-07-01

    Background: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input

  16. 3DProIN: Protein-Protein Interaction Networks and Structure Visualization.

    Science.gov (United States)

    Li, Hui; Liu, Chunmei

    2014-06-14

    3DProIN is a computational tool to visualize protein-protein interaction networks in both two-dimensional (2D) and three-dimensional (3D) views. It models protein-protein interactions in a graph and explores the biologically relevant features of the tertiary structures of each protein in the network. Properties such as color, shape and name of each node (protein) of the network can be edited in either 2D or 3D views. 3DProIN is implemented using 3D Java and C programming languages. The internet crawl technique is also used to parse dynamically grasped protein interactions from the Protein Data Bank (PDB). It is a Java applet component that is embedded in the web page and it can be used on different platforms including Linux, Mac and Windows using web browsers such as Firefox, Internet Explorer, Chrome and Safari. It was also converted into a Mac app and submitted to the App Store as a free app. Mac users can also download the app from our website. 3DProIN is available for academic research at http://bicompute.appspot.com.

  17. A robust algorithm for optimizing protein structures with NMR chemical shifts

    Energy Technology Data Exchange (ETDEWEB)

    Berjanskii, Mark; Arndt, David; Liang, Yongjie; Wishart, David S., E-mail: david.wishart@ualberta.ca [University of Alberta, Department of Computing Science (Canada)

    2015-11-15

    Over the past decade, a number of methods have been developed to determine the approximate structure of proteins using minimal NMR experimental information such as chemical shifts alone, sparse NOEs alone or a combination of comparative modeling data and chemical shifts. However, there have been relatively few methods that allow these approximate models to be substantively refined or improved using the available NMR chemical shift data. Here, we present a novel method, called Chemical Shift driven Genetic Algorithm for biased Molecular Dynamics (CS-GAMDy), for the robust optimization of protein structures using experimental NMR chemical shifts. The method incorporates knowledge-based scoring functions and structural information derived from NMR chemical shifts via a unique combination of multi-objective MD biasing, a genetic algorithm, and the widely used XPLOR molecular modelling language. Using this approach, we demonstrate that CS-GAMDy is able to refine and/or fold models that are as much as 10 Å (RMSD) away from the correct structure using only NMR chemical shift data. CS-GAMDy is also able to refine a wide range of approximate or mildly erroneous protein structures to more closely match the known/correct structure and the known/correct chemical shifts. We believe CS-GAMDy will allow protein models generated by sparse restraint or chemical-shift-only methods to achieve sufficiently high quality to be considered fully refined and “PDB worthy”. The CS-GAMDy algorithm is explained in detail and its performance is compared over a range of refinement scenarios with several commonly used protein structure refinement protocols. The program has been designed to be easily installed and easily used and is available at http://www.gamdy.ca.

  18. GPCR-I-TASSER: A Hybrid Approach to G Protein-Coupled Receptor Structure Modeling and the Application to the Human Genome.

    Science.gov (United States)

    Zhang, Jian; Yang, Jianyi; Jang, Richard; Zhang, Yang

    2015-08-04

    Experimental structure determination remains difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs where the ab initio TM-helix assembly procedure constructed the correct fold for 20 cases. When combined with weak homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average Cα root-mean-square deviation of 2.4 Å in the TM regions. The new hybrid protocol was applied to model all 1,026 GPCRs in the human genome, where 923 have a high confidence score and are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin, and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of TM proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Preliminary structural characterization of human SOUL, a haem-binding protein

    International Nuclear Information System (INIS)

    Freire, Filipe; Romão, Maria João; Macedo, Anjos L.; Aveiro, Susana S.; Goodfellow, Brian J.; Carvalho, Ana Luísa

    2009-01-01

    Human SOUL (hSOUL) is a 23 kDa haem-binding protein that was first identified as the PP23 protein isolated from human full-term placentas. Here, the overexpression, purification and crystallization of hSOUL are reported. The crystals belonged to space group P6₄22, with unit-cell parameters a = b = 145, c = 60 Å and one protein molecule in the asymmetric unit. X-ray diffraction data were collected to 3.5 Å resolution at the ESRF. A preliminary model of the three-dimensional structure of hSOUL was obtained by molecular replacement using the structures of murine p22HBP, obtained by solution NMR, as search models.

  20. Integrative structure modeling with the Integrative Modeling Platform.

    Science.gov (United States)

    Webb, Benjamin; Viswanath, Shruthi; Bonomi, Massimiliano; Pellarin, Riccardo; Greenberg, Charles H; Saltzberg, Daniel; Sali, Andrej

    2018-01-01

    Building models of a biological system that are consistent with the myriad data available is one of the key challenges in biology. Modeling the structure and dynamics of macromolecular assemblies, for example, can give insights into how biological systems work, evolved, might be controlled, and even designed. Integrative structure modeling casts the building of structural models as a computational optimization problem, for which information about the assembly is encoded into a scoring function that evaluates candidate models. Here, we describe our open source software suite for integrative structure modeling, Integrative Modeling Platform (https://integrativemodeling.org), and demonstrate its use. © 2017 The Protein Society.
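
    Casting model building as the optimization of a scoring function can be illustrated with a toy Python sketch. It does not use the Integrative Modeling Platform API: the harmonic distance restraint, the bead coordinates and all function names are assumptions chosen only to show how pieces of information become terms of a score that ranks candidate models.

```python
import math


def harmonic_distance_restraint(i, j, target, weight=1.0):
    """Encode one piece of information (a distance between beads i and j) as a score term."""
    def term(coords):
        distance = math.dist(coords[i], coords[j])
        return weight * (distance - target) ** 2
    return term


def total_score(coords, restraints):
    """Scoring function: sum of all restraint terms; lower means more consistent with the data."""
    return sum(term(coords) for term in restraints)


def best_model(candidates, restraints):
    """Pick the candidate model with the lowest total score."""
    return min(candidates, key=lambda coords: total_score(coords, restraints))


# Hypothetical data: two beads expected to lie 4.0 units apart.
restraints = [harmonic_distance_restraint(0, 1, target=4.0)]
model_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
chosen = best_model([model_b, model_a], restraints)
```

    A real integrative study would sample many candidate models rather than enumerate two, and would combine restraints from several kinds of data in the same score.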

  1. Molecular modeling of protein materials: case study of elastin

    International Nuclear Information System (INIS)

    Tarakanova, Anna; Buehler, Markus J

    2013-01-01

    Molecular modeling of protein materials is a quickly growing area of research that has produced numerous contributions in fields ranging from structural engineering to medicine and biology. We review here the history and methods commonly employed in molecular modeling of protein materials, emphasizing the advantages for using modeling as a complement to experimental work. We then consider a case study of the protein elastin, a critically important ‘mechanical protein’ to exemplify the approach in an area where molecular modeling has made a significant impact. We outline the progression of computational modeling studies that have considerably enhanced our understanding of this important protein which endows elasticity and recoil to the tissues it is found in, including the skin, lungs, arteries and the heart. A vast collection of literature has been directed at studying the structure and function of this protein for over half a century, the first molecular dynamics study of elastin being reported in the 1980s. We review the pivotal computational works that have considerably enhanced our fundamental understanding of elastin's atomistic structure and its extraordinary qualities—focusing on two in particular: elastin's superb elasticity and the inverse temperature transition—the remarkable ability of elastin to take on a more structured conformation at higher temperatures, suggesting its effectiveness as a biomolecular switch. Our hope is to showcase these methods as both complementary and enriching to experimental approaches that have thus far dominated the study of most protein-based materials. (topical review)

  2. Ultrafast protein structure-based virtual screening with Panther

    Science.gov (United States)

    Niinivehmas, Sanna P.; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T.

    2015-10-01

    Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. Panther-code is available in our web pages (http://www.jyu.fi/panther) free of charge after registration.

  3. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, in which thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for computational biologists. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from the BLAST results. The web server StarPDB has been developed for structural annotation of a protein based on its similarity to known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in the PDB. The server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary-structure module allows users to predict the regular secondary structure state of each residue in a protein. The Accessible surface area module predicts the exposed or buried residues in a protein. The Tight-turns module is designed to predict tight turns, such as beta-turns, in a protein. The DNA-RNA module was developed for predicting DNA- and RNA-interacting residues in a protein. Similarly, the Ligand module allows one to predict ligand-, metal- and nucleotide-interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users understand the structure and function of protein residues. The web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  4. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    Science.gov (United States)

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
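
    The contrast between absolute and relative sequence capacity described in this record can be illustrated with a short calculation in log space. This is only a sketch: the function name and the input numbers are hypothetical placeholders, not values from the study, and the only assumption is that the total number of possible sequences of length L is 20^L.

```python
import math


def log10_relative_capacity(log10_capacity: float, length: int) -> float:
    """Return log10 of (foldable sequences / all possible sequences).

    Assumes 20 amino acid types, so the full sequence space is 20**length.
    Working in log space avoids overflow for realistic protein lengths.
    """
    return log10_capacity - length * math.log10(20)


# Hypothetical inputs for illustration only (not results from the paper).
small = log10_relative_capacity(log10_capacity=25.0, length=35)
large = log10_relative_capacity(log10_capacity=60.0, length=150)
print(small, large)
```

    With these made-up inputs the longer protein has the larger absolute capacity but the smaller relative capacity, which is the qualitative pattern the abstract reports.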

  5. Critical assessment of methods of protein structure prediction (CASP) - round x

    KAUST Repository

    Moult, John; Fidelis, Krzysztof; Kryshtafovych, Andriy; Schwede, Torsten; Tramontano, Anna

    2013-01-01

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.

  6. Critical assessment of methods of protein structure prediction (CASP) - round x

    KAUST Repository

    Moult, John

    2013-12-17

    This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.

  7. Single-particle electron microscopy in the study of membrane protein structure.

    Science.gov (United States)

    De Zorzi, Rita; Mi, Wei; Liao, Maofu; Walz, Thomas

    2016-02-01

    Single-particle electron microscopy (EM) provides the great advantage that protein structure can be studied without the need to grow crystals. However, due to technical limitations, this approach played only a minor role in the study of membrane protein structure. This situation has recently changed dramatically with the introduction of direct electron detection device cameras, which allow images of unprecedented quality to be recorded, also making software algorithms, such as three-dimensional classification and structure refinement, much more powerful. The enhanced potential of single-particle EM was impressively demonstrated by delivering the first long-sought atomic model of a member of the biomedically important transient receptor potential channel family. Structures of several more membrane proteins followed in short order. This review recounts the history of single-particle EM in the study of membrane proteins, describes the technical advances that now allow this approach to generate atomic models of membrane proteins and provides a brief overview of some of the membrane protein structures that have been studied by single-particle EM to date. © The Author 2015. Published by Oxford University Press on behalf of The Japanese Society of Microscopy. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    Science.gov (United States)

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  9. Modeling disordered regions in proteins using Rosetta.

    Directory of Open Access Journals (Sweden)

    Ray Yu-Ruei Wang

    Protein structure prediction methods such as Rosetta search for the lowest energy conformation of the polypeptide chain. However, the experimentally observed native state is at a minimum of the free energy, rather than the energy. The neglect of the missing configurational entropy contribution to the free energy can be partially justified by the assumption that the entropies of alternative folded states, while very much less than unfolded states, are not too different from one another, and hence can be to a first approximation neglected when searching for the lowest free energy state. The shortcomings of current structure prediction methods may be due in part to the breakdown of this assumption. Particularly problematic are proteins with significant disordered regions which do not populate single low energy conformations even in the native state. We describe two approaches within the Rosetta structure modeling methodology for treating such regions. The first does not require advance knowledge of the regions likely to be disordered; instead these are identified by minimizing a simple free energy function used previously to model protein folding landscapes and transition states. In this model, residues can be either completely ordered or completely disordered; they are considered disordered if the gain in entropy outweighs the loss of favorable energetic interactions with the rest of the protein chain. The second approach requires identification in advance of the disordered regions either from sequence alone using for example the DISOPRED server or from experimental data such as NMR chemical shifts. During Rosetta structure prediction calculations the disordered regions make only unfavorable repulsive contributions to the total energy. We find that the second approach has greater practical utility and illustrate this with examples from de novo structure prediction, NMR structure calculation, and comparative modeling.
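
    The two-state criterion in this record (a residue counts as disordered when its entropy gain outweighs the favorable interaction energy it gives up) can be sketched as below. This is not the Rosetta implementation: the function name, the units, the uniform per-residue entropy gain and the independent treatment of residues are all simplifying assumptions made for illustration.

```python
def classify_residues(interaction_energies, entropy_gain, temperature=1.0):
    """Label each residue 'ordered' or 'disordered'.

    interaction_energies: per-residue energy of interactions with the rest
        of the chain (negative = favorable), in arbitrary units.
    entropy_gain: entropy gained by a residue on disordering; assumed to be
        the same for every residue (a simplification).
    A residue is labeled disordered when temperature * entropy_gain exceeds
    the favorable energy it would lose, i.e. -energy.
    """
    labels = []
    for energy in interaction_energies:
        lost_favorable_energy = -energy
        if temperature * entropy_gain > lost_favorable_energy:
            labels.append("disordered")
        else:
            labels.append("ordered")
    return labels


# Hypothetical per-residue energies, for illustration only.
labels = classify_residues([-3.0, -0.5, -2.1, 0.2], entropy_gain=1.0)
print(labels)
```

    Here the weakly interacting second and fourth residues come out as disordered, while the two strongly interacting residues stay ordered.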

  10. Automated de novo phasing and model building of coiled-coil proteins.

    Science.gov (United States)

    Rämisch, Sebastian; Lizatović, Robert; André, Ingemar

    2015-03-01

    Models generated by de novo structure prediction can be very useful starting points for molecular replacement for systems where suitable structural homologues cannot be readily identified. Protein-protein complexes and de novo-designed proteins are examples of systems that can be challenging to phase. In this study, the potential of de novo models of protein complexes for use as starting points for molecular replacement is investigated. The approach is demonstrated using homomeric coiled-coil proteins, which are excellent model systems for oligomeric systems. Despite the stereotypical fold of coiled coils, initial phase estimation can be difficult and many structures have to be solved with experimental phasing. A method was developed for automatic structure determination of homomeric coiled coils from X-ray diffraction data. In a benchmark set of 24 coiled coils, ranging from dimers to pentamers with resolutions down to 2.5 Å, 22 systems were automatically solved, 11 of which had previously been solved by experimental phasing. The generated models contained 71-103% of the residues present in the deposited structures, had the correct sequence and had free R values that deviated on average by 0.01 from those of the respective reference structures. The electron-density maps were of sufficient quality that only minor manual editing was necessary to produce final structures. The method, named CCsolve, combines methods for de novo structure prediction, initial phase estimation and automated model building into one pipeline. CCsolve is robust against errors in the initial models and can readily be modified to make use of alternative crystallographic software. The results demonstrate the feasibility of de novo phasing of protein-protein complexes, an approach that could also be employed for other small systems beyond coiled coils.

  11. Effects of lysine residues on structural characteristics and stability of tau proteins

    International Nuclear Information System (INIS)

    Lee, Myeongsang; Baek, Inchul; Choi, Hyunsung; Kim, Jae In; Na, Sungsoo

    2015-01-01

    Pathological amyloid proteins have been implicated in neurodegenerative diseases, specifically Alzheimer's, Parkinson's, Lewy-body diseases and prion-related diseases. In prion-related diseases, functional tau proteins can be transformed into pathological agents by environmental factors, including oxidative stress, inflammation, Aβ-mediated toxicity and covalent modification. These pathological agents are stable under physiological conditions and are not easily degraded. This resistance to degradation enables the use of tau proteins as functional materials for capturing carbon dioxide. To use amyloid proteins efficiently as functional materials, a basic study of their structural characteristics is necessary. Here, we investigated at the atomistic scale the basic structure of wild-type (WT) tau protein and of tau protein with lysine substituted at a glutamine residue (Q2K). We also report the size effect of both the WT and Q2K structures, which allowed us to assess the stability of those amyloid structures. - Highlights: • The lysine mutation alters the structural conformation and characteristics of tau. • Over the 15 layers both WT and Q2K models, both tau proteins undergo fractions. • The lysine mutation increases the non-bonded energy and solvent accessible surface area. • The structural instability of the Q2K model was shown by analysis of the number of hydrogen bonds.

  12. Effects of lysine residues on structural characteristics and stability of tau proteins

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Myeongsang; Baek, Inchul; Choi, Hyunsung; Kim, Jae In; Na, Sungsoo, E-mail: nass@korea.ac.kr

    2015-10-23

    Pathological amyloid proteins have been implicated in neurodegenerative diseases, specifically Alzheimer's, Parkinson's, Lewy-body diseases and prion-related diseases. In prion-related diseases, functional tau proteins can be transformed into pathological agents by environmental factors, including oxidative stress, inflammation, Aβ-mediated toxicity and covalent modification. These pathological agents are stable under physiological conditions and are not easily degraded. This resistance to degradation enables the use of tau proteins as functional materials for capturing carbon dioxide. To use amyloid proteins efficiently as functional materials, a basic study of their structural characteristics is necessary. Here, we investigated at the atomistic scale the basic structure of wild-type (WT) tau protein and of tau protein with lysine substituted at a glutamine residue (Q2K). We also report the size effect of both the WT and Q2K structures, which allowed us to assess the stability of those amyloid structures. - Highlights: • The lysine mutation alters the structural conformation and characteristics of tau. • Over the 15 layers both WT and Q2K models, both tau proteins undergo fractions. • The lysine mutation increases the non-bonded energy and solvent accessible surface area. • The structural instability of the Q2K model was shown by analysis of the number of hydrogen bonds.

  13. DockQ: A Quality Measure for Protein-Protein Docking Models.

    Directory of Open Access Journals (Sweden)

    Sankar Basu

    The state of the art in assessing the structural quality of docking models is currently based on three related yet independent quality measures: Fnat, LRMS, and iRMS, as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (>10 Å) might still qualify as 'acceptable' with a decent Fnat (>0.50) and iRMS (<3.0 Å). This is also the reason why the so-called CAPRI criteria for assessing the quality of docking models are defined by applying various ad hoc cutoffs on these measures to classify a docking model into four classes: Incorrect, Acceptable, Medium, or High quality. This classification has been useful in CAPRI, but since models are grouped in only four bins it is also rather limiting, making it difficult to rank models, correlate with scoring functions or use it as a target function in machine learning algorithms. Here, we present DockQ, a continuous protein-protein docking model quality measure derived by combining Fnat, LRMS, and iRMS into a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium and High quality. An average PPV of 94% at 90% recall demonstrates that there is no need to apply predefined ad hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher-resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or the sum of top-ranked models, which has been so valuable for the CASP community. The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for
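
    How three measures can be combined into one score in [0, 1] can be sketched as follows. The abstract above does not give the formula; the form used here (the mean of Fnat and two inversely scaled RMS terms) and the scaling constants 1.5 Å and 8.5 Å are assumptions based on the published DockQ definition and should be checked against the paper or the reference implementation before use. The function names are illustrative.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMS deviation in Å onto (0, 1]; 1 means perfect agreement."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float,
          d_irms: float = 1.5, d_lrms: float = 8.5) -> float:
    """Combine Fnat, iRMS and LRMS into a single score in [0, 1].

    d_irms and d_lrms are scaling constants in Å (assumed values, see the
    note above). Fnat is expected to lie in [0, 1] already.
    """
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0


# Hypothetical model: 60% native contacts, iRMS 1.5 Å, LRMS 8.5 Å.
score = dockq(fnat=0.6, irms=1.5, lrms=8.5)
print(round(score, 3))
```

    Because each term is bounded by 0 and 1, the average is a continuous score that can be ranked or correlated directly, rather than a four-bin class label.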

  14. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-03-23

    Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drugs binding sites. Conclusions Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to novel resistance

  15. Protein Folding: Search for Basic Physical Models

    Directory of Open Access Journals (Sweden)

    Ivan Y. Torshin

    2003-01-01

    How a unique three-dimensional structure is rapidly formed from the linear sequence of a polypeptide is one of the important questions in contemporary science. Apart from the biological context of in vivo protein folding (which has been studied only for a few proteins), the roles of the fundamental physical forces in in vitro folding remain largely unstudied. Despite a degree of success in using descriptions based on statistical and/or thermodynamic approaches, few of the current models explicitly include more basic physical forces (such as electrostatics and van der Waals forces). Moreover, present-day models rarely take into account that protein folding is, essentially, a rapid process that produces a highly specific architecture. This review considers several physical models that may provide more direct links between sequence and tertiary structure in terms of the physical forces. In particular, elaboration of such simple models is likely to produce extremely effective computational techniques with value for modern genomics.

  16. Validation of protein models by a neural network approach

    Directory of Open Access Journals (Sweden)

    Fantucci Piercarlo

    2008-01-01

    Background The development and improvement of reliable computational methods designed to evaluate the quality of protein models is relevant in the context of protein structure refinement, which has recently been identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction. Results In this contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) which is able to consistently discriminate between correct and incorrect protein models. In particular, the method is based on neural networks that use as input 15 structural parameters, which include energy, solvent accessible surface, hydrophobic contacts and secondary structure content. The results obtained with AIDE on a set of decoy structures were evaluated using statistical indicators such as Pearson correlation coefficients, Znat, fraction enrichment, as well as ROC plots. It turned out that AIDE's performance is comparable and often complementary to that of available state-of-the-art learning-based methods. Conclusion In light of the results obtained with AIDE, as well as its comparison with available learning-based methods, it can be concluded that AIDE can be successfully used to evaluate the quality of protein structures. The use of AIDE in combination with other evaluation tools is expected to further enhance protein refinement efforts.

  17. Probing the Energetics of Dynactin Filament Assembly and the Binding of Cargo Adaptor Proteins Using Molecular Dynamics Simulation and Electrostatics-Based Structural Modeling.

    Science.gov (United States)

    Zheng, Wenjun

    2017-01-10

    Dynactin, a large multiprotein complex, binds with the cytoplasmic dynein-1 motor and various adaptor proteins to allow recruitment and transportation of cellular cargoes toward the minus end of microtubules. The structure of the dynactin complex is built around an actin-like minifilament with a defined length, which has been visualized in a high-resolution structure of the dynactin filament determined by cryo-electron microscopy (cryo-EM). To understand the energetic basis of dynactin filament assembly, we used molecular dynamics simulation to probe the intersubunit interactions among the actin-like proteins, various capping proteins, and four extended regions of the dynactin shoulder. Our simulations revealed stronger intersubunit interactions at the barbed and pointed ends of the filament and involving the extended regions (compared with the interactions within the filament), which may energetically drive filament termination by the capping proteins and recruitment of the actin-like proteins by the extended regions, two key features of the dynactin filament assembly process. Next, we modeled the unknown binding configuration among dynactin, dynein tails, and a number of coiled-coil adaptor proteins (including several Bicaudal-D and related proteins and three HOOK proteins), and predicted a key set of charged residues involved in their electrostatic interactions. Our modeling is consistent with previous findings of conserved regions, functional sites, and disease mutations in the adaptor proteins and will provide a structural framework for future functional and mutational studies of these adaptor proteins. In sum, this study yielded rich structural and energetic information about dynactin and associated adaptor proteins that cannot be directly obtained from the cryo-EM structures with limited resolutions.

  18. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    Science.gov (United States)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
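
    For readers unfamiliar with the Q3 metric quoted in this record, the sketch below computes it as the fraction of residues whose predicted three-state label matches the observed one. The one-letter alphabet (H = helix, E = strand, C = coil) and the example strings are illustrative assumptions; this is not code from DeepCNF.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted 3-state label is correct.

    Both arguments are equal-length strings over the labels H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example with two mismatches.
acc = q3_accuracy("HHHECCCEEC", "HHHHCCCEEE")
print(acc)
```

    Q8 accuracy is computed the same way over an eight-state alphabet, which is why it is typically lower than Q3 for the same predictor.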

  19. Distance matrix-based approach to protein structure prediction.

    Science.gov (United States)

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).
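
    The relation between the distance matrix D and the contact matrix C mentioned in this record can be sketched as below, together with the connectivity (Kirchhoff) matrix that a Gaussian-network-style elastic network model is commonly built on. The cutoff value, the function names and the choice of the Kirchhoff construction are assumptions for illustration; the abstract does not state which ENM variant or cutoff the authors used.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances D between 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_matrix(dist, cutoff):
    """C[i][j] = 1 if residues i != j lie within the cutoff, else 0."""
    n = len(dist)
    return [[1 if i != j and dist[i][j] <= cutoff else 0 for j in range(n)]
            for i in range(n)]


def kirchhoff_matrix(contacts):
    """Diagonal = number of contacts of residue i; off-diagonal = -C[i][j]."""
    n = len(contacts)
    return [[sum(contacts[i]) if i == j else -contacts[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical four-residue chain, 3.8 Å apart on a line; 7.0 Å cutoff assumed.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
contacts = contact_matrix(distance_matrix(coords), cutoff=7.0)
gamma = kirchhoff_matrix(contacts)
for row in gamma:
    print(row)
```

    The low-frequency normal modes discussed in the abstract would then come from the eigenvectors of this matrix with the smallest nonzero eigenvalues, a step left out here to keep the sketch free of external libraries.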

  20. Protein homology model refinement by large-scale energy optimization.

    Science.gov (United States)

    Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E; DiMaio, Frank; Baker, David

    2018-03-20

    Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.

  1. Soliton concepts and protein structure

    Science.gov (United States)

    Krokhotin, Andrei; Niemi, Antti J.; Peng, Xubiao

    2012-03-01

    Structural classification shows that the number of different protein folds is surprisingly small. It also appears that proteins are built in a modular fashion from a relatively small number of components. Here we propose that the modular building blocks are made of the dark soliton solution of a generalized discrete nonlinear Schrödinger equation. We find that practically all protein loops can be obtained simply by scaling the size and by joining together a number of copies of the soliton, one after another. The soliton has only two loop-specific parameters, and we compute their statistical distribution in the Protein Data Bank (PDB). We explicitly construct a collection of 200 sets of parameters, each determining a soliton profile that describes a different short loop. The ensuing profiles cover practically all those proteins in PDB that have a resolution which is better than 2.0 Å, with a precision such that the average root-mean-square distance between the loop and its soliton is less than the experimental B-factor fluctuation distance. We also present two examples that describe how the loop library can be employed both to model and to analyze folded proteins.

  2. A modeling strategy for G-protein coupled receptors

    Directory of Open Access Journals (Sweden)

    Anna Kahler

    2016-03-01

    Cell responses can be triggered via G-protein coupled receptors (GPCRs) that interact with small molecules, peptides or proteins and transmit the signal over the membrane via structural changes to activate intracellular pathways. GPCRs are characterized by a rather low sequence similarity and exhibit structural differences even for functionally closely related GPCRs. An accurate structure prediction for GPCRs is therefore not straightforward. We propose a computational approach that relies on the generation of several independent models based on different template structures, which are subsequently refined by molecular dynamics simulations. A comparison of their conformational stability and the agreement with GPCR-typical structural features is then used to select a favorable model. This strategy was applied to predict the structure of the herpesviral chemokine receptor US28 by generating three independent models based on the known structures of the chemokine receptors CXCR1, CXCR4, and CCR5. Model refinement and evaluation suggested that the model based on CCR5 exhibits the most favorable structural properties. In particular, the GPCR-typical structural features, such as a conserved water cluster or conserved non-covalent contacts, are present to a larger extent in the model based on CCR5 compared to the other models. A final model validation based on the recently published US28 crystal structure confirms that the CCR5-based model is the most accurate and exhibits 80.8% correctly modeled residues within the transmembrane helices. The structural agreement between the selected model and the crystal structure suggests that our modeling strategy may also be more generally applicable to other GPCRs of unknown structure.

  3. @TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.

    Science.gov (United States)

    Pons, Jean-Luc; Labesse, Gilles

    2009-07-01

    @TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/

  4. Structure Prediction of Outer Membrane Protease Protein of Salmonella typhimurium Using Computational Techniques

    Directory of Open Access Journals (Sweden)

    Rozina Tabassum

    2016-03-01

    Salmonella typhimurium, a facultative gram-negative intracellular pathogen belonging to the family Enterobacteriaceae, is the most frequent cause of human gastroenteritis worldwide. The pgtE gene product, an outer membrane protease, is important in the intracellular phases of salmonellosis. The pgtE gene product of S. typhimurium was predicted to be capable of proteolyzing T7 RNA polymerase and to localize in the outer membrane of these gram-negative bacteria. The PgtE product of S. enterica and OmpT of E. coli, which share high sequence similarity, have been reported to degrade macrophages, causing salmonellosis and other diseases. The three-dimensional structure of the protein was not available through the Protein Data Bank (PDB), creating a lack of structural information about E protein. In our study, by performing comparative model building, the three-dimensional structure of the outer membrane protease protein was generated using the backbone of the crystal structure of Pla of Yersinia pestis, retrieved from the PDB, with MODELLER (9v8). The quality of the model was assessed with the validation tool PROCHECK, and web servers such as ERRAT and ProSA were used to certify the reliability of the predicted model. This information might offer clues for better understanding of E protein and consequently for the development of better therapeutic treatment against the pathogenic role of this protein in salmonellosis and other diseases.

  5. DockQ: A Quality Measure for Protein-Protein Docking Models

    Science.gov (United States)

    Basu, Sankar

    2016-01-01

    The state of the art in assessing the structural quality of docking models is currently based on three related yet independent quality measures: Fnat, LRMS, and iRMS, as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (>10 Å) might still qualify as 'acceptable' with a decent Fnat (>0.50) and iRMS [...] iRMS to a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium and High quality. An average PPV of 94% at 90% recall demonstrates that there is no need to apply predefined ad-hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or sum of top ranked models, which has been so valuable for the CASP community. The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for protein structure prediction, and DockQ should be useful in a similar development in the protein docking field. DockQ is available at http://github.com/bjornwallner/DockQ/ (PMID: 27560519)
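
    As a rough illustration of how three measures can be folded into one score in [0, 1], the sketch below averages Fnat with two inversely scaled RMS terms. The scaling form, the constants 1.5 Å and 8.5 Å, and the class boundaries 0.23, 0.49 and 0.80 are the values reported by the DockQ authors; none of them is stated in the abstract above, and the function names are illustrative only, not the DockQ program's interface.

```python
def scaled_rms(rms, d):
    """Map an RMS deviation (in Å) onto (0, 1]; 1.0 means a perfect match."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat, irms, lrms, d_irms=1.5, d_lrms=8.5):
    """Average Fnat with the scaled iRMS and LRMS terms.

    The scaling constants are assumptions taken from the DockQ
    publication, not from the abstract text.
    """
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0


def capri_class(score):
    """Bin a DockQ score into the four CAPRI-style quality classes."""
    if score < 0.23:
        return "Incorrect"
    if score < 0.49:
        return "Acceptable"
    if score < 0.80:
        return "Medium"
    return "High"


if __name__ == "__main__":
    # Hypothetical model: half of the native contacts, iRMS 2 Å, LRMS 6 Å.
    s = dockq(fnat=0.5, irms=2.0, lrms=6.0)
    print(round(s, 3), capri_class(s))
```

    Because the score is continuous, models can be ranked or correlated with a scoring function directly, while the binning step only recovers the coarse four-class view.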

  6. Protein Structure Refinement by Optimization

    DEFF Research Database (Denmark)

    Carlsen, Martin

    on whether the three-dimensional structure of a homologous sequence is known. Whether or not a protein model can be used for industrial purposes depends on the quality of the predicted structure. A model can be used to design a drug when the quality is high. The overall goal of this project is to assess...... that correlates maximally to a native-decoy distance. The main contribution of this thesis is methods developed for analyzing the performance of metrically trained knowledge-based potentials and for optimizing their performance while making them less dependent on the decoy set used to define them. We focus...... being at least a local minimum of the potential. To address how far the current functional form of the potential is from an ideal potential we present two methods for finding the optimal metrically trained potential that simultaneously has a number of native structures as a local minimum. Our results...

  7. Threading structural model of the manganese-stabilizing protein PsbO reveals presence of two possible beta-sandwich domains.

    Science.gov (United States)

    Pazos, F; Heredia, P; Valencia, A; de las Rivas, J

    2001-12-01

    The manganese-stabilizing protein (PsbO) is an essential component of photosystem II (PSII) and is present in all oxyphotosynthetic organisms. PsbO allows correct water splitting and oxygen evolution by stabilizing the reactions driven by the manganese cluster. Despite its important role, its structure and detailed functional mechanism are still unknown. In this article we propose a structural model based on fold recognition and molecular modeling. This model has additional support from a study of the distribution of characteristics of the PsbO sequence family, such as the distribution of conserved, apolar, tree-determinants, and correlated positions. Our threading results consistently showed PsbO as an all-beta (beta) protein, with two homologous beta domains of approximately 120 amino acids linked by a flexible Proline-Glycine-Glycine (PGG) motif. These features are compatible with a general elongated and flexible architecture, in which the two domains form a sandwich-type structure with Greek key topology. The first domain is predicted to include 8 to 9 beta-strands, the second domain 6 to 7 beta-strands. An Ig-like beta-sandwich structure was selected as a template to build the 3-D model. The second domain has, between the strands, long loops rich in Pro and Gly that are difficult to model. One of these long loops includes a highly conserved region (between P148 and P174) and a short alpha-helix (between E181 and N188). These regions are characteristic parts of PsbO and show that the second domain is not so similar to the template. Overall, the model was able to account for much of the experimental data reported by several authors, and it would allow the detection of key residues and regions that are proposed in this article as essential for the structure and function of PsbO. Copyright 2001 Wiley-Liss, Inc.

  8. Structure and Sequence Search on Aptamer-Protein Docking

    Science.gov (United States)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  9. Evaluation of variability in high-resolution protein structures by global distance scoring

    Directory of Open Access Journals (Sweden)

    Risa Anzai

    2018-01-01

    Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and showed that the method is comprehensively used to overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provides new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.

  10. Evaluation of variability in high-resolution protein structures by global distance scoring.

    Science.gov (United States)

    Anzai, Risa; Asami, Yoshiki; Inoue, Waka; Ueno, Hina; Yamada, Koya; Okada, Tetsuji

    2018-01-01

    Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and showed that the method is comprehensively used to overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provides new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.

  11. Integrative structural modeling with small angle X-ray scattering profiles

    Directory of Open Access Journals (Sweden)

    Schneidman-Duhovny Dina

    2012-07-01

    Recent technological advances enabled high-throughput collection of Small Angle X-ray Scattering (SAXS) profiles of biological macromolecules. Thus, computational methods for integrating SAXS profiles into structural modeling are needed more than ever. Here, we review specifically the use of SAXS profiles for the structural modeling of proteins, nucleic acids, and their complexes. First, the approaches for computing theoretical SAXS profiles from structures are presented. Second, computational methods for predicting protein structures, dynamics of proteins in solution, and assembly structures are covered. Third, we discuss the use of SAXS profiles in integrative structure modeling approaches that depend simultaneously on several data types.

  12. Structural anatomy of telomere OB proteins.

    Science.gov (United States)

    Horvath, Martin P

    2011-10-01

    Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA.

  13. Protein Structure and the Sequential Structure of mRNA

    DEFF Research Database (Denmark)

    Brunak, Søren; Engelbrecht, Jacob

    1996-01-01

    entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from......A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting...... protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural...

  14. Structural features that predict real-value fluctuations of globular proteins.

    Science.gov (United States)

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined the correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNM is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
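
    The two figures of merit quoted above answer different questions, which a small sketch can make concrete: Pearson's correlation only checks whether predicted and reference fluctuations rise and fall together, whereas the root mean square error is in Å and is meaningful only for a method that predicts real values. The functions and the per-residue numbers below are illustrative assumptions, not the authors' code or data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred, ref):
    """Root mean square error, in the same units as the inputs (here Å)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


if __name__ == "__main__":
    # Hypothetical per-residue fluctuations (Å) from MD and from a predictor.
    md_ref = [0.8, 1.1, 2.4, 3.0, 1.5]
    predicted = [1.0, 1.2, 2.0, 2.6, 1.9]
    # Rescaling a prediction leaves r unchanged but changes the RMSE,
    # which is why relative (unitless) profiles are scored by r alone.
    doubled = [2.0 * p for p in predicted]
    print(pearson_r(predicted, md_ref), rmse(predicted, md_ref))
    print(pearson_r(doubled, md_ref), rmse(doubled, md_ref))
```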

  15. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for chagas' disease treatment.

    Science.gov (United States)

    Capriles, Priscila V S Z; Guimarães, Ana C R; Otto, Thomas D; Miranda, Antonio B; Dardenne, Laurent E; Degrave, Wim M

    2010-10-29

    Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens. We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite. In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase (EC 1.3.1.34) and triacylglycerol lipase (EC 3.1.1.3), classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets.

  16. 3Drefine: an interactive web server for efficient protein structure refinement.

    Science.gov (United States)

    Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

    2016-07-08

    3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of the hydrogen bonding network combined with atomic-level energy minimization of the optimized model using composite physics- and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, and a provided example submission, and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. LoopIng: a template-based tool for predicting the structure of protein loops.

    KAUST Repository

    Messih, Mario Abdel

    2015-08-06

    Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4-10 residues) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability: www.biocomputing.it/looping. Contact: anna.tramontano@uniroma1.it. Supplementary data are available at Bioinformatics online.

  18. Exploiting conformational ensembles in modeling protein-protein interactions on the proteome scale

    Science.gov (United States)

    Kuzu, Guray; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem

    2013-01-01

    Cellular functions are performed through protein-protein interactions; therefore, identification of these interactions is crucial for understanding biological processes. Recent studies suggest that knowledge-based approaches are more useful than ‘blind’ docking for modeling at large scales. However, a caveat of knowledge-based approaches is that they treat molecules as rigid structures. The Protein Data Bank (PDB) offers a wealth of conformations. Here, we exploited ensembles of conformations in predictions by a knowledge-based method, PRISM. We tested ‘difficult’ cases in a docking-benchmark dataset, where the unbound and bound protein forms are structurally different. Considering alternative conformations for each protein, the percentage of successfully predicted interactions increased from ~26% to 66%, and 57% of the interactions were successfully predicted in an ‘unbiased’ scenario, in which data related to the bound forms were not utilized. If the appropriate conformation, or relevant template interface, was unavailable in the PDB, PRISM could not predict the interaction successfully. The pace of the growth of the PDB promises a rapid increase of ensemble conformations, emphasizing the merit of such knowledge-based ensemble strategies for higher success rates in protein-protein interaction predictions on an interactome scale. We constructed the structural network of ERK interacting proteins as a case study. PMID: 23590674

  19. Comparative Study of Elastic Network Model and Protein Contact Network for Protein Complexes: The Hemoglobin Case

    Directory of Open Access Journals (Sweden)

    Guang Hu

    2017-01-01

    The overall topology and interfacial interactions play key roles in understanding structural and functional principles of protein complexes. Elastic Network Model (ENM) and Protein Contact Network (PCN) are two widely used methods for high throughput investigation of structures and interactions within protein complexes. In this work, the comparative analysis of ENM and PCN relative to hemoglobin (Hb) was taken as a case study. We examine four types of structural and dynamical paradigms, namely, conformational change between different states of Hbs, modular analysis, allosteric mechanisms studies, and interface characterization of an Hb. The comparative study shows that ENM has an advantage in studying dynamical properties and protein-protein interfaces, while PCN is better for describing protein structures quantitatively both from local and from global levels. We suggest that the integration of ENM and PCN would give a potential but powerful tool in structural systems biology.
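
    Both approaches start from the same object, a residue contact map, which the following minimal sketch builds from C-alpha coordinates. The 8 Å cutoff, the toy coordinates and the function names are assumptions chosen for illustration; the abstract does not give the parameters the authors used. The adjacency sets are what a contact network analysis would work on (degree, modules), and the Kirchhoff matrix derived from them is the connectivity matrix used by Gaussian-type elastic network models.

```python
import math


def contact_network(coords, cutoff=8.0):
    """Return adjacency sets linking residues whose C-alpha atoms lie within cutoff (Å).

    The cutoff value is a hypothetical choice, not one taken from the abstract.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def kirchhoff(adj):
    """Connectivity matrix: node degree on the diagonal, -1 for each contact."""
    n = len(adj)
    k = [[0] * n for _ in range(n)]
    for i, neighbours in adj.items():
        k[i][i] = len(neighbours)
        for j in neighbours:
            k[i][j] = -1
    return k


if __name__ == "__main__":
    # Toy C-alpha trace: three residues in a row plus one far away.
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
    net = contact_network(ca)
    print({i: len(nb) for i, nb in net.items()})  # per-residue degree
    for row in kirchhoff(net):
        print(row)
```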

  20. Anomalous diffusion in neutral evolution of model proteins

    Science.gov (United States)

    Nelson, Erik D.; Grishin, Nick V.

    2015-06-01

    Protein evolution is frequently explored using minimalist polymer models; however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n. We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.
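
    A power-law dependence of the form MSD(n) ≈ A·n^α is usually read off as the slope of a straight line in log-log space. The sketch below estimates α by ordinary least squares on the logarithms; the fitting procedure and the synthetic data are assumptions for illustration, since the abstract does not say how the exponent was obtained. An exponent of 1 would correspond to ordinary diffusion, and other values to anomalous diffusion.

```python
import math


def fit_power_law(ns, msds):
    """Least-squares fit of log(MSD) = log(A) + alpha * log(n).

    Returns (alpha, A). All inputs must be strictly positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(m) for m in msds]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(my - alpha * mx)
    return alpha, prefactor


if __name__ == "__main__":
    # Synthetic, noise-free data generated with alpha = 0.5 and A = 2.
    ns = [1, 2, 4, 8, 16, 32]
    msds = [2.0 * n ** 0.5 for n in ns]
    print(fit_power_law(ns, msds))
```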

  1. Beta-structures in fibrous proteins.

    Science.gov (United States)

    Kajava, Andrey V; Squire, John M; Parry, David A D

    2006-01-01

    The beta-form of protein folding, one of the earliest protein structures to be defined, was originally observed in studies of silks. It was then seen in early studies of synthetic polypeptides and, of course, is now known to be present in a variety of guises as an essential component of globular protein structures. However, in the last decade or so it has become clear that the beta-conformation of chains is present not only in many of the amyloid structures associated with, for example, Alzheimer's Disease, but also in the prion structures associated with the spongiform encephalopathies. Furthermore, X-ray crystallography studies have revealed the high incidence of the beta-fibrous proteins among virulence factors of pathogenic bacteria and viruses. Here we describe the basic forms of the beta-fold, summarize the many different new forms of beta-structural fibrous arrangements that have been discovered, and review advances in structural studies of amyloid and prion fibrils. These and other issues are described in detail in later chapters.

  2. GIS: a comprehensive source for protein structure similarities.

    Science.gov (United States)

    Guerler, Aysam; Knapp, Ernst-Walter

    2010-07-01

    A web service for analysis of protein structures that are sequentially or non-sequentially similar was generated. Recently, the non-sequential structure alignment algorithm GANGSTA+ was introduced. GANGSTA+ can detect non-sequential structural analogs for proteins stated to possess novel folds. Since GANGSTA+ ignores the polypeptide chain connectivity of secondary structure elements (i.e. alpha-helices and beta-strands), it is able to detect structural similarities also between proteins whose sequences were reshuffled during evolution. GANGSTA+ was applied in an all-against-all comparison on the ASTRAL40 database (SCOP version 1.75), which consists of >10,000 protein domains yielding about 55 × 10^6 possible protein structure alignments. Here, we provide the resulting protein structure alignments as a public web-based service, named GANGSTA+ Internet Services (GIS). We also allow users to browse the ASTRAL40 database of protein structures with GANGSTA+ relative to an externally given protein structure using different constraints to select specific results. GIS allows us to analyze protein structure families according to the SCOP classification scheme. Additionally, users can upload their own protein structures for pairwise protein structure comparison, alignment against all protein structures of the ASTRAL40 database (SCOP version 1.75) or symmetry analysis. GIS is publicly available at http://agknapp.chemie.fu-berlin.de/gplus.
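
    The scale of an all-against-all comparison follows from simple counting: n domains give n(n-1)/2 unordered pairs. The sketch below checks that a domain count slightly above 10,000 is consistent with tens of millions of alignments. Treating each pair once (rather than in both orders) is an assumption, as the abstract does not say how the pairs were counted.

```python
import math


def pair_count(n):
    """Number of unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def min_items_for_pairs(target):
    """Smallest n whose unordered pair count reaches target."""
    # Start from the integer solution of n(n-1)/2 = target, then adjust.
    n = (1 + math.isqrt(1 + 8 * target)) // 2
    while pair_count(n) < target:
        n += 1
    while n > 0 and pair_count(n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    print(pair_count(10_000))               # pairs among exactly 10,000 domains
    print(min_items_for_pairs(55_000_000))  # domains needed for 55 million pairs
```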

  3. Predicting turns in proteins with a unified model.

    Directory of Open Access Journals (Sweden)

    Qi Song

    MOTIVATION: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. RESULTS: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings, are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain types of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.
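
    Accuracy and sensitivity can differ substantially when one class is rarer than the other, as is plausible for turn versus non-turn residues. The sketch below computes both from confusion-matrix counts using the standard definitions; the counts are hypothetical and are not TurnP results.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all residues that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true turn residues that were predicted as turns (recall)."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical counts: 40 turn residues, 60 non-turn residues.
    tp, fn = 30, 10   # turn residues predicted as turn / missed
    tn, fp = 50, 10   # non-turn residues predicted correctly / wrongly
    print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn))
```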

  4. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review about neural network applications in bioinformatics. Especially the applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  5. Structural model of dodecameric heat-shock protein Hsp21

    DEFF Research Database (Denmark)

    Rutsdottir, Gudrun; Härmark, Johan; Weide, Yoran

    2017-01-01

    for investigating structure-function relationships of Hsp21 and understanding these sequence variations, we developed a structural model of Hsp21 based on homology modeling, cryo-EM, cross-linking mass spectrometry, NMR, and small-angle X-ray scattering. Our data suggest a dodecameric arrangement of two trimer...

  6. Target specific proteochemometric model development for BACE1 - protein flexibility and structural water are critical in virtual screening.

    Science.gov (United States)

    Manoharan, Prabu; Chennoju, Kiranmai; Ghoshal, Nanda

    2015-07-01

    BACE1 is an attractive target in Alzheimer's disease (AD) treatment. A rational drug design effort for the inhibition of BACE1 is actively pursued by researchers in both academic and pharmaceutical industries. This continued effort led to the steady accumulation of BACE1 crystal structures, co-complexed with different classes of inhibitors. This wealth of information is used in this study to develop target specific proteochemometric models and these models are exploited for predicting the prospective BACE1 inhibitors. The models developed in this study have performed excellently in predicting the computationally generated poses, separately obtained from single and ensemble docking approaches. The simple protein-ligand contact (SPLC) model outperforms other sophisticated high end models, in virtual screening performance, developed during this study. In an attempt to account for BACE1 protein active site flexibility information in predictive models, we included the change in the area of solvent accessible surface and the change in the volume of solvent accessible surface in our models. The ensemble and single receptor docking results obtained from this study indicate that the structural water mediated interactions improve the virtual screening results. Also, these waters are essential for recapitulating bioactive conformation during docking study. The proteochemometric models developed in this study can be used for the prediction of BACE1 inhibitors, during the early stage of AD drug discovery.

  7. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    Science.gov (United States)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and

  8. (PS)2: protein structure prediction server version 3.0.

    Science.gov (United States)

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 server is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. The Phyre2 web portal for protein modeling, prediction and analysis.

    Science.gov (United States)

    Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael J E

    2015-06-01

    Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

  10. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  11. Understanding Protein-Protein Interactions Using Local Structural Features

    DEFF Research Database (Denmark)

    Planas-Iglesias, Joan; Bonet, Jaume; García-García, Javier

    2013-01-01

    Protein-protein interactions (PPIs) play a relevant role among the different functions of a cell. Identifying the PPI network of a given organism (interactome) is useful to shed light on the key molecular mechanisms within a biological system. In this work, we show the role of structural features...... interacting and non-interacting protein pairs to classify the structural features that sustain the binding (or non-binding) behavior. Our study indicates that not only the interacting region but also the rest of the protein surface are important for the interaction fate. The interpretation...... to score the likelihood of the interaction between two proteins and to develop a method for the prediction of PPIs. We have tested our method on several sets with unbalanced ratios of interactions and non-interactions to simulate real conditions, obtaining accuracies higher than 25% in the most unfavorable...

  12. Mechanical Modeling and Computer Simulation of Protein Folding

    Science.gov (United States)

    Prigozhin, Maxim B.; Scott, Gregory E.; Denos, Sharlene

    2014-01-01

    In this activity, science education and modern technology are bridged to teach students at the high school and undergraduate levels about protein folding and to strengthen their model building skills. Students are guided from a textbook picture of a protein as a rigid crystal structure to a more realistic view: proteins are highly dynamic…

  13. Structural genomics: keeping up with expanding knowledge of the protein universe

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2010-01-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space — a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006. PMID:17587562

  14. Structural genomics: keeping up with expanding knowledge of the protein universe.

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2007-06-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.

  15. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points whose distance to the query point is less than θ. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
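
    The two-stage retrieval described in this abstract (find 10 × k candidates on a reduced descriptor, then pick the top k among them) can be sketched in Python as follows. This is only an illustration under stated assumptions: the linear scan in stage 1 stands in for the iDistance/iKernel index, which is not implemented here, and the function names, the Euclidean distance, and the default of three reduced attributes are hypothetical choices rather than details from the paper.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_filter_refine(query, descriptors, k, reduced_dims=3, factor=10):
    """Return indices of (approximately) the k descriptors closest to query.

    Stage 1 ranks all entries using only the first `reduced_dims` attributes
    and keeps `factor * k` candidates. A real system would use an index here;
    this sketch uses a plain scan (assumption).
    Stage 2 re-ranks the candidates with the full descriptor.
    """
    q_reduced = query[:reduced_dims]
    candidates = heapq.nsmallest(
        factor * k,
        range(len(descriptors)),
        key=lambda i: distance(q_reduced, descriptors[i][:reduced_dims]),
    )
    return heapq.nsmallest(
        k, candidates, key=lambda i: distance(query, descriptors[i])
    )


def theta_search(query, descriptors, theta):
    """Return indices of all descriptors closer than theta to the query."""
    return [i for i, d in enumerate(descriptors) if distance(query, d) < theta]
```

    Because stage 1 sees only part of each vector, the result is exact only when the true top-k entries survive the candidate cut; a larger `factor` trades speed for a better chance of that.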

  16. What determines the structures of native folds of proteins?

    International Nuclear Information System (INIS)

    Trovato, Antonio; Hoang, Trinh X; Banavar, Jayanth R; Maritan, Amos; Seno, Flavio

    2005-01-01

    We review a simple physical model (Hoang et al 2004 Proc. Natl Acad. Sci. USA 101 7960, Banavar et al 2004 Phys. Rev. E at press) which captures the essential physico-chemical ingredients that determine protein structure, such as the inherent anisotropy of a chain molecule, the geometrical and energetic constraints placed by hydrogen bonds, sterics, and hydrophobicity. Within this framework, marginally compact conformations resembling the native state folds of proteins emerge as competing minima in the free energy landscape. Here we demonstrate that a hydrophobic-polar (HP) sequence composed of regularly repeated patterns has as its ground state a β-helical structure remarkably similar to a known architecture in the Protein Data Bank

  17. A 'periodic table' for protein structures.

    Science.gov (United States)

    Taylor, William R

    2002-04-11

    Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.

  18. Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139

    Directory of Open Access Journals (Sweden)

    Md. Saiful Islam

    2015-06-01

    Full Text Available In developing countries the threat of cholera is a significant health concern whenever water purification and sewage disposal systems are inadequate. Vibrio cholerae is a bacterium responsible for cholera. The complete genome sequence of V. cholerae reveals the presence of various genes and hypothetical proteins whose functions are not yet understood. Hence, analyzing and annotating the structure and function of hypothetical proteins is important for understanding V. cholerae. V. cholerae O139 is the most common and pathogenic bacterial strain among various V. cholerae strains. In this study, the sequences of six hypothetical proteins of V. cholerae O139 obtained from NCBI have been annotated. Various computational tools and databases have been used to determine domain family, protein-protein interaction, solubility of protein, ligand binding sites etc. The three-dimensional structures of two proteins were modeled and their ligand binding sites were identified. We found domains and families for only one protein. The analysis revealed that these proteins might have antibiotic resistance activity, DNA breaking-rejoining activity, integrase enzyme activity, restriction endonuclease activity, etc. Structural prediction of these proteins and detection of binding sites from this study would indicate a potential target aiding docking studies for therapeutic designing against cholera.

  19. Protein Secondary Structures (α-helix and β-sheet) at a Cellular Level and Protein Fractions in Relation to Rumen Degradation Behaviours of Protein: A New Approach

    International Nuclear Information System (INIS)

    Yu, P.

    2007-01-01

    Studying the secondary structure of proteins leads to an understanding of the components that make up a whole protein, and such an understanding of the structure of the whole protein is often vital to understanding its digestive behaviour and nutritive value in animals. The main protein secondary structures are the α-helix and β-sheet. The percentage of these two structures in protein secondary structures influences protein nutritive value, quality and digestive behaviour. A high percentage of β-sheet structure may partly cause a low access to gastrointestinal digestive enzymes, which results in a low protein value. The objectives of the present study were to use advanced synchrotron-based Fourier transform IR (S-FTIR) microspectroscopy as a new approach to reveal the molecular chemistry of the protein secondary structures of feed tissues affected by heat-processing within intact tissue at a cellular level, and to quantify protein secondary structures using multicomponent peak modelling Gaussian and Lorentzian methods, in relation to protein digestive behaviours and nutritive value in the rumen, which was determined using the Cornell Net Carbohydrate Protein System. The synchrotron-based molecular chemistry research experiment was performed at the National Synchrotron Light Source at Brookhaven National Laboratory, US Department of Energy. The results showed that, with S-FTIR microspectroscopy, the molecular chemistry, ultrastructural chemical make-up and nutritive characteristics could be revealed at a high ultraspatial resolution (∼10 μm). S-FTIR microspectroscopy revealed that the secondary structure of protein differed between raw and roasted golden flaxseeds in terms of the percentages and ratio of α-helixes and β-sheets in the mid-IR range at the cellular level. By using multicomponent peak modelling, the results show that the roasting reduced (P <0.05) the percentage of α-helixes (from 47.1% to 36.1%: S-FTIR absorption intensity), increased the

  20. Protein Secondary Structures (alpha-helix and beta-sheet) at a Cellular Level and Protein Fractions in Relation to Rumen Degradation Behaviours of Protein: A New Approach

    Energy Technology Data Exchange (ETDEWEB)

    Yu, P.

    2007-01-01

    Studying the secondary structure of proteins leads to an understanding of the components that make up a whole protein, and such an understanding of the structure of the whole protein is often vital to understanding its digestive behaviour and nutritive value in animals. The main protein secondary structures are the α-helix and β-sheet. The percentage of these two structures in protein secondary structures influences protein nutritive value, quality and digestive behaviour. A high percentage of β-sheet structure may partly cause a low access to gastrointestinal digestive enzymes, which results in a low protein value. The objectives of the present study were to use advanced synchrotron-based Fourier transform IR (S-FTIR) microspectroscopy as a new approach to reveal the molecular chemistry of the protein secondary structures of feed tissues affected by heat-processing within intact tissue at a cellular level, and to quantify protein secondary structures using multicomponent peak modelling Gaussian and Lorentzian methods, in relation to protein digestive behaviours and nutritive value in the rumen, which was determined using the Cornell Net Carbohydrate Protein System. The synchrotron-based molecular chemistry research experiment was performed at the National Synchrotron Light Source at Brookhaven National Laboratory, US Department of Energy. The results showed that, with S-FTIR microspectroscopy, the molecular chemistry, ultrastructural chemical make-up and nutritive characteristics could be revealed at a high ultraspatial resolution (∼10 μm). S-FTIR microspectroscopy revealed that the secondary structure of protein differed between raw and roasted golden flaxseeds in terms of the percentages and ratio of α-helixes and β-sheets in the mid-IR range at the cellular level. By using multicomponent peak modelling, the results show that the roasting reduced (P <0.05) the percentage of α-helixes (from 47.1% to 36.1%: S

  1. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    Science.gov (United States)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover several steps, including the construction of the model, the classification of parameters and so on. The data set is the typical CB513 protein data set. Accuracy is evaluated by 3-fold cross-validation, from which we obtain the Q3 accuracy. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
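
    The Q3 accuracy mentioned here is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal Python sketch follows. The eight-to-three state reduction shown is one commonly used convention and is an assumption here, since the abstract does not state which mapping the authors applied; the function names are illustrative.

```python
# One common reduction of the eight DSSP states to three (assumption):
# H, G, I -> H (helix); E, B -> E (strand); everything else -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp_string):
    """Map an eight-state DSSP string to a three-state H/E/C string."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in dssp_string)


def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

    For example, comparing the prediction `"HHHEEECCC"` with the observed string `"HHHEEECCH"` gives eight matches out of nine residues, a Q3 of about 88.9%.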

  2. Structures composing protein domains.

    Science.gov (United States)

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel; Hudeček, Jiří

    2013-08-01

    This review summarizes available data concerning intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, broadly occurring secondary structures or folds, etc. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, enzyme reactions and some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure) mostly requiring certain structural context of other levels (e.g. secondary structures or supersecondary folds) as follows also from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility), and more special dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes concern also some experimental methods, still more necessary tools of bioinformatic processing and clinically interesting relationships. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  3. Current strategies for protein production and purification enabling membrane protein structural biology.

    Science.gov (United States)

    Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E; Liu, Xiang-Qin; Rainey, Jan K

    2016-12-01

    Membrane proteins are still heavily under-represented in the protein data bank (PDB), owing to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles, owing to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and (or) amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through the introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10-15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM).

  4. Refinement of protein termini in template-based modeling using conformational space annealing.

    Science.gov (United States)

    Park, Hahnbeom; Ko, Junsu; Joo, Keehyoung; Lee, Julian; Seok, Chaok; Lee, Jooyoung

    2011-09-01

    The rapid increase in the number of experimentally determined protein structures in recent years enables us to obtain more reliable protein tertiary structure models than ever by template-based modeling. However, refinement of template-based models beyond the limit available from the best templates is still needed for understanding protein function in atomic detail. In this work, we develop a new method for protein terminus modeling that can be applied to refinement of models with unreliable terminus structures. The energy function for terminus modeling consists of both physics-based and knowledge-based potential terms with carefully optimized relative weights. Effective sampling of both the framework and terminus is performed using the conformational space annealing technique. This method has been tested on a set of termini derived from a nonredundant structure database and two sets of termini from the CASP8 targets. The performance of the terminus modeling method is significantly improved over our previous method that does not employ terminus refinement. It is also comparable or superior to the best server methods tested in CASP8. The success of the current approach suggests that similar strategy may be applied to other types of refinement problems such as loop modeling or secondary structure rearrangement. Copyright © 2011 Wiley-Liss, Inc.

  5. Protein structure similarity from principle component correlation analysis

    Directory of Open Access Journals (Sweden)

    Chou James

    2006-01-01

    Full Text Available Abstract Background Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. Results We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. Conclusion The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum
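
    The RMSD measure that this abstract uses as its baseline can be illustrated with a short Python sketch. It assumes the two structures have already been optimally superimposed and contain the same number of matched atoms, so no fitting step is performed; the function name is illustrative and the sketch does not implement the PCC method itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets.

    Each argument is a sequence of (x, y, z) tuples, with atom i in coords_a
    matched to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))
```

    Because every atom pair contributes equally, a large displacement of one region raises the RMSD even if the overall topology is preserved, which is the limitation the abstract points to.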

  6. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  7. De novo structural modeling and computational sequence analysis ...

    African Journals Online (AJOL)

    Different bioinformatics tools and machine learning techniques were used for protein structural classification. De novo protein modeling was performed using the I-TASSER server. The final model was assessed with PROCHECK and DFIRE2, which confirmed that it is reliable. Until complete biochemical ...

  8. Analysis and Ranking of Protein-Protein Docking Models Using Inter-Residue Contacts and Inter-Molecular Contact Maps

    KAUST Repository

    Oliva, Romina; Chermak, Edrisse; Cavallo, Luigi

    2015-01-01

    In view of the increasing interest both in inhibitors of protein-protein interactions and in protein drugs themselves, analysis of the three-dimensional structure of protein-protein complexes is assuming greater relevance in drug design. In the many cases where an experimental structure is not available, protein-protein docking becomes the method of choice for predicting the arrangement of the complex. However, reliably scoring protein-protein docking poses is still an unsolved problem. As a consequence, the screening of many docking models is usually required in the analysis step, to possibly single out the correct ones. Here, making use of exemplary cases, we review our recently introduced methods for the analysis of protein complex structures and for the scoring of protein docking poses, based on the use of inter-residue contacts and their visualization in inter-molecular contact maps. We also show that the ensemble of tools we developed can be used in the context of rational drug design targeting protein-protein interactions.
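
    The inter-molecular contact maps mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each residue is reduced to a single 3D point and that two residues are in contact when those points lie within 5 Å; both choices, and the function names `contact_map` and `contact_frequency`, are illustrative assumptions and not the authors' definitions or scoring method.

```python
import math


def contact_map(chain_a, chain_b, cutoff=5.0):
    """Return the set of (i, j) pairs where residue i of chain A and
    residue j of chain B lie within `cutoff` angstroms of each other.

    Each residue is represented by one (x, y, z) point, which is an
    assumption made for this sketch.
    """
    contacts = set()
    for i, a in enumerate(chain_a):
        for j, b in enumerate(chain_b):
            if math.dist(a, b) <= cutoff:
                contacts.add((i, j))
    return contacts


def contact_frequency(maps):
    """Fraction of docking models in which each contact appears.

    This frequency summary is a hypothetical way to compare many models;
    it is not taken from the abstract.
    """
    counts = {}
    for model_contacts in maps:
        for pair in model_contacts:
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(maps) for pair, n in counts.items()}
```

    Contacts that recur across many docking models stand out in such a frequency table, which is one plausible way to read a set of contact maps side by side.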

  9. Analysis and Ranking of Protein-Protein Docking Models Using Inter-Residue Contacts and Inter-Molecular Contact Maps

    KAUST Repository

    Oliva, Romina

    2015-07-01

    In view of the increasing interest both in inhibitors of protein-protein interactions and in protein drugs themselves, analysis of the three-dimensional structure of protein-protein complexes is assuming greater relevance in drug design. In the many cases where an experimental structure is not available, protein-protein docking becomes the method of choice for predicting the arrangement of the complex. However, reliably scoring protein-protein docking poses is still an unsolved problem. As a consequence, the screening of many docking models is usually required in the analysis step, to possibly single out the correct ones. Here, making use of exemplary cases, we review our recently introduced methods for the analysis of protein complex structures and for the scoring of protein docking poses, based on the use of inter-residue contacts and their visualization in inter-molecular contact maps. We also show that the ensemble of tools we developed can be used in the context of rational drug design targeting protein-protein interactions.

  10. Correlated mutations in protein sequences: Phylogenetic and structural effects

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.S. [Los Alamos National Lab., NM (United States). Theoretical Div.]|[Santa Fe Inst., NM (United States); Giraud, B.G. [C.E.N. Saclay, Gif/Yvette (France). Service Physique Theorique; Liu, L.C. [Los Alamos National Lab., NM (United States). Theoretical Div.; Stormo, G.D. [Univ. of Colorado, Boulder, CO (United States). Dept. of Molecular, Cellular and Developmental Biology

    1998-12-01

    Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links, but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure. In this paper the authors identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason involves the bias introduced in calculation of covariation measures due to the fact that biological sequences are generally related by a non-trivial phylogenetic tree. The authors present a null-model approach to solve this problem. The second reason involves linked chains of covariation which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. They present a maximum entropy solution to this classic problem of causation versus correlation. The methodologies are validated in simulation.
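
    For orientation, the naive covariation measure that this abstract criticises can be sketched as the mutual information between two alignment columns. The sketch assumes an alignment given as equal-length strings; the function names are illustrative, and the code deliberately implements neither the null-model correction for phylogeny nor the maximum entropy treatment of linked chains that the authors propose.

```python
import math
from collections import Counter


def column(alignment, k):
    """Extract column k from a list of equal-length aligned sequences."""
    return [seq[k] for seq in alignment]


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    Uses plain observed frequencies, with no pseudocounts and no
    correction for phylogenetic relatedness of the sequences.
    """
    n = len(col_x)
    freq_x = Counter(col_x)
    freq_y = Counter(col_y)
    freq_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in freq_xy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_x[a] / n) * (freq_y[b] / n)))
    return mi
```

    A high value for a pair of columns says only that the two sites vary together in the sample. As the abstract stresses, that alone does not establish spatial proximity.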

  11. Structure-function correlations of pulmonary surfactant protein SP-B and the saposin-like family of proteins.

    Science.gov (United States)

    Olmeda, Bárbara; García-Álvarez, Begoña; Pérez-Gil, Jesús

    2013-03-01

    Pulmonary surfactant is a lipid-protein complex secreted by the respiratory epithelium of mammalian lungs, which plays an essential role in stabilising the alveolar surface and so reducing the work of breathing. The surfactant protein SP-B is part of this complex, and is strictly required for the assembly of pulmonary surfactant and its extracellular development to form stable surface-active films at the air-liquid alveolar interface, making the lack of SP-B incompatible with life. In spite of its physiological importance, a model for the structure and the mechanism of action of SP-B is still needed. The sequence of SP-B is homologous to that of the saposin-like family of proteins, which are membrane-interacting polypeptides with apparently diverging activities, from the co-lipase action of saposins to facilitate the degradation of sphingolipids in the lysosomes to the cytolytic actions of some antibiotic proteins, such as NK-lysin and granulysin or the amoebapore of Entamoeba histolytica. Numerous studies on the interactions of these proteins with membranes have still not explained how a similar sequence and a potentially related fold can sustain such apparently different activities. In the present review, we have summarised the most relevant features of the structure, lipid-protein and protein-protein interactions of SP-B and the saposin-like family of proteins, as a basis to propose an integrated model and a common mechanistic framework of the apparent functional versatility of the saposin fold.

  12. Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.

    Science.gov (United States)

    Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G

    2016-01-01

    While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged; however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, and membranes.
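
    The general form of a Debye-Hückel potential, a Coulomb interaction damped by an exponential screening factor, can be sketched as below. The dielectric constant of 80 and the Debye length of 10 Å are assumed defaults for illustration, and the function name is hypothetical; none of these values is taken from the AWSEM parametrisation described in the abstract.

```python
import math

# Coulomb constant in kcal * angstrom / (mol * e^2), a standard conversion factor.
COULOMB_KCAL = 332.0637


def debye_huckel_energy(q_i, q_j, r, dielectric=80.0, debye_length=10.0):
    """Screened Coulomb energy (kcal/mol) between charges q_i and q_j
    (in units of e) separated by r angstroms.

    `dielectric` and `debye_length` are illustrative assumptions.
    """
    coulomb = COULOMB_KCAL * q_i * q_j / (dielectric * r)
    return coulomb * math.exp(-r / debye_length)
```

    Shortening the Debye length, which corresponds to higher ionic strength, damps the interaction more quickly with distance.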

  13. SDSL-ESR-based protein structure characterization

    NARCIS (Netherlands)

    Strancar, J.; Kavalenka, A.A.; Urbancic, I.; Ljubetic, A.; Hemminga, M.A.

    2010-01-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be

  14. Modeling the structure of SARS 3a transmembrane protein using a ...

    Indian Academy of Sciences (India)

    three α-helices has been subjected to MD simulations to examine its quality. The TM bundle was ... of the structure of the channel, however, are yet to be elucidated. ... interactions between the proteins and the lipid bilayer has been studied ...

  15. Structure and Pathology of Tau Protein in Alzheimer Disease

    Directory of Open Access Journals (Sweden)

    Michala Kolarova

    2012-01-01

    Alzheimer's disease (AD) is the most common type of dementia. With the global trend toward longer human life and the growing number of elderly people in the population, AD is becoming one of the most serious health and socioeconomic problems of the present. Tau protein promotes the assembly of microtubules and stabilizes them, which contributes to the proper function of neurons. Alterations in the amount or the structure of tau protein can affect its role as a stabilizer of microtubules as well as some of the processes in which it is implicated. The molecular mechanisms governing tau aggregation are mainly represented by several posttranslational modifications that alter its structure and conformational state. Hence, abnormal phosphorylation and truncation of tau protein have gained attention as key mechanisms that turn tau protein into a pathological entity. Evidence of the clinicopathological significance of phosphorylated and truncated tau has been documented during the progression of AD, as has their capacity to exert cytotoxicity when expressed in cell and animal models. This paper describes the normal structure and function of tau protein and its major alterations during its pathological aggregation in AD.

  16. Structural determinants for protein adsorption/non-adsorption to silica surface

    International Nuclear Information System (INIS)

    Mathe, Christelle; Devineau, Stephanie; Aude, Jean-Christophe; Lagniel, Gilles; Chedin, Stephane; Legros, Veronique; Mathon, Marie-Helene; Renault, Jean-Philippe; Pin, Serge; Boulard, Yves; Labarre, Jean

    2013-01-01

    The understanding of the mechanisms involved in the interaction of proteins with inorganic surfaces is of major interest in both fundamental research and applications such as nanotechnology. However, despite intense research, the mechanisms and the structural determinants of protein/surface interactions are still unclear. We developed a strategy consisting in identifying, in a mixture of hundreds of soluble proteins, those proteins that are adsorbed on the surface and those that are not. If the two protein subsets are large enough, their statistical comparative analysis must reveal the physicochemical determinants relevant for adsorption versus non-adsorption. This methodology was tested with silica nanoparticles. We found that the adsorbed proteins contain a higher number of charged amino acids, particularly arginine, which is consistent with involvement of this basic amino acid in electrostatic interactions with silica. The analysis also identified a marked bias toward low aromatic amino acid content (phenylalanine, tryptophan, tyrosine and histidine) in adsorbed proteins. Structural analyses and molecular dynamics simulations of proteins from the two groups indicate that non-adsorbed proteins have twice as many π-π interactions and higher structural rigidity. The data are consistent with the notion that adsorption is correlated with the flexibility of the protein and with its ability to spread on the surface. Our findings led us to propose a refined model of protein adsorption.

  17. Structural determinants for protein adsorption/non-adsorption to silica surface.

    Directory of Open Access Journals (Sweden)

    Christelle Mathé

    The understanding of the mechanisms involved in the interaction of proteins with inorganic surfaces is of major interest in both fundamental research and applications such as nanotechnology. However, despite intense research, the mechanisms and the structural determinants of protein/surface interactions are still unclear. We developed a strategy consisting in identifying, in a mixture of hundreds of soluble proteins, those proteins that are adsorbed on the surface and those that are not. If the two protein subsets are large enough, their statistical comparative analysis must reveal the physicochemical determinants relevant for adsorption versus non-adsorption. This methodology was tested with silica nanoparticles. We found that the adsorbed proteins contain a higher number of charged amino acids, particularly arginine, which is consistent with involvement of this basic amino acid in electrostatic interactions with silica. The analysis also identified a marked bias toward low aromatic amino acid content (phenylalanine, tryptophan, tyrosine and histidine) in adsorbed proteins. Structural analyses and molecular dynamics simulations of proteins from the two groups indicate that non-adsorbed proteins have twice as many π-π interactions and higher structural rigidity. The data are consistent with the notion that adsorption is correlated with the flexibility of the protein and with its ability to spread on the surface. Our findings led us to propose a refined model of protein adsorption.

  18. Hidden Markov model approach for identifying the modular framework of the protein backbone.

    Science.gov (United States)

    Camproux, A C; Tuffery, P; Chevrolat, J P; Boisvieux, J F; Hazout, S

    1999-12-01

    The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.

  19. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    Science.gov (United States)

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biennial international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy; therefore, much effort should be made to improve ab initio methods. Rosetta is arguably the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as the first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers (i.e. 200) has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3-mers.

  20. The small heat shock proteins from Acidithiobacillus ferrooxidans: gene expression, phylogenetic analysis, and structural modeling

    Directory of Open Access Journals (Sweden)

    Ribeiro Daniela A

    2011-12-01

    Background: Acidithiobacillus ferrooxidans is an acidophilic, chemolithoautotrophic bacterium that has been successfully used in metal bioleaching. In this study, an analysis of the A. ferrooxidans ATCC 23270 genome revealed the presence of three sHSP genes, Afe_1009, Afe_1437 and Afe_2172, that encode proteins from the HSP20 family, a class of intracellular multimers that is especially important in extremophile microorganisms. Results: The expression of the sHSP genes was investigated in A. ferrooxidans cells submitted to a heat shock at 40°C for 15, 30 and 60 minutes. After 60 minutes, the gene on locus Afe_1437 was about 20-fold more highly expressed than the gene on locus Afe_2172. Bioinformatic and phylogenetic analyses showed that the sHSPs from A. ferrooxidans are possibly non-paralogous proteins, and are regulated by the σ32 factor, a common transcription factor of heat shock proteins. Structural studies using homology molecular modeling indicated that the proteins encoded by Afe_1009 and Afe_1437 have a conserved α-crystallin domain and share similar structural features with the sHSP from Methanococcus jannaschii, suggesting that their biological assembly involves 24 molecules and resembles a hollow spherical shell. Conclusion: We conclude that the sHSPs encoded by the Afe_1437 and Afe_1009 genes are more likely to act as molecular chaperones in the A. ferrooxidans heat shock response. In addition, the three sHSPs from A. ferrooxidans are not recent paralogs, and the Afe_1437 and Afe_1009 genes could have been inherited horizontally by A. ferrooxidans.

  1. Three-dimensional protein structure prediction: Methods and computational strategies.

    Science.gov (United States)

    Dorn, Márcio; E Silva, Mariel Barbachan; Buriol, Luciana S; Lamb, Luis C

    2014-10-12

    A long-standing problem in structural bioinformatics is to determine the three-dimensional (3-D) structure of a protein when only a sequence of amino acid residues is given. Many computational methodologies and algorithms have been proposed as a solution to the 3-D Protein Structure Prediction (3-D-PSP) problem. These methods can be divided into four main classes: (a) first principle methods without database information; (b) first principle methods with database information; (c) fold recognition and threading methods; and (d) comparative modeling methods and sequence alignment strategies. Deterministic computational techniques, optimization techniques, data mining and machine learning approaches are typically used in the construction of computational solutions for the PSP problem. Our main goal with this work is to review the methods and computational strategies that are currently used in 3-D protein prediction.

  2. Rhabdovirus matrix protein structures reveal a novel mode of self-association.

    Directory of Open Access Journals (Sweden)

    Stephen C Graham

    2008-12-01

    The matrix (M) proteins of rhabdoviruses are multifunctional proteins essential for virus maturation and budding that also regulate the expression of viral and host proteins. We have solved the structures of M from the vesicular stomatitis virus serotype New Jersey (genus: Vesiculovirus) and from Lagos bat virus (genus: Lyssavirus), revealing that both share a common fold despite sharing no identifiable sequence homology. Strikingly, in both structures a stretch of residues from the otherwise-disordered N terminus of a crystallographically adjacent molecule is observed binding to a hydrophobic cavity on the surface of the protein, thereby forming non-covalent linear polymers of M in the crystals. While the overall topology of the interaction is conserved between the two structures, the molecular details of the interactions are completely different. The observed interactions provide a compelling model for the flexible self-assembly of the matrix protein during virion morphogenesis and may also modulate interactions with host proteins.

  3. Illuminating structural proteins in viral "dark matter" with metaproteomics.

    Science.gov (United States)

    Brum, Jennifer R; Ignacio-Espinoza, J Cesar; Kim, Eun-Hae; Trubl, Gareth; Jones, Robert M; Roux, Simon; VerBerkmoes, Nathan C; Rich, Virginia I; Sullivan, Matthew B

    2016-03-01

    Viruses are ecologically important, yet environmental virology is limited by dominance of unannotated genomic sequences representing taxonomic and functional "viral dark matter." Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world's oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Together, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.

  4. NMR structure of the protein NP-247299.1: comparison with the crystal structure

    International Nuclear Information System (INIS)

    Jaudzems, Kristaps; Geralt, Michael; Serrano, Pedro; Mohanty, Biswaranjan; Horst, Reto; Pedrini, Bill; Elsliger, Marc-André; Wilson, Ian A.; Wüthrich, Kurt

    2010-01-01

    Comparison of the NMR and crystal structures of a protein determined using largely automated methods has enabled the interpretation of local differences in the highly similar structures. These differences are found in segments of higher B values in the crystal and correlate with dynamic processes on the NMR chemical shift timescale observed in solution. The NMR structure of the protein NP-247299.1 in solution at 313 K has been determined and is compared with the X-ray crystal structure, which was also solved in the Joint Center for Structural Genomics (JCSG) at 100 K and at 1.7 Å resolution. Both structures were obtained using the current largely automated crystallographic and solution NMR methods used by the JCSG. This paper assesses the accuracy and precision of the results from these recently established automated approaches, aiming for quantitative statements about the location of structure variations that may arise from either one of the methods used or from the different environments in solution and in the crystal. To evaluate the possible impact of the different software used for the crystallographic and the NMR structure determinations and analysis, the concept is introduced of reference structures, which are computed using the NMR software with input of upper-limit distance constraints derived from the molecular models representing the results of the two structure determinations. The use of this new approach is explored to quantify global differences that arise from the different methods of structure determination and analysis versus those that represent interesting local variations or dynamics. The near-identity of the protein core in the NMR and crystal structures thus provided a basis for the identification of complementary information from the two different methods. It was thus observed that locally increased crystallographic B values correlate with dynamic structural polymorphisms in solution, including that the solution state of the protein involves

  5. DeepQA: improving the estimation of single protein model quality with deep belief networks.

    Science.gov (United States)

    Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin

    2016-12-05

    Protein quality assessment (QA), useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/.

  6. Contingency Table Browser - prediction of early stage protein structure.

    Science.gov (United States)

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table, which expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity.

  7. Defining an essence of structure determining residue contacts in proteins.

    Science.gov (United States)

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the
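
    The random reference sets described in this abstract (subsets covering 10% to 90% of the native contacts) could be generated along the following lines. This is only a sketch of the random baseline, under the assumption that contacts are stored as residue-index pairs; the function names and the fixed seed are illustrative, and the "cone-peeling" selection itself is not reproduced here.

```python
import random


def random_contact_subset(contacts, fraction, seed=0):
    """Draw a random subset holding roughly `fraction` of the contacts.

    The seed is fixed so the draw is reproducible; this is an assumption
    of the sketch, not a detail from the abstract.
    """
    rng = random.Random(seed)
    k = round(fraction * len(contacts))
    return rng.sample(sorted(contacts), k)


def reference_series(contacts, seed=0):
    """One random subset for each fraction from 10% to 90% in 10% steps."""
    fractions = [i / 10 for i in range(1, 10)]
    return [random_contact_subset(contacts, f, seed) for f in fractions]
```

    Each subset would then be passed to a reconstruction step and scored, giving the baseline against which a targeted selection is compared.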

  8. UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

    Science.gov (United States)

    Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam

    2018-04-30

    A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, named the UNRES server, is presented. In contrast to most protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes the protein sequence and, optionally, restraints from secondary-structure prediction or small-angle X-ray scattering data, and the simulation type and parameters, which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links, can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.

  9. Coarse-grain modelling of protein-protein interactions

    NARCIS (Netherlands)

    Baaden, Marc; Marrink, Siewert J.

    2013-01-01

    Here, we review recent advances towards the modelling of protein-protein interactions (PPI) at the coarse-grained (CG) level, a technique that is now widely used to understand protein affinity, aggregation and self-assembly behaviour. PPI models of soluble proteins and membrane proteins are

  10. Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

    International Nuclear Information System (INIS)

    Arakaki, Tracy; Le Trong, Isolde; Phizicky, Eric; Quartley, Erin; DeTitta, George; Luft, Joseph; Lauricella, Angela; Anderson, Lori; Kalyuzhniy, Oleksandr; Worthey, Elizabeth; Myler, Peter J.; Kim, David; Baker, David; Hol, Wim G. J.; Merritt, Ethan A.

    2006-01-01

    The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. Eleven truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.

  11. Routine phasing of coiled-coil protein crystal structures with AMPLE

    Directory of Open Access Journals (Sweden)

    Jens M. H. Thomas

    2015-03-01

    Coiled-coil protein folds are among the most abundant in nature. These folds consist of long wound α-helices and are architecturally simple, but paradoxically their crystallographic structures are notoriously difficult to solve with molecular-replacement techniques. The program AMPLE can solve crystal structures by molecular replacement using ab initio search models in the absence of an existent homologous protein structure. AMPLE has been benchmarked on a large and diverse test set of coiled-coil crystal structures and has been found to solve 80% of all cases. Successes included structures with chain lengths of up to 253 residues and resolutions down to 2.9 Å, considerably extending the limits on size and resolution that are typically tractable by ab initio methodologies. The structures of two macromolecular complexes, one including DNA, were also successfully solved using their coiled-coil components. It is demonstrated that both the ab initio modelling and the use of ensemble search models contribute to the success of AMPLE by comparison with phasing attempts using single structures or ideal polyalanine helices. These successes suggest that molecular replacement with AMPLE should be the method of choice for the crystallographic elucidation of a coiled-coil structure. Furthermore, AMPLE may be able to exploit the presence of a coiled coil in a complex to provide a convenient route for phasing.

  12. A computational model of the LGI1 protein suggests a common binding site for ADAM proteins.

    Directory of Open Access Journals (Sweden)

    Emanuela Leonardi

    Mutations of the human leucine-rich glioma inactivated (LGI1) gene encoding the epitempin protein cause autosomal dominant temporal lateral epilepsy (ADTLE), a rare familial partial epileptic syndrome. The LGI1 gene seems to have a role in the transmission of neuronal messages, but the exact molecular mechanism remains unclear. In contrast to other genes involved in epileptic disorders, epitempin shows no homology with known ion channel genes but contains two domains, composed of repeated structural units, known to mediate protein-protein interactions. A three-dimensional in silico model of the two epitempin domains was built to predict the structure-function relationship and propose a functional model integrating previous experimental findings. Conserved and electrostatically charged regions of the model surface suggest a possible arrangement between the two domains and identify a possible ADAM protein binding site in the β-propeller domain and another protein binding site in the leucine-rich repeat domain. The functional model indicates that epitempin could mediate the interaction between proteins localized to different synaptic sides in a static way, by forming a dimer, or in a dynamic way, by binding proteins at different times. The model was also used to predict effects of known disease-causing missense mutations. Most of the variants are predicted to alter protein folding, while several others map to functional surface regions. In agreement with experimental evidence, this suggests that non-secreted LGI1 mutants could be retained within the cell by quality control mechanisms or by altering interactions required for the secretion process.

  13. In silico local structure approach: a case study on outer membrane proteins.

    Science.gov (United States)

    Martin, Juliette; de Brevern, Alexandre G; Camproux, Anne-Claude

    2008-04-01

    The detection of outer membrane proteins (OMPs) in whole genomes is a topical question, and their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captured sequential information. This can be integrated in a scoring function for structural model ranking with very promising results. (c) 2007 Wiley-Liss, Inc.

  14. From the Protein's Perspective: The Benefits and Challenges of Protein Structure-Based Pharmacophore Modeling

    NARCIS (Netherlands)

    Sanders, M.P.A.; McGuire, R; Roumen, L.; de Esch, I.J.P.; de Vlieg, J; Klomp, J.P.G; de Graaf, C.

    2011-01-01

    A pharmacophore describes the arrangement of molecular features a ligand must contain to efficaciously bind a receptor. Pharmacophore models are developed to improve molecular understanding of ligand-protein interactions, and can be used as a tool to identify novel compounds that fulfil the

  15. SA-Search: a web tool for protein structure mining based on a Structural Alphabet.

    Science.gov (United States)

    Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre

    2004-07-01

    SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search.
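    Once a 3D conformation is encoded as a 1D string over a structural alphabet, exact "word" search becomes a plain string problem. The record mentions a suffix tree; the sketch below swaps in a simpler suffix array with binary search, which answers the same exact-match query. The encoded string and its letters are hypothetical, and none of this reflects SA-Search's actual implementation.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, suffix_array, word):
    """Return the sorted start positions of every exact occurrence of `word`."""
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix that is >= word.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:] < word:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Suffixes that start with `word` form one contiguous run from `lo` onward.
    while lo < len(suffix_array) and text.startswith(word, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


# Hypothetical structural-alphabet encoding of one protein chain.
encoded = "AABBCDDAABBA"
index = build_suffix_array(encoded)
positions = find_exact(encoded, index, "AABB")
```

    The naive construction here is adequate for a demonstration; a production index over a structure database would use a linear-time suffix tree or suffix array builder.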

  16. Scalable rule-based modelling of allosteric proteins and biochemical networks.

    Directory of Open Access Journals (Sweden)

    Julien F Ollivier

    2010-11-01

    Much of the complexity of biochemical networks comes from the information-processing abilities of allosteric proteins, be they receptors, ion-channels, signalling molecules or transcription factors. An allosteric protein can be uniquely regulated by each combination of input molecules that it binds. This "regulatory complexity" causes a combinatorial increase in the number of parameters required to fit experimental data as the number of protein interactions increases. It therefore challenges the creation, updating, and re-use of biochemical models. Here, we propose a rule-based modelling framework that exploits the intrinsic modularity of protein structure to address regulatory complexity. Rather than treating proteins as "black boxes", we model their hierarchical structure and, as conformational changes, internal dynamics. By modelling the regulation of allosteric proteins through these conformational changes, we often decrease the number of parameters required to fit data, and so reduce over-fitting and improve the predictive power of a model. Our method is thermodynamically grounded, imposes detailed balance, and also includes molecular cross-talk and the background activity of enzymes. We use our Allosteric Network Compiler to examine how allostery can facilitate macromolecular assembly and how competitive ligands can change the observed cooperativity of an allosteric protein. We also develop a parsimonious model of G protein-coupled receptors that explains functional selectivity and can predict the rank order of potency of agonists acting through a receptor. Our methodology should provide a basis for scalable, modular and executable modelling of biochemical networks in systems and synthetic biology.

  17. Energetically Unfavorable Amide Conformations for N6-Acetyllysine Side Chains in Refined Protein Structures

    Science.gov (United States)

    Genshaft, Alexander; Moser, Joe-Ann S.; D'Antonio, Edward L.; Bowman, Christine M.; Christianson, David W.

    2013-01-01

    The reversible acetylation of lysine to form N6-acetyllysine in the regulation of protein function is a hallmark of epigenetics. Acetylation of the positively charged amino group of the lysine side chain generates a neutral N-alkylacetamide moiety that serves as a molecular “switch” for the modulation of protein function and protein-protein interactions. We now report the analysis of 381 N6-acetyllysine side chain amide conformations as found in 79 protein crystal structures and 11 protein NMR structures deposited in the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics. We find that only 74.3% of N6-acetyllysine residues in protein crystal structures and 46.5% in protein NMR structures contain amide groups with energetically preferred trans or generously trans conformations. Surprisingly, 17.6% of N6-acetyllysine residues in protein crystal structures and 5.3% in protein NMR structures contain amide groups with energetically unfavorable cis or generously cis conformations. Even more surprisingly, 8.1% of N6-acetyllysine residues in protein crystal structures and 48.2% in NMR structures contain amide groups with energetically prohibitive twisted conformations that approach the transition state structure for cis-trans isomerization. In contrast, 109 unique N-alkylacetamide groups contained in 84 highly-accurate small molecule crystal structures retrieved from the Cambridge Structural Database exclusively adopt energetically preferred trans conformations. Therefore, we conclude that cis and twisted N6-acetyllysine amides in protein structures deposited in the PDB are erroneously modeled due to their energetically unfavorable or prohibitive conformations. PMID:23401043
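    The trans, cis and twisted categories in this record refer to the amide dihedral angle. A minimal sketch of computing a dihedral from four atom positions and binning it follows. The bin boundaries (within 30° of 180° for trans, within 30° of 0° for cis, twisted otherwise) and the toy coordinates are illustrative assumptions; the record does not give the thresholds the authors used.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle about the p1-p2 bond, in degrees within [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def classify_amide(omega, tolerance=30.0):
    """Bin an amide dihedral; `tolerance` is an assumed, illustrative cutoff."""
    magnitude = abs(omega)
    if magnitude >= 180.0 - tolerance:
        return "trans"
    if magnitude <= tolerance:
        return "cis"
    return "twisted"


# Toy geometries (hypothetical coordinates, not taken from any PDB entry).
trans_label = classify_amide(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
cis_label = classify_amide(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
twisted_label = classify_amide(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```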

  18. Understanding the general packing rearrangements required for successful template based modeling of protein structure from a CASP experiment.

    Science.gov (United States)

    Day, Ryan; Joo, Hyun; Chavan, Archana C; Lennox, Kristin P; Chen, Y Ann; Dahl, David B; Vannucci, Marina; Tsai, Jerry W

    2013-02-01

    As an alternative to the common template-based protein structure prediction methods based on main-chain position, a novel side-chain centric approach has been developed. Together with a Bayesian loop modeling procedure and a combination scoring function, the Stone Soup algorithm was applied to the CASP9 set of template-based modeling targets. Although the method did not generate perturbations to the template structures as large as necessary, the analysis of the results gives unique insights into the differences in packing between the target structures and their templates. Considerable variation in packing is found between target and template structures even when the structures are close, and this variation is found to be due to two- and three-body packing interactions. Outside the inherent restrictions in packing representation of the PDB, the first steps in correctly defining those regions of variable packing have been mapped primarily to local interactions, as the packing at the secondary and tertiary structure levels is largely conserved. Of the scoring functions used, a loop scoring function based on water structure exhibited some promise for discrimination. These results present a clear structural path for further development of a side-chain centered approach to template-based modeling. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Exploring overlapping functional units with various structure in protein interaction networks.

    Directory of Open Access Journals (Sweden)

    Xiao-Fei Zhang

    Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes, which have more internal interactions than external interactions. Most of these approaches do not handle overlaps among complexes since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that other non-cohesive structural functional units beyond complexes also exist in PPI networks. Thus previous algorithms that just focus on non-overlapping cohesive complexes are not able to present the biological reality fully. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping and various structural functional units in PPI networks. RSRGM is principally dominated by two model parameters. One is used to define the functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other one is used to represent the degree of proteins belonging to the units, which supports a protein belonging to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that the performance of RSRGM on detecting cohesive complexes and overlapping complexes is superior to that of previous competing algorithms. Moreover, RSRGM has the ability to discover biologically significant functional units besides complexes.

  20. Overcoming barriers to membrane protein structure determination.

    Science.gov (United States)

    Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst

    2011-04-01

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.

  1. Refinement of homology-based protein structures by molecular dynamics simulation techniques

    NARCIS (Netherlands)

    Fan, H; Mark, AE

    The use of classical molecular dynamics simulations, performed in explicit water, for the refinement of structural models of proteins generated ab initio or based on homology has been investigated. The study involved a test set of 15 proteins that were previously used by Baker and coworkers to

  2. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance changes to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
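    The F-measures quoted in this record follow from the stated precision and recall via the harmonic mean, and the nine-residue sliding window can be shown in a few lines. This is a minimal sketch: the function names, the "-" padding character for chain ends and the example sequence are assumptions, and the record does not describe how the authors handled terminal residues.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sliding_windows(sequence, size=9, pad="-"):
    """Return one window of `size` residues centred on each position.

    Padding the chain ends with `pad` is an illustrative assumption.
    """
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


sequence_only = f_measure(0.84, 0.26)            # precision and recall from the record
sequence_plus_structure = f_measure(0.76, 0.38)  # precision and recall from the record
windows = sliding_windows("ACDEFGHIKL")          # hypothetical 10-residue sequence
```

    Rounded to whole percentages, the two values match the 40% and 51% reported in the abstract. In a predictor of this kind, each window would be converted to a feature vector for the central residue before classification.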

  3. Structural characterization of respiratory syncytial virus fusion inhibitor escape mutants: homology model of the F protein and a syncytium formation assay

    International Nuclear Information System (INIS)

    Morton, Craig J.; Cameron, Rachel; Lawrence, Lynne J.; Lin Bo; Lowe, Melinda; Luttick, Angela; Mason, Anthony; McKimm-Breschkin, Jenny; Parker, Michael W.; Ryan, Jane; Smout, Michael; Sullivan, Jayne; Tucker, Simon P.; Young, Paul R.

    2003-01-01

    Respiratory syncytial virus (RSV) is a ubiquitous human pathogen and the leading cause of lower respiratory tract infections in infants. Infection of cells and subsequent formation of syncytia occur through membrane fusion mediated by the RSV fusion protein (RSV-F). A novel in vitro assay of recombinant RSV-F function has been devised and used to characterize a number of escape mutants for three known inhibitors of RSV-F that have been isolated. Homology modeling of the RSV-F structure has been carried out on the basis of a chimera derived from the crystal structures of the RSV-F core and a fragment from the orthologous fusion protein from Newcastle disease virus (NDV). The structure correlates well with the appearance of RSV-F in electron micrographs, and the residues identified as contributing to specific binding sites for several monoclonal antibodies are arranged in appropriate solvent-accessible clusters. The positions of the characterized resistance mutants in the model structure identify two promising regions for the design of fusion inhibitors.

  4. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment

  5. Structural determination of intact proteins using mass spectrometry

    Science.gov (United States)

    Kruppa, Gary [San Francisco, CA]; Schoeniger, Joseph S [Oakland, CA]; Young, Malin M [Livermore, CA]

    2008-05-06

    The present invention relates to novel methods of determining the sequence and structure of proteins. Specifically, the present invention allows for the analysis of intact proteins within a mass spectrometer. Therefore, preparatory separations need not be performed prior to introducing a protein sample into the mass spectrometer. Also disclosed herein are new instrumental developments for enhancing the signal from the desired modified proteins, methods for producing controlled protein fragments in the mass spectrometer, eliminating complex microseparations, and protein preparatory chemical steps necessary for cross-linking based protein structure determination. Additionally, the preferred method of the present invention involves the determination of protein structures utilizing a top-down analysis of protein structures to search for covalent modifications. In the preferred method, intact proteins are ionized and fragmented within the mass spectrometer.

  6. Improved hybrid optimization algorithm for 3D protein structure prediction.

    Science.gov (United States)

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on a toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve the search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be cast as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and makes full use of the advantages of each algorithm. The algorithm is tested on the current standard benchmark sequences: Fibonacci sequences and real protein sequences. Experiments show that the proposed new method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, and it is shown to be an effective way to predict the structure of proteins.
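    The Fibonacci benchmark sequences mentioned in this record are two-letter chains (A for hydrophobic, B for hydrophilic) built by recursive concatenation. The sketch below assumes the recurrence commonly used for AB off-lattice benchmarks, S0 = "A", S1 = "B", S(n+1) = S(n-1) + S(n); the record itself does not state the recurrence, and the function name is hypothetical.

```python
def fibonacci_sequence(n):
    """Return the n-th Fibonacci AB sequence.

    Assumed recurrence: S0 = "A", S1 = "B", S(n+1) = S(n-1) + S(n).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    previous, current = "A", "B"
    if n == 0:
        return previous
    for _ in range(n - 1):
        previous, current = current, previous + current
    return current


# Chain lengths follow the Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, ...
lengths = [len(fibonacci_sequence(n)) for n in range(8)]
thirteen_mer = fibonacci_sequence(6)
```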

  7. BCL::MP-Fold: membrane protein structure prediction guided by EPR restraints

    Science.gov (United States)

    Fischer, Axel W.; Alexander, Nathan S.; Woetzel, Nils; Karakaş, Mert; Weiner, Brian E.; Meiler, Jens

    2016-01-01

    For many membrane proteins, the determination of their topology remains a challenge for methods like X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Electron paramagnetic resonance (EPR) spectroscopy has evolved as an alternative technique to study structure and dynamics of membrane proteins. The present study demonstrates the feasibility of membrane protein topology determination using limited EPR distance and accessibility measurements. The BCL::MP-Fold algorithm assembles secondary structure elements (SSEs) in the membrane using a Monte Carlo Metropolis (MCM) approach. Sampled models are evaluated using knowledge-based potential functions and agreement with the EPR data and a knowledge-based energy function. Twenty-nine membrane proteins of up to 696 residues are used to test the algorithm. The protein-size-normalized root-mean-square-deviation (RMSD100) value of the most accurate model is better than 8 Å for twenty-seven, better than 6 Å for twenty-two, and better than 4 Å for fifteen out of twenty-nine proteins, demonstrating the algorithm’s ability to sample the native topology. The average enrichment could be improved from 1.3 to 2.5, showing the improved discrimination power by using EPR data. PMID:25820805
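    Two ingredients named in this record lend themselves to a short sketch: the size-normalized RMSD100 and the Metropolis acceptance rule used in Monte Carlo sampling. The normalization below, RMSD100 = RMSD / (1 + ln sqrt(N / 100)), is the commonly used length correction and is an assumption here, since the record does not state the formula. The temperature parameter and function names are likewise illustrative and are not taken from BCL::MP-Fold.

```python
import math


def rmsd100(rmsd, n_residues):
    """Normalize an RMSD to a reference chain length of 100 residues.

    Assumed formula: RMSD100 = RMSD / (1 + ln(sqrt(N / 100))).
    """
    return rmsd / (1.0 + math.log(math.sqrt(n_residues / 100.0)))


def metropolis_acceptance(delta_score, temperature):
    """Probability of accepting a move that changes the score by `delta_score`.

    Lower scores are treated as better; improving moves are always accepted.
    """
    if delta_score <= 0:
        return 1.0
    return math.exp(-delta_score / temperature)


# For the largest test protein in the record (696 residues), an 8 Å raw RMSD
# maps to a smaller normalized value.
normalized = rmsd100(8.0, 696)
```

    With this normalization, a 100-residue protein keeps its raw RMSD unchanged, while larger proteins are scaled down, which is what makes thresholds such as 4, 6 and 8 Å comparable across chain lengths.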

  8. Structural basis for target protein recognition by the protein disulfide reductase thioredoxin

    DEFF Research Database (Denmark)

    Maeda, Kenji; Hägglund, Per; Finnie, Christine

    2006-01-01

    Thioredoxin is ubiquitous and regulates various target proteins through disulfide bond reduction. We report the structure of thioredoxin (HvTrxh2 from barley) in a reaction intermediate complex with a protein substrate, barley alpha-amylase/subtilisin inhibitor (BASI). The crystal structure of this mixed disulfide shows a conserved hydrophobic motif in thioredoxin interacting with a sequence of residues from BASI through van der Waals contacts and backbone-backbone hydrogen bonds. The observed structural complementarity suggests that the recognition of features around protein disulfides plays a major role in the specificity and protein disulfide reductase activity of thioredoxin. This novel insight into the function of thioredoxin constitutes a basis for comprehensive understanding of its biological role. Moreover, comparison with structurally related proteins shows that thioredoxin shares...

  9. Structural studies of the Enterococcus faecalis SufU [Fe-S] cluster protein

    Directory of Open Access Journals (Sweden)

    Frazzon Jeverson

    2009-02-01

    Background: Iron-sulfur clusters are ubiquitous and evolutionarily ancient inorganic prosthetic groups, the biosynthesis of which depends on complex protein machineries. Three distinct assembly systems involved in the maturation of cellular Fe-S proteins have been determined, designated the NIF, ISC and SUF systems. Although well described in several organisms, these machineries are poorly understood in Gram-positive bacteria. Within the Firmicutes phylum, Enterococcus spp. have recently assumed importance in clinical microbiology, being considered emerging pathogens for humans, with Enterococcus faecalis representing the major species associated with nosocomial infections. The aim of this study was to carry out a phylogenetic analysis in Enterococcus faecalis V583 and a structural and conformational characterisation of its SufU protein. Results: BLAST searches of the Enterococcus genome revealed a series of genes with sequence similarity to the Escherichia coli SUF machinery of [Fe-S] cluster biosynthesis, namely sufB, sufC, sufD and sufS. In addition, the E. coli IscU ortholog SufU was found to be the scaffold protein of Enterococcus spp., containing all features considered essential for its biological activity, including conserved amino acid residues involved in substrate and/or co-factor binding (Cys50,76,138 and Asp52). Phylogenetic analyses showed a close relationship with orthologues from other Gram-positive bacteria. Molecular modeling of the E. faecalis SufU primary sequence over the PDB:1su0 crystallographic model from Streptococcus pyogenes was carried out, followed by a 50 ns molecular dynamics trajectory. This presented a stable model, showing secondary structure modifications near the active site and conserved cysteine residues. Molecular modeling using Haemophilus influenzae IscU primary sequence over the PDB:1su0 crystal followed by a MD

  10. Protein structure: geometry, topology and classification

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, William R.; May, Alex C.W.; Brown, Nigel P.; Aszodi, Andras [Division of Mathematical Biology, National Institute for Medical Research, London (United Kingdom)

    2001-04-01

    The structural principles of proteins are reviewed and analysed from a geometric perspective with a view to revealing the underlying regularities in their construction. Computer methods for the automatic comparison and classification of these structures are then reviewed with an analysis of the statistical significance of comparing different shapes. Following an analysis of the current state of the classification of proteins, more abstract geometric and topological representations are explored, including the occurrence of knotted topologies. The review concludes with a consideration of the origin of higher-level symmetries in protein structure. (author)

  11. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  12. PDB2CD visualises dynamics within protein structures.

    Science.gov (United States)

    Janes, Robert W

    2017-10-01

    Proteins tend to have defined conformations, a key factor in enabling their function. Atomic resolution structures of proteins are predominantly obtained by either solution nuclear magnetic resonance (NMR) or crystal structure methods. However, when considering a protein whose structure has been determined by both these approaches, on many occasions, the resultant conformations are subtly different, as illustrated by the examples in this study. The solution NMR approach invariably results in a cluster of structures whose conformations satisfy the distance boundaries imposed by the data collected; it might be argued that this is evidence of the dynamics of proteins when in solution. In crystal structures, the proteins are often in an energy minimum state which can result in an increase in the extent of regular secondary structure present relative to the solution state depicted by NMR, because the more dynamic ends of alpha helices and beta strands can become ordered at the lower temperatures. This study examines a novel way to display the differences in conformations within an NMR ensemble and between these and a crystal structure of a protein. Circular dichroism (CD) spectroscopy can be used to characterise protein structures in solution. Using the new bioinformatics tool, PDB2CD, which generates CD spectra from atomic resolution protein structures, the differences between, and possible dynamic range of, conformations adopted by a protein can be visualised.

  13. Course 12: Proteins: Structural, Thermodynamic and Kinetic Aspects

    Science.gov (United States)

    Finkelstein, A. V.

    1 Introduction
    2 Overview of protein architectures and discussion of physical background of their natural selection
      2.1 Protein structures
      2.2 Physical selection of protein structures
    3 Thermodynamic aspects of protein folding
      3.1 Reversible denaturation of protein structures
      3.2 What do denatured proteins look like?
      3.3 Why denaturation of a globular protein is the first-order phase transition
      3.4 "Gap" in energy spectrum: The main characteristic that distinguishes protein chains from random polymers
    4 Kinetic aspects of protein folding
      4.1 Protein folding in vivo
      4.2 Protein folding in vitro (in the test-tube)
      4.3 Theory of protein folding rates and solution of the Levinthal paradox

  14. Automatic protein structure solution from weak X-ray data

    Science.gov (United States)

    Skubák, Pavol; Pannu, Navraj S.

    2013-11-01

    Determining new protein structures from X-ray diffraction data at low resolution or with a weak anomalous signal is a difficult and often an impossible task. Here we propose a multivariate algorithm that simultaneously combines the structure determination steps. In tests on over 140 real data sets from the protein data bank, we show that this combined approach can automatically build models where current algorithms fail, including an anisotropically diffracting 3.88 Å RNA polymerase II data set. The method seamlessly automates the process, is ideal for non-specialists and provides a mathematical framework for successfully combining various sources of information in image processing.

  15. Evaluation of multiple protein docking structures using correctly predicted pairwise subunits

    Directory of Open Access Journals (Sweden)

    Esquivel-Rodríguez Juan

    2012-03-01

    Background: Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, since few methods have been developed for multiple protein docking, no study has investigated how accurate structural models of multiple protein complexes should be to allow scientists to gain biological insights. Methods: We generated a series of predicted models (decoys) of various accuracies by our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations in the decoys. Results and conclusion: We found that pairs of chains with the correct mutual orientation exist even in the decoys with a large overall root mean square deviation (RMSD) to the native. Therefore, in addition to a global structure similarity measure, such as the global RMSD, the quality of models for multiple chain complexes can be better evaluated by using a local measurement, the number of chain pairs with correct mutual orientation. We termed the fraction of correctly predicted pairs (RMSD at the interface of less than 4.0 Å) fpair and propose to use it for evaluation of the accuracy of multiple protein docking.
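
    The fpair measure described in this abstract can be illustrated with a short Python sketch. It assumes that an interface RMSD has already been computed for every chain pair of interest and that the denominator is simply the set of pairs supplied by the caller; the abstract does not spell out either detail. The function name, the dictionary layout and the numbers are hypothetical.

```python
from typing import Mapping, Tuple

ChainPair = Tuple[str, str]


def fpair(interface_rmsd: Mapping[ChainPair, float], cutoff: float = 4.0) -> float:
    """Fraction of chain pairs whose interface RMSD (in angstroms) is below `cutoff`.

    Assumption: `interface_rmsd` holds one precomputed value per chain pair;
    how those values are obtained is outside the scope of this sketch.
    """
    if not interface_rmsd:
        raise ValueError("at least one chain pair is required")
    correct = sum(1 for rmsd in interface_rmsd.values() if rmsd < cutoff)
    return correct / len(interface_rmsd)


# Made-up interface RMSDs for a hypothetical three-chain decoy.
example = {("A", "B"): 2.1, ("A", "C"): 7.8, ("B", "C"): 3.9}
score = fpair(example)  # two of the three pairs fall below the 4.0 cutoff
```

    A decoy can score well on this local measure even when its global RMSD is large, which is the situation the abstract reports.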

  16. Structural test of the parameterized-backbone method for protein design.

    Science.gov (United States)

    Plecs, Joseph J; Harbury, Pehr B; Kim, Peter S; Alber, Tom

    2004-09-03

    Designing new protein folds requires a method for simultaneously optimizing the conformation of the backbone and the side-chains. One approach to this problem is the use of a parameterized backbone, which allows the systematic exploration of families of structures. We report the crystal structure of RH3, a right-handed, three-helix coiled coil that was designed using a parameterized backbone and detailed modeling of core packing. This crystal structure was determined using another rationally designed feature, a metal-binding site that permitted experimental phasing of the X-ray data. RH3 adopted the intended fold, which has not been observed previously in biological proteins. Unanticipated structural asymmetry in the trimer was a principal source of variation within the RH3 structure. The sequence of RH3 differs from that of a previously characterized right-handed tetramer, RH4, at only one position in each 11 amino acid sequence repeat. This close similarity indicates that the design method is sensitive to the core packing interactions that specify the protein structure. Comparison of the structures of RH3 and RH4 indicates that both steric overlap and cavity formation provide strong driving forces for oligomer specificity.

  17. Structure of a two-CAP-domain protein from the human hookworm parasite Necator americanus

    Energy Technology Data Exchange (ETDEWEB)

    Asojo, Oluwatoyin A., E-mail: oasojo@unmc.edu [Pathology and Microbiology Department, 986495 Nebraska Medical Center, Omaha, NE 68198-6495 (United States)

    2011-05-01

    The first structure of a two-CAP-domain protein, Na-ASP-1, from the major human hookworm parasite N. americanus refined to a resolution limit of 2.2 Å is presented. Major proteins secreted by the infective larval stage hookworms upon host entry include Ancylostoma secreted proteins (ASPs), which are characterized by one or two CAP (cysteine-rich secretory protein/antigen 5/pathogenesis related-1) domains. The CAP domain has been reported in diverse phylogenetically unrelated proteins, but has no confirmed function. The first structure of a two-CAP-domain protein, Na-ASP-1, from the major human hookworm parasite Necator americanus was refined to a resolution limit of 2.2 Å. The structure was solved by molecular replacement (MR) using Na-ASP-2, a one-CAP-domain ASP, as the search model. The correct MR solution could only be obtained by truncating the polyalanine model of Na-ASP-2 and removing several loops. The structure reveals two CAP domains linked by an extended loop. Overall, the carboxyl-terminal CAP domain is more similar to Na-ASP-2 than to the amino-terminal CAP domain. A large central cavity extends from the amino-terminal CAP domain to the carboxyl-terminal CAP domain, encompassing the putative CAP-binding cavity. The putative CAP-binding cavity is a characteristic cavity in the carboxyl-terminal CAP domain that contains a His and Glu pair. These residues are conserved in all single-CAP-domain proteins, but are absent in the amino-terminal CAP domain. The conserved His residues are oriented such that they appear to be capable of directly coordinating a zinc ion as observed for CAP proteins from reptile venoms. This first structure of a two-CAP-domain ASP can serve as a template for homology modeling of other two-CAP-domain proteins.

  18. Structure of a two-CAP-domain protein from the human hookworm parasite Necator americanus

    International Nuclear Information System (INIS)

    Asojo, Oluwatoyin A.

    2011-01-01

    The first structure of a two-CAP-domain protein, Na-ASP-1, from the major human hookworm parasite N. americanus refined to a resolution limit of 2.2 Å is presented. Major proteins secreted by the infective larval stage hookworms upon host entry include Ancylostoma secreted proteins (ASPs), which are characterized by one or two CAP (cysteine-rich secretory protein/antigen 5/pathogenesis related-1) domains. The CAP domain has been reported in diverse phylogenetically unrelated proteins, but has no confirmed function. The first structure of a two-CAP-domain protein, Na-ASP-1, from the major human hookworm parasite Necator americanus was refined to a resolution limit of 2.2 Å. The structure was solved by molecular replacement (MR) using Na-ASP-2, a one-CAP-domain ASP, as the search model. The correct MR solution could only be obtained by truncating the polyalanine model of Na-ASP-2 and removing several loops. The structure reveals two CAP domains linked by an extended loop. Overall, the carboxyl-terminal CAP domain is more similar to Na-ASP-2 than to the amino-terminal CAP domain. A large central cavity extends from the amino-terminal CAP domain to the carboxyl-terminal CAP domain, encompassing the putative CAP-binding cavity. The putative CAP-binding cavity is a characteristic cavity in the carboxyl-terminal CAP domain that contains a His and Glu pair. These residues are conserved in all single-CAP-domain proteins, but are absent in the amino-terminal CAP domain. The conserved His residues are oriented such that they appear to be capable of directly coordinating a zinc ion as observed for CAP proteins from reptile venoms. This first structure of a two-CAP-domain ASP can serve as a template for homology modeling of other two-CAP-domain proteins

  19. Simulation of Protein Structure, Dynamics and Function in Organic Media

    National Research Council Canada - National Science Library

    Daggett, Valerie

    1998-01-01

    The overall goal of our ONR-sponsored research is to pursue realistic molecular modeling studies pertinent to the related properties of protein stability, dynamics, structure, function, and folding in aqueous solution...

  20. Modelling Protein Dynamics on the Microsecond Time Scale

    DEFF Research Database (Denmark)

    Siuda, Iwona Anna

    Recent years have shown an increase in coarse-grained (CG) molecular dynamics simulations, providing structural and dynamic details of large proteins and enabling studies of self-assembly of biological materials. It is not easy to acquire such data experimentally, and access is also still limited ... in atomistic simulations. During her PhD studies, Iwona Siuda used MARTINI CG models to study the dynamics of different globular and membrane proteins. In several cases, the MARTINI model was sufficient to study conformational changes of small, purely alpha-helical proteins. However, in studies of larger ... ELNEDIN was therefore proposed as part of the work. Iwona Siuda’s results from the CG simulations had biological implications that provide insights into possible mechanisms of the periplasmic leucine-binding protein, the sarco(endo)plasmic reticulum calcium pump, and several proteins from the saposin-like proteins...

  1. Protein interfacial structure and nanotoxicology

    Energy Technology Data Exchange (ETDEWEB)

    White, John W. [Research School of Chemistry, Australian National University, Canberra (Australia)], E-mail: jww@rsc.anu.edu.au; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M. [Research School of Chemistry, Australian National University, Canberra (Australia)

    2009-02-21

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between β-casein and κ-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures if first, a β-casein monolayer is attacked by a κ-casein solution compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a β-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  2. Protein interfacial structure and nanotoxicology

    International Nuclear Information System (INIS)

    White, John W.; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M.

    2009-01-01

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between β-casein and κ-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures if first, a β-casein monolayer is attacked by a κ-casein solution compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a β-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  3. Quantitative chemogenomics: machine-learning models of protein-ligand interaction.

    Science.gov (United States)

    Andersson, Claes R; Gustafsson, Mats G; Strömbergsson, Helena

    2011-01-01

    Chemogenomics is an emerging interdisciplinary field that lies at the interface of biology, chemistry, and informatics. Most of the currently used drugs are small molecules that interact with proteins. Understanding protein-ligand interaction is therefore central to drug discovery and design. In the subfield of chemogenomics known as proteochemometrics, protein-ligand-interaction models are induced from data matrices that consist of both protein and ligand information along with some experimentally measured variable. The two general aims of this quantitative multi-structure-property-relationship modeling (QMSPR) approach are to exploit sparse/incomplete information sources and to obtain more general models covering larger parts of the protein-ligand space than traditional approaches that focus mainly on specific targets or ligands. The data matrices, usually obtained from multiple sparse/incomplete sources, typically contain series of proteins and ligands together with quantitative information about their interactions. A useful model should ideally be easy to interpret and generalize well to new unseen protein-ligand combinations. Resolving this requires sophisticated machine-learning methods for model induction, combined with adequate validation. This review is intended to provide a guide to methods and data sources suitable for this kind of protein-ligand-interaction modeling. An overview of the modeling process is presented including data collection, protein and ligand descriptor computation, data preprocessing, machine-learning-model induction and validation. Concerns and issues specific to each step in this kind of data-driven modeling will be discussed. © 2011 Bentham Science Publishers

  4. PSAIA – Protein Structure and Interaction Analyzer

    Directory of Open Access Journals (Sweden)

    Vlahoviček Kristian

    2008-04-01

    Background: PSAIA (Protein Structure and Interaction Analyzer) was developed to compute geometric parameters for large sets of protein structures in order to predict and investigate protein-protein interaction sites. Results: In addition to the most relevant established algorithms, PSAIA offers a new method, PIADA (Protein Interaction Atom Distance Algorithm), for the determination of residue interaction pairs. We found that PIADA produced more satisfactory results than comparable algorithms implemented in PSAIA. Particular advantages of PSAIA include its capacity to combine different methods to detect the locations and types of interactions between residues and its ability, without any further automation steps, to handle large numbers of protein structures and complexes. Generally, the integration of a variety of methods enables PSAIA to offer easier automation of analysis and greater reliability of results. PSAIA can be used either via a graphical user interface or from the command-line. Results are generated in either tabular or XML format. Conclusion: In a straightforward fashion and for large sets of protein structures, PSAIA enables the calculation of protein geometric parameters and the determination of location and type for protein-protein interaction sites. XML formatted output enables easy conversion of results to various formats suitable for statistical analysis. Results from smaller data sets demonstrated the influence of geometry on protein interaction sites. Comprehensive analysis of properties of large data sets led to new information useful in the prediction of protein-protein interaction sites.
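
    The abstract does not describe how PIADA works internally, so the following Python sketch is not PIADA. It only illustrates the general idea of deriving residue interaction pairs from inter-atomic distances, using a plain distance cutoff. The function name, the data layout, the 4.5 Å cutoff and the coordinates are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

Atom = Tuple[float, float, float]   # x, y, z in angstroms
Residue = Tuple[str, int]           # (chain id, residue number)


def interacting_pairs(
    chain_a: Dict[Residue, List[Atom]],
    chain_b: Dict[Residue, List[Atom]],
    cutoff: float = 4.5,
) -> Set[Tuple[Residue, Residue]]:
    """Return residue pairs with at least one atom pair within `cutoff`.

    Generic distance-cutoff criterion (an assumption), not the PIADA method.
    """
    pairs: Set[Tuple[Residue, Residue]] = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                pairs.add((res_a, res_b))
    return pairs


# Made-up coordinates for two tiny hypothetical chains.
toy_a = {("A", 10): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], ("A", 11): [(10.0, 0.0, 0.0)]}
toy_b = {("B", 5): [(4.0, 0.0, 0.0)], ("B", 6): [(30.0, 0.0, 0.0)]}
contacts = interacting_pairs(toy_a, toy_b)
```

    The all-against-all loop is quadratic in the number of atoms; a tool meant for large sets of structures would more plausibly use a spatial index, which is omitted here for brevity.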

  5. The structure of a cholesterol-trapping protein

    Science.gov (United States)

    Contact: Dan Krotz, dakrotz@lbl.gov. Berkeley Lab Science Beat. Institute researchers determined the three-dimensional structure of a protein that controls cholesterol level in the bloodstream. Knowing the structure of the protein, a cellular receptor that ensnares ...

  6. Modulating nanoparticle superlattice structure using proteins with tunable bond distributions

    International Nuclear Information System (INIS)

    McMillan, Janet R.; Brodin, Jeffrey D.; Millan, Jaime A.; Lee, Byeongdu; Olvera de la Cruz, Monica; Mirkin, Chad A.

    2017-01-01

    Here, we investigate the use of proteins with tunable DNA modification distributions to modulate nanoparticle superlattice structure. Using Beta-galactosidase (βgal) as a model system, we have employed the orthogonal chemical reactivities of surface amines and thiols to synthesize protein-DNA conjugates with 36 evenly distributed or 8 specifically positioned oligonucleotides. When assembled into crystalline superlattices with AuNPs, we find that the distribution of DNA modifications modulates the favored structure: βgal with uniformly distributed DNA bonding elements results in body-centered cubic crystals, whereas DNA functionalization of cysteines results in AB2 packing. We probe the role of protein oligonucleotide number and conjugate size on this observation, which revealed the importance of oligonucleotide distribution and number in this observed assembly behavior. These results indicate that proteins with defined DNA-modification patterns are powerful tools to control nanoparticle superlattice architecture, and establish the importance of oligonucleotide distribution in the assembly behavior of protein-DNA conjugates.

  7. Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination

    Science.gov (United States)

    Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.

    2010-01-01

    A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin labels pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624

  8. Restricted N-glycan conformational space in the PDB and its implication in glycan structure modeling.

    Science.gov (United States)

    Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil

    2013-01-01

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures.

  9. Restricted N-glycan conformational space in the PDB and its implication in glycan structure modeling.

    Directory of Open Access Journals (Sweden)

    Sunhwan Jo

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures.

  10. Function and structure of GFP-like proteins in the protein data bank.

    Science.gov (United States)

    Ong, Wayne J-H; Alvarez, Samuel; Leroux, Ivan E; Shahid, Ramza S; Samma, Alex A; Peshkepija, Paola; Morgan, Alicia L; Mulcahy, Shawn; Zimmer, Marc

    2011-04-01

    The RCSB Protein Data Bank (PDB) contains 266 crystal structures of green fluorescent proteins (GFP) and GFP-like proteins. This is the first systematic analysis of all the GFP-like structures in the PDB. We have used the PDB to examine the function of fluorescent proteins (FP) in nature, aspects of excited state proton transfer (ESPT) in FPs, deformation from planarity of the chromophore and chromophore maturation. The conclusions reached in this review are that (1) The lid residues are highly conserved, particularly those on the "top" of the β-barrel. They are important to the function of GFP-like proteins, perhaps in protecting the chromophore or in β-barrel formation. (2) The primary/ancestral function of GFP-like proteins may well be to aid in light induced electron transfer. (3) The structural prerequisites for light activated proton pumps exist in many structures and it's possible that, like bioluminescence, proton pumps are secondary functions of GFP-like proteins. (4) In most GFP-like proteins the protein matrix exerts a significant strain on planar chromophores forcing most GFP-like proteins to adopt non-planar chromophores. These chromophoric deviations from planarity play an important role in determining the fluorescence quantum yield. (5) The chemospatial characteristics of the chromophore cavity determine the isomerization state of the chromophore. The cavities of highlighter proteins that can undergo cis/trans isomerization have chemospatial properties that are common to both cis and trans GFP-like proteins.

  11. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever-increasing number of published protein structures in this post-genomic era.
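
    The linear-encoding idea in this abstract, turning backbone torsion angles into a text string that sequence tools can search, can be sketched in Python as follows. SARST itself organizes the Ramachandran map by nearest-neighbor clustering; this sketch swaps in a uniform 90-degree grid purely for simplicity, so it is not the SARST encoding. The bin width, the alphabet and the function names are assumptions.

```python
import string
from typing import Iterable, Tuple

BIN_DEG = 90                                   # assumed bin width in degrees
BINS_PER_AXIS = 360 // BIN_DEG                 # 4 bins along phi and along psi
ALPHABET = string.ascii_uppercase[: BINS_PER_AXIS ** 2]  # 16 letters, A to P


def _bin(angle_deg: float) -> int:
    """Map an angle in degrees (periodic, nominally -180..180) to a bin index."""
    return int(((angle_deg + 180.0) % 360.0) // BIN_DEG)


def encode(torsions: Iterable[Tuple[float, float]]) -> str:
    """Encode (phi, psi) pairs as one letter per residue using a uniform grid."""
    return "".join(
        ALPHABET[_bin(phi) * BINS_PER_AXIS + _bin(psi)] for phi, psi in torsions
    )


# Two helix-like residues followed by one strand-like residue (typical textbook angles).
code = encode([(-60.0, -45.0), (-60.0, -45.0), (-120.0, 130.0)])
```

    Once structures are reduced to strings like this, any standard sequence alignment or database search program can in principle be applied to them, which is where the speed advantage reported in the abstract comes from.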

  12. Functional classification of protein structures by local structure matching in graph representation.

    Science.gov (United States)

    Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo

    2018-03-31

    As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
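
    The contrast the abstract draws between graph and Cartesian representations can be made concrete with a small Python sketch. It is not the GRASP-Func algorithm: the choice of one point per residue, the 8 Å edge cutoff, the residue labels and the coordinates are all hypothetical. The point it shows is only that a distance-based graph is unchanged when the whole site is translated, so two sites can be compared without first superimposing them.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple

Coord = Tuple[float, float, float]


def site_graph(residues: Dict[str, Coord], cutoff: float = 8.0) -> Set[FrozenSet[str]]:
    """Build an undirected graph: residues are nodes, edges join residues within `cutoff`.

    Assumption: each residue is represented by a single point (for example
    its side-chain centroid); the abstract does not specify this.
    """
    names = sorted(residues)
    edges: Set[FrozenSet[str]] = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                edges.add(frozenset((a, b)))
    return edges


# Made-up coordinates for a hypothetical four-residue site.
site = {
    "HIS1": (0.0, 0.0, 0.0),
    "ASP2": (5.0, 0.0, 0.0),
    "SER3": (0.0, 6.0, 0.0),
    "GLY4": (20.0, 20.0, 20.0),
}
# The same site translated in space; the coordinates differ but the graph does not.
shifted = {name: (x + 10.0, y - 3.0, z + 7.0) for name, (x, y, z) in site.items()}
same = site_graph(site) == site_graph(shifted)
```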

  13. Proteins with Novel Structure, Function and Dynamics

    Science.gov (United States)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments at a rate 10^6-fold above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such a structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine, a salt bridge involving two other charged residues on opposite sides of the loop keeps the loop in place. These adjustments are facilitated by the high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the door to designing proteins with novel functions, structures and dynamics that have not yet been considered.

  14. Supplementary Material for: Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-01-01

    Background: Combating the spread of drug-resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods: To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results: The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drug-binding sites. Conclusions: Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to novel

  15. Structure and function of nanoparticle-protein conjugates

    International Nuclear Information System (INIS)

    Aubin-Tam, M-E; Hamad-Schifferli, K

    2008-01-01

    Conjugation of proteins to nanoparticles has numerous applications in sensing, imaging, delivery, catalysis, therapy and control of protein structure and activity. Therefore, characterizing the nanoparticle-protein interface is of great importance. A variety of covalent and non-covalent linking chemistries have been reported for nanoparticle attachment. Site-specific labeling is desirable in order to control the protein orientation on the nanoparticle, which is crucial in many applications such as fluorescence resonance energy transfer. We evaluate methods for successful site-specific attachment. Typically, a specific protein residue is linked directly to the nanoparticle core or to the ligand. As conjugation often affects the protein structure and function, techniques to probe structure and activity are assessed. We also examine how molecular dynamics simulations of conjugates would complement those experimental techniques in order to provide atomistic details on the effect of nanoparticle attachment. Characterization studies of nanoparticle-protein complexes show that the structure and function are influenced by the chemistry of the nanoparticle ligand, the nanoparticle size, the nanoparticle material, the stoichiometry of the conjugates, the labeling site on the protein and the nature of the linkage (covalent versus non-covalent).

  16. Discovery of Novel Inhibitors for Nek6 Protein through Homology Model Assisted Structure Based Virtual Screening and Molecular Docking Approaches

    Directory of Open Access Journals (Sweden)

    P. Srinivasan

    2014-01-01

    Nek6 is a member of the NIMA (never in mitosis, gene A)-related serine/threonine kinase family that plays an important role in the initiation of mitotic cell cycle progression. This work is an attempt to emphasize the structural and functional relationship of Nek6 protein based on homology modeling and binding pocket analysis. The three-dimensional structure of Nek6 was constructed by molecular modeling studies and the best model was further assessed by PROCHECK, ProSA, and ERRAT plot in order to analyze the quality and consistency of the generated model. The overall quality of the computed model showed 87.4% of amino acid residues in the favored region. A 3 ns molecular dynamics simulation confirmed that the structure was reliable and stable. Two lead compounds (Binding database ID: 15666, 18602) were retrieved through structure-based virtual screening and induced fit docking approaches as novel Nek6 inhibitors. Hence, we concluded that the potential compounds may act as new leads for designing Nek6 inhibitors.

  17. Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Donald Adjeroh

    2018-03-01

    In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.

  18. Interaction of sucralose with whey protein: Experimental and molecular modeling studies

    Science.gov (United States)

    Zhang, Hongmei; Sun, Shixin; Wang, Yanqing; Cao, Jian

    2017-12-01

    The objective of this research was to study the interactions of sucralose with whey protein isolate (WPI) using three-dimensional fluorescence spectroscopy, circular dichroism spectroscopy and molecular modeling. The results showed that the peptide strand structure of WPI had been changed by sucralose. Sucralose binding induced secondary structural changes and increased the content of aperiodic structure in WPI. Sucralose decreased the thermal stability of WPI and acted as a structure destabilizer during the thermal unfolding process of the protein. In addition, the presence of sucralose decreased the reversibility of the unfolding of WPI. Nonetheless, the sucralose-WPI complex was less stable than the protein alone. The molecular modeling result showed that van der Waals and hydrogen bonding interactions contribute to the free energy of binding for the complex. WPI has more than one possible binding site for sucralose, each in a surface binding mode.

  19. Structural classification of proteins using texture descriptors extracted from the cellular automata image.

    Science.gov (United States)

    Kavianpour, Hamidreza; Vasighi, Mahdi

    2017-02-01

    Knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods to better understand protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features that are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help reveal some inherent features deeply hidden in protein sequences and improve the quality of protein structural class prediction.

  20. Plant lessons: exploring ABCB functionality through structural modeling

    Directory of Open Access Journals (Sweden)

    Aurélien Bailly

    2012-01-01

    In contrast to mammalian ABCB1 proteins, narrow substrate specificity has been extensively documented for plant orthologs shown to catalyze the transport of the plant hormone, auxin. Using the crystal structures of the multidrug exporters Sav1866 and MmABCB1 as templates, we have developed structural models of plant ABCB proteins with a common architecture. Comparisons of these structures identified kingdom-specific candidate substrate-binding regions within the translocation chamber formed by the transmembrane domains of ABCBs from the model plant Arabidopsis. These results suggest an early evolutionary divergence of plant and mammalian ABCBs. Validation of these models becomes a priority for efforts to elucidate ABCB function and manipulate this class of transporters to enhance plant productivity and quality.

  1. Improving the accuracy of protein secondary structure prediction using structural alignment

    Directory of Open Access Journals (Sweden)

    Gallin Warren J

    2006-06-01

    Background: The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. Results: We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. Conclusion: By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs

  2. NCACO-score: An effective main-chain dependent scoring function for structure modeling

    Directory of Open Access Journals (Sweden)

    Dong Xiaoxi

    2011-05-01

    Background: Development of effective scoring functions is a critical component to the success of protein structure modeling. Previously, many efforts have been dedicated to the development of scoring functions. Despite these efforts, development of an effective scoring function that can achieve both good accuracy and fast speed still presents a grand challenge. Results: Based on a coarse-grained representation of a protein structure by using only four main-chain atoms: N, Cα, C and O, we develop a knowledge-based scoring function, called NCACO-score, that integrates different structural information to rapidly model protein structure from sequence. In testing on the Decoys'R'Us sets, we found that NCACO-score can effectively recognize native conformers from their decoys. Furthermore, we demonstrate that NCACO-score can effectively guide fragment assembly for protein structure prediction, which has achieved a good performance in building the structure models for hard targets from CASP8 in terms of both accuracy and speed. Conclusions: Although NCACO-score is developed based on a coarse-grained model, it is able to discriminate native conformers from decoy conformers with high accuracy. NCACO is a very effective scoring function for structure modeling.

  3. Correlation between protein secondary structure, backbone bond angles, and side-chain orientations

    Science.gov (United States)

    Lundgren, Martin; Niemi, Antti J.

    2012-08-01

    We investigate the fine structure of the sp3 hybridized covalent bond geometry that governs the tetrahedral architecture around the central Cα carbon of a protein backbone, and for this we develop new visualization techniques to analyze high-resolution x-ray structures in the Protein Data Bank. We observe that there is a correlation between the deformations of the ideal tetrahedral symmetry and the local secondary structure of the protein. We propose a universal coarse-grained energy function to describe the ensuing side-chain geometry in terms of the Cβ carbon orientations. The energy function can model the side-chain geometry with a subatomic precision. As an example we construct the Cα-Cβ structure of HP35 chicken villin headpiece. We obtain a configuration that deviates less than 0.4 Å in root-mean-square distance from the experimental x-ray structure.
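The 0.4 Å figure above is a root-mean-square distance between modelled and experimental atom positions. As a minimal sketch of that metric only (not the authors' energy function), the following assumes the two coordinate sets are already superimposed and listed in the same atom order; the function name `rmsd` and the input layout are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z) points.

    Assumes both structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared Euclidean distance for each atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice an optimal superposition step (for example the Kabsch algorithm) would precede this calculation; it is omitted here to keep the sketch short.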

  4. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    Science.gov (United States)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  5. Towards structural models of molecular recognition in olfactory receptors.

    Science.gov (United States)

    Afshar, M; Hubbard, R E; Demaille, J

    1998-02-01

    The G protein coupled receptors (GPCR) are an important class of proteins that act as signal transducers through the cytoplasmic membrane. Understanding the structure and activation mechanism of these proteins is crucial for understanding many different aspects of cellular signalling. The olfactory receptors correspond to the largest family of GPCRs. Very little is known about how the structures of the receptors govern the specificity of interaction which enables identification of particular odorant molecules. In this paper, we review recent developments in two areas of molecular modelling: methods for modelling the configuration of trans-membrane helices and methods for automatic docking of ligands into receptor structures. We then show how a subset of these methods can be combined to construct a model of a rat odorant receptor interacting with lyral for which experimental data are available. This modelling can help us make progress towards elucidating the specificity of interactions between receptors and odorant molecules.

  6. Global optimization of proteins using a dynamical lattice model: Ground states and energy landscapes

    OpenAIRE

    Dressel, F.; Kobe, S.

    2004-01-01

    A simple approach is proposed to investigate protein structure. Using a low-complexity model, a simple pairwise interaction and the concept of global optimization, we are able to calculate ground states of proteins, which are in agreement with experimental data. All possible model structures of small proteins are available below a certain energy threshold. The exact low-energy landscape for the trp cage protein (1L2Y) is presented, showing the connectivity of all states and energy barriers.

  7. Exploring the universe of protein structures beyond the Protein Data Bank.

    Science.gov (United States)

    Cossio, Pilar; Trovato, Antonio; Pietrucci, Fabio; Seno, Flavio; Maritan, Amos; Laio, Alessandro

    2010-11-04

    It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Besides their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.
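The abstract distinguishes natural from merely possible folds by contact order. A minimal sketch of the commonly used relative contact order follows, assuming the standard definition (mean sequence separation of contacting residue pairs, divided by chain length); the abstract does not state which variant the authors used, and the function name and contact-list format are illustrative.

```python
def relative_contact_order(contacts, length):
    """Relative contact order: sum(|i - j|) / (number_of_contacts * chain_length).

    contacts: iterable of (i, j) residue index pairs that are in contact.
    length:   number of residues in the chain.
    """
    contacts = list(contacts)
    if length <= 0:
        raise ValueError("length must be positive")
    if not contacts:
        return 0.0
    # Sum of sequence separations over all contacting pairs.
    separation = sum(abs(i - j) for i, j in contacts)
    return separation / (len(contacts) * length)
```

A lower value means contacts are mostly between residues close in sequence, which matches the abstract's remark about shorter loops between contacting residues.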

  8. Exploration of freely available web-interfaces for comparative homology modelling of microbial proteins.

    Science.gov (United States)

    Nema, Vijay; Pal, Sudhir Kumar

    2013-01-01

    This study sought the best-suited freely available software for protein modelling by benchmarking a few sample proteins. The proteins ranged from small to large and had available crystal structures for benchmarking. Key players such as Phyre2, Swiss-Model, CPHmodels-3.0, Homer, (PS)^2, (PS)^2-v2 and Modweb were used for the comparison and model generation. Benchmarking was done for four proteins, Icl, InhA, and KatG of Mycobacterium tuberculosis and RpoB of Thermus thermophilus, to identify the most suitable software. The parameters compared during analysis gave relatively better values for Phyre2 and Swiss-Model. This comparative study indicated that Phyre2 and Swiss-Model make good models of both small and large proteins compared with the other software screened. The other software was also good but often not very efficient in providing full-length and properly folded structures.

  9. Solution NMR structure determination of proteins revisited

    International Nuclear Information System (INIS)

    Billeter, Martin; Wagner, Gerhard; Wuethrich, Kurt

    2008-01-01

    This 'Perspective' bears on the present state of protein structure determination by NMR in solution. The focus is on a comparison of the infrastructure available for NMR structure determination when compared to protein crystal structure determination by X-ray diffraction. The main conclusion emerges that the unique potential of NMR to generate high resolution data also on dynamics, interactions and conformational equilibria has contributed to a lack of standard procedures for structure determination which would be readily amenable to improved efficiency by automation. To spark renewed discussion on the topic of NMR structure determination of proteins, procedural steps with high potential for improvement are identified.

  10. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng; Gao, Xin

    2016-01-01

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated

  11. Roles of water in protein structure and function studied by molecular liquid theory.

    Science.gov (United States)

    Imai, Takashi

    2009-01-01

    The roles of water in the structure and function of proteins have not been completely elucidated. Although molecular simulation has been widely used for the investigation of protein structure and function, it is not always useful for elucidating the roles of water because the effect of water ranges from atomic to thermodynamic level. The three-dimensional reference interaction site model (3D-RISM) theory, which is a statistical-mechanical theory of molecular liquids, can yield the solvation structure at the atomic level and calculate the thermodynamic quantities from the intermolecular potentials. In the last few years, the author and coworkers have succeeded in applying the 3D-RISM theory to protein aqueous solution systems and demonstrated that the theory is useful for investigating the roles of water. This article reviews some of the recent applications and findings, which are concerned with molecular recognition by protein, protein folding, and the partial molar volume of protein which is related to the pressure effect on protein.

  12. Structural analysis of a set of proteins resulting from a bacterial genomics project.

    Science.gov (United States)

    Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R

    2005-09-01

    The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.

  13. Protein 8-class secondary structure prediction using conditional neural fields.

    Science.gov (United States)

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
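The Q8 accuracy reported above is the percentage of residues whose predicted 8-class state matches the observed one. A minimal sketch of that per-residue score is shown below; the function name and the use of one-letter state strings are illustrative assumptions, and the same function gives Q3 when the strings use a 3-state alphabet.

```python
def q_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state.

    predicted, observed: equal-length strings of one-letter secondary structure codes.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Count residues whose predicted class matches the reference assignment.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, `q_accuracy("HHHHEEEE", "HHHGEEEB")` scores six matching residues out of eight.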

  14. Structure-Energy Relationships of Halogen Bonds in Proteins.

    Science.gov (United States)

    Scholfield, Matthew R; Ford, Melissa Coates; Carlsson, Anna-Carin C; Butta, Hawera; Mehl, Ryan A; Ho, P Shing

    2017-06-06

    The structures and stabilities of proteins are defined by a series of weak noncovalent electrostatic, van der Waals, and hydrogen bond (HB) interactions. In this study, we have designed and engineered halogen bonds (XBs) site-specifically to study their structure-energy relationship in a model protein, T4 lysozyme. The evidence for XBs is the displacement of the aromatic side chain toward an oxygen acceptor, at distances that are equal to or less than the sums of their respective van der Waals radii, when the hydroxyl substituent of the wild-type tyrosine is replaced by a halogen. In addition, thermal melting studies show that the iodine XB rescues the stabilization energy from an otherwise destabilizing substitution (at an equivalent noninteracting site), indicating that the interaction is also present in solution. Quantum chemical calculations show that the XB complements an HB at this site and that solvent structure must also be considered in trying to design molecular interactions such as XBs into biological systems. A bromine substitution also shows displacement of the side chain, but the distances and geometries do not indicate formation of an XB. Thus, we have dissected the contributions from various noncovalent interactions of halogens introduced into proteins, to drive the application of XBs, particularly in biomolecular design.

  15. Integral membrane protein structure determination using pseudocontact shifts

    Energy Technology Data Exchange (ETDEWEB)

    Crick, Duncan J.; Wang, Jue X. [University of Cambridge, Department of Biochemistry (United Kingdom); Graham, Bim; Swarbrick, James D. [Monash University, Monash Institute of Pharmaceutical Sciences (Australia); Mott, Helen R.; Nietlispach, Daniel, E-mail: dn206@cam.ac.uk [University of Cambridge, Department of Biochemistry (United Kingdom)

    2015-04-15

    Obtaining enough experimental restraints can be a limiting factor in the NMR structure determination of larger proteins. This is particularly the case for large assemblies such as membrane proteins that have been solubilized in a membrane-mimicking environment. Whilst in such cases extensive deuteration strategies are regularly utilised with the aim to improve the spectral quality, these schemes often limit the number of NOEs obtainable, making complementary strategies highly beneficial for successful structure elucidation. Recently, lanthanide-induced pseudocontact shifts (PCSs) have been established as a structural tool for globular proteins. Here, we demonstrate that a PCS-based approach can be successfully applied for the structure determination of integral membrane proteins. Using the 7TM α-helical microbial receptor pSRII, we show that PCS-derived restraints from lanthanide binding tags attached to four different positions of the protein facilitate the backbone structure determination when combined with a limited set of NOEs. In contrast, the same set of NOEs fails to determine the correct 3D fold. The latter situation is frequently encountered in polytopical α-helical membrane proteins and a PCS approach is thus suitable even for this particularly challenging class of membrane proteins. The ease of measuring PCSs makes this an attractive route for structure determination of large membrane proteins in general.

  16. Random amino acid mutations and protein misfolding lead to Shannon limit in sequence-structure communication.

    Directory of Open Access Journals (Sweden)

    Andreas Martin Lisewski

    2008-09-01

    The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.

  17. Fragger: a protein fragment picker for structural queries [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Francois Berenger

    2018-04-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
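    The query described in this abstract (a sequence constraint written as a regular expression, a distance threshold, and ranking by distance) can be illustrated with a minimal sketch. This is not Fragger's code or interface: the function names, the tuple layout of the toy database, and the use of plain coordinate RMSD without superposition as the distance are all assumptions made for illustration.

```python
import math
import re

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.
    Assumption: no superposition is performed before comparing."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of atoms")
    sq = sum((u - v) ** 2 for p, q in zip(a, b) for u, v in zip(p, q))
    return math.sqrt(sq / len(a))

def query_fragments(database, query_coords, seq_pattern, max_dist):
    """Return (distance, name) pairs, closest first, for fragments whose
    one-letter sequence fully matches seq_pattern and whose distance to
    the query is at most max_dist."""
    pattern = re.compile(seq_pattern)
    hits = []
    for name, seq, coords in database:
        # Skip fragments of a different length or with a disallowed sequence.
        if len(coords) != len(query_coords) or not pattern.fullmatch(seq):
            continue
        d = rmsd(query_coords, coords)
        if d <= max_dist:
            hits.append((d, name))
    return sorted(hits)

# Hypothetical toy database: (name, sequence, one point per residue).
database = [
    ("frag_a", "GAV", [(0, 0, 0), (1, 0, 0), (2, 0, 0)]),
    ("frag_b", "GPV", [(0, 0, 0), (1, 0, 0), (2, 0, 0)]),
    ("frag_c", "GAV", [(0, 0, 0), (1, 1, 0), (2, 0, 0)]),
]
query = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
hits = query_fragments(database, query, "G[AS]V", max_dist=1.0)
```

    In this toy case `frag_b` is rejected by the sequence pattern, and the two remaining fragments are returned in order of increasing distance to the query.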

  18. Sampling Realistic Protein Conformations Using Local Structural Bias

    DEFF Research Database (Denmark)

    Hamelryck, Thomas Wim; Kent, John T.; Krogh, A.

    2006-01-01

    The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications...

  19. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    Science.gov (United States)

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.

  20. Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive.

    Science.gov (United States)

    Burley, Stephen K; Berman, Helen M; Kleywegt, Gerard J; Markley, John L; Nakamura, Haruki; Velankar, Sameer

    2017-01-01

    The Protein Data Bank (PDB)--the single global repository of experimentally determined 3D structures of biological macromolecules and their complexes--was established in 1971, becoming the first open-access digital resource in the biological sciences. The PDB archive currently houses ~130,000 entries (May 2017). It is managed by the Worldwide Protein Data Bank organization (wwPDB; wwpdb.org), which includes the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The four wwPDB partners operate a unified global software system that enforces community-agreed data standards and supports data Deposition, Biocuration, and Validation of ~11,000 new PDB entries annually (deposit.wwpdb.org). The RCSB PDB currently acts as the archive keeper, ensuring disaster recovery of PDB data and coordinating weekly updates. wwPDB partners disseminate the same archival data from multiple FTP sites, while operating complementary websites that provide their own views of PDB data with selected value-added information and links to related data resources. At present, the PDB archives experimental data, associated metadata, and 3D-atomic level structural models derived from three well-established methods: crystallography, nuclear magnetic resonance spectroscopy (NMR), and electron microscopy (3DEM). wwPDB partners are working closely with experts in related experimental areas (small-angle scattering, chemical cross-linking/mass spectrometry, Forster energy resonance transfer or FRET, etc.) to establish a federation of data resources that will support sustainable archiving and validation of 3D structural models and experimental data derived from integrative or hybrid methods.

  1. Exploring the universe of protein structures beyond the Protein Data Bank.

    Directory of Open Access Journals (Sweden)

    Pilar Cossio

    It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Beside their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.
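    The contact order mentioned in this abstract is commonly defined as the average sequence separation of contacting residue pairs, divided by the chain length. The sketch below computes that quantity under simplifying assumptions that are not taken from the record: one point per residue, a plain distance cutoff to define a contact, and hypothetical names and default values (`cutoff`, `min_sep`).

```python
import math

def relative_contact_order(coords, cutoff=8.0, min_sep=1):
    """Relative contact order of a chain given one (x, y, z) point per residue.

    Two residues i < j count as a contact when they are at least min_sep
    apart in sequence and at most cutoff apart in space (assumed criteria).
    Returns sum(j - i over contacts) / (chain_length * number_of_contacts).
    """
    n = len(coords)
    total_sep = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_sep / (n * n_contacts)

# Toy chains with 3.8 spacing between consecutive residues.
extended = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
hairpin = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
co_extended = relative_contact_order(extended, cutoff=4.0)
co_hairpin = relative_contact_order(hairpin, cutoff=4.0)
```

    The extended chain only has contacts between sequence neighbours, while the hairpin brings its two ends together, so the hairpin has the larger contact order. Shorter loops between contacting residues lower the value, which is the direction of the bias the abstract reports for natural folds.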

  2. Simultaneous determination of protein structure and dynamics

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Best, Robert B.; DePristo, M. A.

    2005-01-01

    We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy, for obtaining experimental information at the atomic level about the structural and dynamical features of proteins, with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant...

  3. Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2008-07-01

    Background: Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins. Results: By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures in which each of them contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states. Conclusion: Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold) is available at http://faculty.cs.tamu.edu/shsze/ssfold.

  4. DeepQA: Improving the estimation of single protein model quality with deep belief networks

    OpenAIRE

    Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin

    2016-01-01

    Background: Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. Results: We introduce a novel single-model quality assessment method DeepQA based on deep belie...

  5. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng

    2016-08-26

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated protein structure classifications (e.g., SCOP and CATH) are very successful, but recently suffer the slow updating problem because of the increased throughput of newly solved protein structures. Thus, fully automatic methods to cluster proteins in the protein structure space have been designed and developed. In this study, we observed that the SCOP superfamilies are highly consistent with clustering trees representing hierarchical clustering procedures, but the tree cutting is very challenging and becomes the bottleneck of clustering accuracy. To overcome this challenge, we proposed a novel density-based K-nearest uphill clustering method that effectively eliminates noisy pairwise protein structure similarities and identifies density peaks as cluster centers. Specifically, the density peaks are identified based on K-nearest uphills (i.e., proteins with higher densities) and K-nearest neighbors. To our knowledge, this is the first attempt to apply and develop density-based clustering methods in the protein structure space. Our results show that our density-based clustering method outperforms the state-of-the-art clustering methods previously applied to the problem. Moreover, we observed that computational methods and human experts could produce highly similar clusters at high precision values, while computational methods also suggest to split some large superfamilies into smaller clusters. © 2016 Elsevier B.V.

  6. Low Resolution Structure of RAR1-GST-Tag Fusion Protein in Solution

    International Nuclear Information System (INIS)

    Taube, M.; Kozak, M.; Jarmolowski, A.

    2010-01-01

    RAR1 is a protein required for resistance mediated by many R genes and functions upstream of signaling pathways leading to H2O2 accumulation. The structure and conformation of the RAR1-GST-Tag fusion protein from barley (Hordeum vulgare) in solution were studied by small-angle scattering of synchrotron radiation (SAXS). It was found that the dimer of RAR1-GST-Tag protein is characterized in solution by a radius of gyration R_G = 6.19 nm and a maximal intramolecular vector D_max = 23 nm. On the basis of the SAXS data, two bead models obtained by ab initio modeling are proposed. Both models show elongated conformations. We also concluded that molecules of the fusion protein form dimers in solution via interaction of GST domains. (authors)
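    For readers unfamiliar with the two shape parameters quoted in this abstract, the sketch below shows how a radius of gyration and a maximal intramolecular distance could be computed from a bead model. It is an illustration only: the function names are hypothetical, all beads are assumed to carry equal weight, and the toy coordinates are not the RAR1-GST-Tag models.

```python
import math

def radius_of_gyration(beads):
    """Root-mean-square distance of equally weighted beads from their centroid."""
    n = len(beads)
    center = [sum(b[k] for b in beads) / n for k in range(3)]
    return math.sqrt(sum(math.dist(b, center) ** 2 for b in beads) / n)

def max_dimension(beads):
    """Largest distance between any two beads (the maximal intramolecular vector)."""
    return max(math.dist(a, b) for a in beads for b in beads)

# Hypothetical elongated toy model: three beads on a line, coordinates in nm.
beads = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
rg = radius_of_gyration(beads)
dmax = max_dimension(beads)
```

    For an elongated particle the maximal dimension is large relative to the radius of gyration, which is the kind of relationship that points to an extended rather than compact shape.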

  7. Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting

    OpenAIRE

    Wang, Liwen; Chance, Mark R.

    2011-01-01

    Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. In this feature article the overall technology is described and several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand dependent conformational changes in proteins are described.

  8. Extracting knowledge from protein structure geometry

    DEFF Research Database (Denmark)

    Røgen, Peter; Koehl, Patrice

    2013-01-01

    potential from geometric knowledge extracted from native and misfolded conformers of protein structures. This new potential, Metric Protein Potential (MPP), has two main features that are key to its success. Firstly, it is composite in that it includes local and nonlocal geometric information on proteins...

  9. Structural model of the hUbA1-UbcH10 quaternary complex: in silico and experimental analysis of the protein-protein interactions between E1, E2 and ubiquitin.

    Directory of Open Access Journals (Sweden)

    Stefania Correale

    UbcH10 is a component of the Ubiquitin Conjugation Enzymes (Ubc; E2) involved in the ubiquitination cascade controlling the cell cycle progression, whereby ubiquitin, activated by E1, is transferred through E2 to the target protein with the involvement of E3 enzymes. In this work we propose the first three dimensional model of the tetrameric complex formed by the human UbA1 (E1), two ubiquitin molecules and UbcH10 (E2), leading to the transthiolation reaction. The 3D model was built up by using an experimentally guided incremental docking strategy that combined homology modeling, protein-protein docking and refinement by means of molecular dynamics simulations. The structural features of the in silico model allowed us to identify the regions that mediate the recognition between the interacting proteins, revealing the active role of the ubiquitin crosslinked to E1 in the complex formation. Finally, the role of these regions involved in the E1-E2 binding was validated by designing short peptides that specifically interfere with the binding of UbcH10, thus supporting the reliability of the proposed model and representing valuable scaffolds for the design of peptidomimetic compounds that can bind selectively to Ubcs and inhibit the ubiquitylation process in pathological disorders.

  10. Multiple functional roles of the accessory I-domain of bacteriophage P22 coat protein revealed by NMR structure and CryoEM modeling.

    Science.gov (United States)

    Rizzo, Alessandro A; Suhanovsky, Margaret M; Baker, Matthew L; Fraser, LaTasha C R; Jones, Lisa M; Rempel, Don L; Gross, Michael L; Chiu, Wah; Alexandrescu, Andrei T; Teschke, Carolyn M

    2014-06-10

    Some capsid proteins built on the ubiquitous HK97-fold have accessory domains imparting specific functions. Bacteriophage P22 coat protein has a unique insertion domain (I-domain). Two prior I-domain models from subnanometer cryoelectron microscopy (cryoEM) reconstructions differed substantially. Therefore, the I-domain's nuclear magnetic resonance structure was determined and also used to improve cryoEM models of coat protein. The I-domain has an antiparallel six-stranded β-barrel fold, not previously observed in HK97-fold accessory domains. The D-loop, which is dynamic in the isolated I-domain and intact monomeric coat protein, forms stabilizing salt bridges between adjacent capsomers in procapsids. The S-loop is important for capsid size determination, likely through intrasubunit interactions. Ten of 18 coat protein temperature-sensitive-folding substitutions are in the I-domain, indicating its importance in folding and stability. Several are found on a positively charged face of the β-barrel that anchors the I-domain to a negatively charged surface of the coat protein HK97-core. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Models of crk adaptor proteins in cancer.

    Science.gov (United States)

    Bell, Emily S; Park, Morag

    2012-05-01

    The Crk family of adaptor proteins (CrkI, CrkII, and CrkL), originally discovered as the oncogene fusion product, v-Crk, of the CT10 chicken retrovirus, lacks catalytic activity but engages with multiple signaling pathways through its SH2 and SH3 domains. Crk proteins link upstream tyrosine kinase and integrin-dependent signals to downstream effectors, acting as adaptors in diverse signaling pathways and cellular processes. Crk proteins are now recognized to play a role in the malignancy of many human cancers, stimulating renewed interest in their mechanism of action in cancer progression. The contribution of Crk signaling to malignancy has been predominantly studied in fibroblasts and in hematopoietic models and more recently in epithelial models. The mechanism of action of Crk proteins in cancer progression in vivo is still poorly understood, in part due to the highly pleiotropic nature of Crk signaling. Recent advances in the structural organization of Crk domains, new roles in kinase regulation, and increased knowledge of the mechanisms and frequency of Crk overexpression in human cancers have provided an incentive for further study in in vivo models. An understanding of the mechanisms through which Crk proteins act as oncogenic drivers could have important implications in therapeutic targeting.

  12. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein Digestibility and Solubility

    Directory of Open Access Journals (Sweden)

    Mingmei Bai

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller’s dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  13. Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models

    Science.gov (United States)

    Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik

    2013-01-01

    Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at http://plmdca.csc.kth.se/.
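    As a reading aid, the pseudolikelihood idea named in this abstract can be written out in its standard textbook form. The notation below is an assumption chosen for illustration and is not quoted from the paper: $\sigma = (\sigma_1, \dots, \sigma_N)$ is an aligned sequence of length $N$ with $q = 21$ states per site, $h_r$ are the fields, $J_{ri}$ the couplings, and $B$ the number of sequences in the alignment. Any regularization or sequence reweighting a real implementation may use is omitted.

```latex
% Conditional probability of the state at site r given all other sites
P\left(\sigma_r = l \mid \sigma_{\setminus r}\right)
  = \frac{\exp\!\Big( h_r(l) + \sum_{i \neq r} J_{ri}(l, \sigma_i) \Big)}
         {\sum_{k=1}^{q} \exp\!\Big( h_r(k) + \sum_{i \neq r} J_{ri}(k, \sigma_i) \Big)}

% Negative log-pseudolikelihood minimized over fields and couplings
\ell_{\mathrm{pseudo}}(h, J)
  = -\frac{1}{B} \sum_{b=1}^{B} \sum_{r=1}^{N}
    \log P\left(\sigma_r^{(b)} \mid \sigma_{\setminus r}^{(b)}\right)
```

    The point of this construction is that each conditional probability is normalized over only the $q$ states of a single site, so the intractable sum over all $q^N$ sequences that appears in the full likelihood is avoided.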

  14. The AUDANA algorithm for automated protein 3D structure determination from NMR NOE data

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Woonghee, E-mail: whlee@nmrfam.wisc.edu [University of Wisconsin-Madison, National Magnetic Resonance Facility at Madison and Biochemistry Department (United States); Petit, Chad M. [University of Alabama at Birmingham, Department of Biochemistry and Molecular Genetics (United States); Cornilescu, Gabriel; Stark, Jaime L.; Markley, John L., E-mail: markley@nmrfam.wisc.edu [University of Wisconsin-Madison, National Magnetic Resonance Facility at Madison and Biochemistry Department (United States)

    2016-06-15

    We introduce AUDANA (Automated Database-Assisted NOE Assignment), an algorithm for determining three-dimensional structures of proteins from NMR data that automates the assignment of 3D-NOE spectra, generates distance constraints, and conducts iterative high temperature molecular dynamics and simulated annealing. The protein sequence, chemical shift assignments, and NOE spectra are the only required inputs. Distance constraints generated automatically from ambiguously assigned NOE peaks are validated during the structure calculation against information from an enlarged version of the freely available PACSY database that incorporates information on protein structures deposited in the Protein Data Bank (PDB). This approach yields robust sets of distance constraints and 3D structures. We evaluated the performance of AUDANA with input data for 14 proteins ranging in size from 6 to 25 kDa that had 27–98 % sequence identity to proteins in the database. In all cases, the automatically calculated 3D structures passed stringent validation tests. Structures were determined with and without database support. In 9/14 cases, database support improved the agreement with manually determined structures in the PDB and in 11/14 cases, database support lowered the r.m.s.d. of the family of 20 structural models.

  15. The AUDANA algorithm for automated protein 3D structure determination from NMR NOE data

    International Nuclear Information System (INIS)

    Lee, Woonghee; Petit, Chad M.; Cornilescu, Gabriel; Stark, Jaime L.; Markley, John L.

    2016-01-01

    We introduce AUDANA (Automated Database-Assisted NOE Assignment), an algorithm for determining three-dimensional structures of proteins from NMR data that automates the assignment of 3D-NOE spectra, generates distance constraints, and conducts iterative high temperature molecular dynamics and simulated annealing. The protein sequence, chemical shift assignments, and NOE spectra are the only required inputs. Distance constraints generated automatically from ambiguously assigned NOE peaks are validated during the structure calculation against information from an enlarged version of the freely available PACSY database that incorporates information on protein structures deposited in the Protein Data Bank (PDB). This approach yields robust sets of distance constraints and 3D structures. We evaluated the performance of AUDANA with input data for 14 proteins ranging in size from 6 to 25 kDa that had 27–98 % sequence identity to proteins in the database. In all cases, the automatically calculated 3D structures passed stringent validation tests. Structures were determined with and without database support. In 9/14 cases, database support improved the agreement with manually determined structures in the PDB and in 11/14 cases, database support lowered the r.m.s.d. of the family of 20 structural models.

  16. The ModFOLD4 server for the quality assessment of 3D protein models

    OpenAIRE

    McGuffin, Liam J.; Buenavista, Maria T.; Roche, Daniel B.

    2013-01-01

    Once you have generated a 3D model of a protein, how do you know whether it bears any resemblance to the actual structure? To determine the usefulness of 3D models of proteins, they must be assessed in terms of their quality by methods that predict their similarity to the native structure. The ModFOLD4 server is the latest version of our leading independent server for the estimation of both the global and local (per-residue) quality of 3D protein models. The server ...

  17. Effect of thermal processing on estimated metabolizable protein supply to dairy cattle from camelina seeds: relationship with protein molecular structural changes.

    Science.gov (United States)

    Peng, Quanhui; Khan, Nazir A; Wang, Zhisheng; Zhang, Xuewei; Yu, Peiqiang

    2014-08-20

    This study evaluated the effect of thermal processing on the estimated metabolizable protein (MP) supply to dairy cattle from camelina seeds (Camelina sativa L. Crantz) and determined the relationship between heat-induced changes in protein molecular structural characteristics and the MP supply. Seeds from two camelina varieties were sampled in two consecutive years and were either kept raw or were heated in an autoclave (moist heating) or in an air-draft oven (dry heating) at 120 °C for 1 h. The MP supply to dairy cattle was modeled by three commonly used protein evaluation systems. The protein molecular structures were analyzed by Fourier transform/infrared-attenuated total reflectance molecular spectroscopy. The results showed that both the dry and moist heating increased the contents of truly absorbable rumen-undegraded protein (ARUP) and total MP and decreased the degraded protein balance (DPB). However, the moist-heated camelina seeds had a significantly higher (P seeds. The regression equations showed that intensities of the protein molecular structural bands can be used to estimate the contents of ARUP, MP, and DPB with high accuracy (R² > 0.70). These results show that protein molecular structural characteristics can be used to rapidly assess the MP supply to dairy cattle from raw and heat-treated camelina seeds.

  18. A lock-and-key model for protein–protein interactions

    OpenAIRE

    Morrison, Julie L.; Breitling, Rainer; Higham, Desmond J.; Gilbert, David R.

    2006-01-01

    Motivation: Protein–protein interaction networks are one of the major post-genomic data sources available to molecular biologists. They provide a comprehensive view of the global interaction structure of an organism’s proteome, as well as detailed information on specific interactions. Here we suggest a physical model of protein interactions that can be used to extract additional information at an intermediate level: It enables us to identify proteins which share biological interaction motifs,...

  19. Mechanical strength of 17,134 model proteins and cysteine slipknots.

    Directory of Open Access Journals (Sweden)

    Mateusz Sikora

    2009-10-01

    A new theoretical survey of proteins' resistance to constant speed stretching is performed for a set of 17,134 proteins as described by a structure-based model. The proteins selected have no gaps in their structure determination and consist of no more than 250 amino acids. Our previous studies have dealt with 7510 proteins of no more than 150 amino acids. The proteins are ranked according to the strength of the resistance. Most of the predicted top-strength proteins have not yet been studied experimentally. Architectures and folds which are likely to yield large forces are identified. New types of potent force clamps are discovered. They involve disulphide bridges and, in particular, cysteine slipknots. An effective energy parameter of the model is estimated by comparing the theoretical data on characteristic forces to the corresponding experimental values combined with an extrapolation of the theoretical data to the experimental pulling speeds. These studies provide guidance for future experiments on single molecule manipulation and should lead to selection of proteins for applications. A new class of proteins, involving cysteine slipknots, is identified as one that is expected to lead to the strongest force clamps known. This class is characterized through molecular dynamics simulations.

  20. Folding 19 proteins to their native state and stability of large proteins from a coarse-grained model.

    Science.gov (United States)

    Kapoor, Abhijeet; Travesset, Alex

    2014-03-01

    We develop an intermediate resolution model, where the backbone is modeled with atomic resolution but the side chain with a single bead, by extending our previous model (Proteins (2013) DOI: 10.1002/prot.24269) to properly include proline, preproline residues and backbone rigidity. Starting from random configurations, the model properly folds 19 proteins (including a mutant 2A3D sequence) into native states containing β sheet, α helix, and mixed α/β. As a further test, the stability of H-RAS (a 169 residue protein, critical in many signaling pathways) is investigated: The protein is stable, with excellent agreement with experimental B-factors. Despite that proteins containing only α helices fold to their native state at lower backbone rigidity, and other limitations, which we discuss thoroughly, the model provides a reliable description of the dynamics as compared with all atom simulations, but does not constrain secondary structures as it is typically the case in more coarse-grained models. Further implications are described. Copyright © 2013 Wiley Periodicals, Inc.

  1. Protein-protein interaction networks identify targets which rescue the MPP+ cellular model of Parkinson’s disease

    Science.gov (United States)

    Keane, Harriet; Ryan, Brent J.; Jackson, Brendan; Whitmore, Alan; Wade-Martins, Richard

    2015-11-01

    Neurodegenerative diseases are complex multifactorial disorders characterised by the interplay of many dysregulated physiological processes. As an exemplar, Parkinson’s disease (PD) involves multiple perturbed cellular functions, including mitochondrial dysfunction and autophagic dysregulation in preferentially-sensitive dopamine neurons, a selective pathophysiology recapitulated in vitro using the neurotoxin MPP+. Here we explore a network science approach for the selection of therapeutic protein targets in the cellular MPP+ model. We hypothesised that analysis of protein-protein interaction networks modelling MPP+ toxicity could identify proteins critical for mediating MPP+ toxicity. Analysis of protein-protein interaction networks constructed to model the interplay of mitochondrial dysfunction and autophagic dysregulation (key aspects of MPP+ toxicity) enabled us to identify four proteins predicted to be key for MPP+ toxicity (P62, GABARAP, GBRL1 and GBRL2). Combined, but not individual, knockdown of these proteins increased cellular susceptibility to MPP+ toxicity. Conversely, combined, but not individual, over-expression of the network targets provided rescue of MPP+ toxicity associated with the formation of autophagosome-like structures. We also found that modulation of two distinct proteins in the protein-protein interaction network was necessary and sufficient to mitigate neurotoxicity. Together, these findings validate our network science approach to multi-target identification in complex neurological diseases.

  2. Automated protein structure calculation from NMR data

    International Nuclear Information System (INIS)

    Williamson, Mike P.; Craven, C. Jeremy

    2009-01-01

    Current software is almost at the stage to permit completely automatic structure determination of small proteins of <15 kDa, from NMR spectra to structure validation with minimal user interaction. This goal is welcome, as it makes structure calculation more objective and therefore more easily validated, without any loss in the quality of the structures generated. Moreover, it releases expert spectroscopists to carry out research that cannot be automated. It should not take much further effort to extend automation to ca 20 kDa. However, there are technological barriers to further automation, of which the biggest are identified as: routines for peak picking; adoption and sharing of a common framework for structure calculation, including the assembly of an automated and trusted package for structure validation; and sample preparation, particularly for larger proteins. These barriers should be the main target for development of methodology for protein structure determination, particularly by structural genomics consortia

  3. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11.

    Science.gov (United States)

    Zhang, Wenxuan; Yang, Jianyi; He, Baoji; Walker, Sara Elizabeth; Zhang, Hongjiu; Govindarajoo, Brandon; Virtanen, Jouko; Xue, Zhidong; Shen, Hong-Bin; Zhang, Yang

    2016-09-01

    We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.

  4. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction.

    Science.gov (United States)

    de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M

    2018-04-01

    Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  5. Effect of tissue scaffold topography on protein structure monitored by fluorescence spectroscopy.

    Science.gov (United States)

    Portugal, Carla A M; Truckenmüller, Roman; Stamatialis, Dimitrios; Crespo, João G

    2014-11-10

    The impact of surface topography on the structure of proteins upon adhesion was assessed through non-invasive fluorescence monitoring. This study aimed at obtaining a better understanding about the role of protein structural status on cell-scaffold interactions. The changes induced upon adsorption of two model proteins with different geometries, trypsin (globular conformation) and fibrinogen (rod-shaped conformation) on poly-l-lactic acid (PLLA) scaffolds with different surface topographies, flat, fibrous and surfaces with aligned nanogrooves, were assessed by fluorescence spectroscopy monitoring, using tryptophan as structural probe. Hence, the maximum emission blue shift and the increase of fluorescence anisotropy observed after adsorption of globular and rod-like shaped proteins on surfaces with parallel nanogrooves were ascribed to more intense protein-surface interactions. Furthermore, the decrease of fluorescence anisotropy observed upon adsorption of proteins to scaffolds with fibrous morphology was more significant for rod-shaped proteins. This effect was associated to the ability of these proteins to adjust to curved surfaces. The additional unfolding of proteins induced upon adsorption on scaffolds with a fibrous morphology may be the reason for better cell attachment there, promoting an easier access of cell receptors to initially hidden protein regions (e.g. RGDS sequence), which are known to have a determinant role in cell attaching processes. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Relation between native ensembles and experimental structures of proteins

    DEFF Research Database (Denmark)

    Best, R. B.; Lindorff-Larsen, Kresten; DePristo, M. A.

    2006-01-01

    Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of "high-sequence similarity Protein Data Bank" (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest...

  7. RaptorX-Property: a web server for protein structure property prediction.

    Science.gov (United States)

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
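
    As an illustration of the Q3 figure quoted above, the following minimal Python sketch computes 3-state accuracy as the fraction of residues whose predicted state (H, E or C) matches the observed one. The function name q3_accuracy and the two example strings are hypothetical, chosen only for illustration; they are not taken from RaptorX-Property or its benchmarks.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label matches the observed one.

    Both strings use one character per residue: H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 15-residue example (not real benchmark data).
observed = "CCHHHHHHCCEEEEC"
predicted = "CHHHHHHHCCEEECC"

q3 = q3_accuracy(predicted, observed)
print(f"Q3 = {q3:.3f}")  # 13 of 15 residues match
```

    The same per-residue comparison applies to the 8-state (Q8) case if the label alphabet is extended; only the set of allowed characters changes.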

  8. p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins.

    Science.gov (United States)

    De Biasio, Alfredo; Ibáñez de Opakua, Alain; Cordeiro, Tiago N; Villate, Maider; Merino, Nekane; Sibille, Nathalie; Lelli, Moreno; Diercks, Tammo; Bernadó, Pau; Blanco, Francisco J

    2014-02-18

    We present to our knowledge the first structural characterization of the proliferating-cell-nuclear-antigen-associated factor p15(PAF), showing that it is monomeric and intrinsically disordered in solution but has nonrandom conformational preferences at sites of protein-protein interactions. p15(PAF) is a 12 kDa nuclear protein that acts as a regulator of DNA repair during DNA replication. The p15(PAF) gene is overexpressed in several types of human cancer. The nearly complete NMR backbone assignment of p15(PAF) allowed us to measure 86 N-H(N) residual dipolar couplings. Our residual dipolar coupling analysis reveals nonrandom conformational preferences in distinct regions, including the proliferating-cell-nuclear-antigen-interacting protein motif (PIP-box) and the KEN-box (recognized by the ubiquitin ligase that targets p15(PAF) for degradation). In accordance with these findings, analysis of the (15)N R2 relaxation rates shows a relatively reduced mobility for the residues in these regions. The agreement between the experimental small angle x-ray scattering curve of p15(PAF) and that computed from a statistical coil ensemble corrected for the presence of local secondary structural elements further validates our structural model for p15(PAF). The coincidence of these transiently structured regions with protein-protein interaction and posttranslational modification sites suggests a possible role for these structures as molecular recognition elements for p15(PAF). Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  9. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    Science.gov (United States)

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.

  10. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  11. Solution structure of the human signaling protein RACK1

    Directory of Open Access Journals (Sweden)

    Papa Priscila F

    2010-06-01

    Background: The adaptor protein RACK1 (receptor of activated kinase 1) was originally identified as an anchoring protein for protein kinase C. RACK1 is a 36 kDa protein composed of seven WD repeats, which mediate its protein-protein interactions. RACK1 is ubiquitously expressed and has been implicated in diverse cellular processes, including protein translation regulation, neuropathological processes, cellular stress, and tissue development. Results: In this study we performed a biophysical analysis of human RACK1 with the aim of obtaining low resolution structural information. Small angle X-ray scattering (SAXS) experiments demonstrated that human RACK1 is globular and monomeric in solution and that its low resolution structure is strikingly similar to that of a homology model previously calculated by us and to the crystallographic structure of RACK1 isoform A from Arabidopsis thaliana. Both sedimentation velocity and sedimentation equilibrium analytical ultracentrifugation techniques showed that RACK1 is predominantly a monomer of around 37 kDa in solution, but also presents small amounts of oligomeric species. Moreover, hydrodynamic data suggested that RACK1 has a slightly asymmetric shape. The interaction of RACK1 and Ki-1/57 was tested by sedimentation equilibrium. The results suggested that the association between RACK1 and Ki-1/57(122-413) follows a stoichiometry of 1:1. The binding constant (KB) observed for the RACK1-Ki-1/57(122-413) interaction was around (1.5 ± 0.2) × 10⁶ M⁻¹, corresponding to a dissociation constant (KD) of (0.7 ± 0.1) × 10⁻⁶ M. Moreover, the fluorescence data also suggest that the interaction may occur in a cooperative fashion. Conclusion: Our SAXS and analytical ultracentrifugation experiments indicated that RACK1 is predominantly a monomer in solution. RACK1 and Ki-1/57(122-413) interact strongly under the tested conditions.
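
    The two constants reported in this abstract are related by KD = 1/KB for a 1:1 complex. The short Python sketch below checks that arithmetic for a binding constant of 1.5 × 10^6 M^-1 with an uncertainty of 0.2 × 10^6 M^-1. The function names are hypothetical, and the first-order error propagation is an assumption made here for illustration; the abstract does not say how the authors derived their uncertainty.

```python
def dissociation_constant(k_b: float) -> float:
    """Return K_D (in M) for a 1:1 complex from the association constant K_B (in M^-1)."""
    if k_b <= 0:
        raise ValueError("K_B must be positive")
    return 1.0 / k_b


def dissociation_constant_error(k_b: float, k_b_err: float) -> float:
    """First-order propagated uncertainty of K_D = 1/K_B (assumed, not from the source)."""
    return k_b_err / k_b**2


k_b = 1.5e6      # M^-1, value reported in the abstract
k_b_err = 0.2e6  # M^-1, uncertainty reported in the abstract

k_d = dissociation_constant(k_b)
k_d_err = dissociation_constant_error(k_b, k_b_err)

# Express both values in units of 10^-6 M for comparison with the abstract.
print(f"K_D = ({k_d * 1e6:.2f} +/- {k_d_err * 1e6:.2f}) x 10^-6 M")
```

    Rounded to one decimal place, the result agrees with the reported dissociation constant of about 0.7 × 10^-6 M with an uncertainty of about 0.1 × 10^-6 M.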

  12. An approach to creating a more realistic working model from a protein data bank entry.

    Science.gov (United States)

    Brandon, Christopher J; Martin, Benjamin P; McGee, Kelly J; Stewart, James J P; Braun-Sand, Sonja B

    2015-01-01

    An accurate model of three-dimensional protein structure is important in a variety of fields such as structure-based drug design and mechanistic studies of enzymatic reactions. While the entries in the Protein Data Bank ( http://www.pdb.org ) provide valuable information about protein structures, a small fraction of the PDB structures were found to contain anomalies not reported in the PDB file. The semiempirical PM7 method in MOPAC2012 was used for identifying anomalously short hydrogen bonds, C-H⋯O/C-H⋯N interactions, non-bonding close contacts, and unrealistic covalent bond lengths in recently published Protein Data Bank files. It was also used to generate new structures with these faults removed. When the semiempirical models were compared to those of PDB_REDO (http://www.cmbi.ru.nl/pdb_redo/), the clashscores, as defined by MolProbity ( http://molprobity.biochem.duke.edu/), were better in about 50% of the structures. The semiempirical models also had a lower root-mean-square-deviation value in nearly all cases than those from PDB_REDO, indicative of a better conservation of the tertiary structure. Finally, the semiempirical models were found to have lower clashscores than the initial PDB file in all but one case. Because this approach maintains as much of the original tertiary structure as possible while improving anomalous interactions, it should be useful to theoreticians, experimentalists, and crystallographers investigating the structure and function of proteins.

  13. Structural Transition and Antibody Binding of EBOV GP and ZIKV E Proteins from Pre-Fusion to Fusion-Initiation State

    Directory of Open Access Journals (Sweden)

    Anna Lappala

    2018-05-01

    Membrane fusion proteins are responsible for viral entry into host cells, a crucial first step in viral infection. These proteins undergo large conformational changes from pre-fusion to fusion-initiation structures, and, despite differences in viral genomes and disease etiology, many fusion proteins are arranged as trimers. Structural information for both pre-fusion and fusion-initiation states is critical for understanding virus neutralization by the host immune system. In the case of Ebola virus glycoprotein (EBOV GP) and Zika virus envelope protein (ZIKV E), pre-fusion state structures have been identified experimentally, but only partial structures of fusion-initiation states have been described. While the fusion-initiation structure is in an energetically unfavorable state that is difficult to solve experimentally, the existing structural information combined with computational approaches enabled the modeling of fusion-initiation state structures of both proteins. These structural models provide an improved understanding of four different neutralizing antibodies in the prevention of viral host entry.

  14. Magic Angle Spinning NMR Structure Determination of Proteins from Pseudocontact Shifts

    KAUST Repository

    Li, Jianping

    2013-06-05

    Magic angle spinning solid-state NMR is a unique technique to study atomic-resolution structure of biomacromolecules which resist crystallization or are too large to study by solution NMR techniques. However, difficulties in obtaining sufficient number of long-range distance restraints using dipolar coupling based spectra hamper the process of structure determination of proteins in solid-state NMR. In this study it is shown that high-resolution structure of proteins in solid phase can be determined without the use of traditional dipolar-dipolar coupling based distance restraints by combining the measurements of pseudocontact shifts (PCSs) with Rosetta calculations. The PCSs were generated by chelating exogenous paramagnetic metal ions to a tag 4-mercaptomethyl-dipicolinic acid, which is covalently attached to different residue sites in a 56-residue immunoglobulin-binding domain of protein G (GB1). The long-range structural restraints with metal-nucleus distance of up to ∼20 Å are quantitatively extracted from experimentally observed PCSs, and these are in good agreement with the distances back-calculated using an X-ray structure model. Moreover, we demonstrate that using several paramagnetic ions with varied paramagnetic susceptibilities as well as the introduction of paramagnetic labels at different sites can dramatically increase the number of long-range restraints and cover different regions of the protein. The structure generated from solid-state NMR PCSs restraints combined with Rosetta calculations has 0.7 Å root-mean-square deviation relative to X-ray structure. © 2013 American Chemical Society.

  15. Magic Angle Spinning NMR Structure Determination of Proteins from Pseudocontact Shifts

    KAUST Repository

    Li, Jianping; Pilla, Kala Bharath; Li, Qingfeng; Zhang, Zhengfeng; Su, Xuncheng; Huber, Thomas; Yang, Jun

    2013-01-01

    Magic angle spinning solid-state NMR is a unique technique to study atomic-resolution structure of biomacromolecules which resist crystallization or are too large to study by solution NMR techniques. However, difficulties in obtaining sufficient number of long-range distance restraints using dipolar coupling based spectra hamper the process of structure determination of proteins in solid-state NMR. In this study it is shown that high-resolution structure of proteins in solid phase can be determined without the use of traditional dipolar-dipolar coupling based distance restraints by combining the measurements of pseudocontact shifts (PCSs) with Rosetta calculations. The PCSs were generated by chelating exogenous paramagnetic metal ions to a tag 4-mercaptomethyl-dipicolinic acid, which is covalently attached to different residue sites in a 56-residue immunoglobulin-binding domain of protein G (GB1). The long-range structural restraints with metal-nucleus distance of up to ∼20 Å are quantitatively extracted from experimentally observed PCSs, and these are in good agreement with the distances back-calculated using an X-ray structure model. Moreover, we demonstrate that using several paramagnetic ions with varied paramagnetic susceptibilities as well as the introduction of paramagnetic labels at different sites can dramatically increase the number of long-range restraints and cover different regions of the protein. The structure generated from solid-state NMR PCSs restraints combined with Rosetta calculations has 0.7 Å root-mean-square deviation relative to X-ray structure. © 2013 American Chemical Society.

  16. Alpha complexes in protein structure prediction

    DEFF Research Database (Denmark)

    Winter, Pawel; Fonseca, Rasmus

    2015-01-01

    Reducing the computational effort and increasing the accuracy of potential energy functions is of utmost importance in modeling biological systems, for instance in protein structure prediction, docking or design. Evaluating interactions between nonbonded atoms is the bottleneck of such computations...... α-complexes from scratch for every configuration encountered during the search for the native structure would make this approach hopelessly slow. However, it is argued that kinetic α-complexes can be used to reduce the computational effort of determining the potential energy when "moving" from one configuration to a neighboring one. As a consequence, relatively expensive (initial) construction of an α-complex is expected to be compensated by subsequent fast kinetic updates during the search process. Computational results presented in this paper are limited. However, they suggest that the applicability of a...

  17. Automated determination of fibrillar structures by simultaneous model building and fiber diffraction refinement.

    Science.gov (United States)

    Potrzebowski, Wojciech; André, Ingemar

    2015-07-01

    For highly oriented fibrillar molecules, three-dimensional structures can often be determined from X-ray fiber diffraction data. However, because of limited information content, structure determination and validation can be challenging. We demonstrate that automated structure determination of protein fibers can be achieved by guiding the building of macromolecular models with fiber diffraction data. We illustrate the power of our approach by determining the structures of six bacteriophage viruses de novo using fiber diffraction data alone and together with solid-state NMR data. Furthermore, we demonstrate the feasibility of molecular replacement from monomeric and fibrillar templates by solving the structure of a plant virus using homology modeling and protein-protein docking. The generated models explain the experimental data to the same degree as deposited reference structures but with improved structural quality. We also developed a cross-validation method for model selection. The results highlight the power of fiber diffraction data as structural constraints.

  18. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    Science.gov (United States)

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  19. Molecular Chemical Structure of Barley Proteins Revealed by Ultra-Spatially Resolved Synchrotron Light Sourced FTIR Microspectroscopy: Comparison of Barley Varieties

    International Nuclear Information System (INIS)

    Yu, P.

    2007-01-01

    Barley protein structure affects the barley quality, fermentation, and degradation behavior in both humans and animals among other factors such as protein matrix. Publications show various biological differences among barley varieties such as Valier and Harrington, which have significantly different degradation behaviors. The objectives of this study were to reveal the molecular structure of barley protein, comparing various varieties (Dolly, Valier, Harrington, LP955, AC Metcalfe, and Sisler), and quantify protein structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling by using the ultra-spatially resolved synchrotron light sourced Fourier transform infrared microspectroscopy (SFTIRM). The items of the protein molecular structure revealed included protein structure α-helices, β-sheets, and others such as β-turns and random coils. The experiment was performed at the National Synchrotron Light Source in Brookhaven National Laboratory (BNL, US Department of Energy, NY). The results showed that with the SFTIRM, the molecular structure of barley protein could be revealed. Barley protein structures exhibited significant differences among the varieties in terms of proportion and ratio of model-fitted α-helices, β-sheets, and others. By using multi-component peaks modeling at protein amide I region of 1710-1576 cm⁻¹, the results show that barley protein consisted of approximately 18-34% of α-helices, 14-25% of β-sheets, and 44-69% others. AC Metcalfe, Sisler, and LP955 consisted of higher (P 0.05). The ratio of α-helices to others (0.3 to 1.0, P < 0.05) and that of β-sheets to others (0.2 to 0.8, P < 0.05) were different among the barley varieties. It needs to be pointed out that using a multi-peak modeling for protein structure analysis is only for making relative estimates and not exact determinations and only for the comparison purpose between varieties. The principal component analysis showed that protein amide I Fourier

  20. A structural model of the E. coli PhoB Dimer in the transcription initiation complex

    Directory of Open Access Journals (Sweden)

    Tung Chang-Shung

    2012-03-01

    Background: There exist > 78,000 protein and/or nucleic acid structures that were determined experimentally. Only a small portion of these structures corresponds to those of protein complexes. While homology modeling is able to exploit knowledge-based potentials of side-chain rotamers and backbone motifs to infer structures for new proteins, no such general method exists to extend our understanding of protein interaction motifs to novel protein complexes. Results: We use a Motif Binding Geometries (MBG) approach to infer the structure of a protein complex from the database of complexes of homologous proteins taken from other contexts (such as the helix-turn-helix motif binding double-stranded DNA), and demonstrate its utility on one of the more important regulatory complexes in biology, that of the RNA polymerase initiating transcription under conditions of phosphate starvation. The modeled PhoB/RNAP/σ-factor/DNA complex is stereo-chemically reasonable, has sufficient interfacial Solvent Excluded Surface Areas (SESAs) to provide adequate binding strength, is physically meaningful for transcription regulation, and is consistent with a variety of known experimental constraints. Conclusions: Based on a straightforward and easy to comprehend concept, "proteins and protein domains that fold similarly could interact similarly", a structural model of the PhoB dimer in the transcription initiation complex has been developed. This approach could be extended to enable structural modeling and prediction of other bio-molecular complexes. Just as models of individual proteins provide insight into molecular recognition, catalytic mechanism, and substrate specificity, models of protein complexes will provide understanding into the combinatorial rules of cellular regulation and signaling.

  1. Structural elucidation of transmembrane domain zero (TMD0) of EcdL: A multidrug resistance-associated protein (MRP) family of ATP-binding cassette transporter protein revealed by atomistic simulation.

    Science.gov (United States)

    Bera, Krishnendu; Rani, Priyanka; Kishor, Gaurav; Agarwal, Shikha; Kumar, Antresh; Singh, Durg Vijay

    2017-09-20

    ATP-binding cassette (ABC) transporters play an extensive role in the translocation of diverse sets of biologically important molecules across membranes. EchinocandinB (an antifungal) and the EcdL protein of Aspergillus rugulosus are encoded by the same cluster of genes. Co-expression of EcdL and echinocandinB reflects tightly linked biological functions. EcdL belongs to the Multidrug Resistance-associated Protein (MRP) subfamily of ABC transporters, which has an extra transmembrane domain zero (TMD0). The complete structure of an MRP subfamily member including the TMD0 domain is not known at atomic resolution. We hypothesized that the transport of echinocandinB is mediated by the EcdL protein. Hence, it is pertinent to know the topological arrangement of TMD0 relative to the other domains of the protein and its possible role in the transport of echinocandinB. The absence of an effective template for the TMD0 domain led us to model it with I-TASSER; the structure was further refined by multiple-template modelling using homologous templates for the remaining domains (TMD1, NBD1, TMD2, NBD2). The modelled structure was validated for packing, folding and stereochemical properties. An MD simulation of 0.1 μs was carried out in a biphasic environment to refine the modelled protein. Non-redundant structures were extracted by clustering the MD trajectory. Structural alignment of the modelled structure gave Z-scores of 37.9, 31.6 and 31.5, with RMSDs of 2.4, 4.2 and 4.8, against the ABC transporters with PDB IDs 4F4C, 4M1M and 4M2T, respectively, supporting the correctness of the structure. EchinocandinB was docked to the modelled as well as to the clustered structures, revealing interactions of echinocandinB with TMD0 and other TM helices in the translocation path formed by the TMDs.

  2. Solution structure and dynamics of melanoma inhibitory activity protein

    International Nuclear Information System (INIS)

    Lougheed, Julie C.; Domaille, Peter J.; Handel, Tracy M.

    2002-01-01

    Melanoma inhibitory activity (MIA) is a small secreted protein that is implicated in cartilage cell maintenance and melanoma metastasis. It is representative of a recently discovered family of proteins that contain a Src homology 3 (SH3) subdomain. While SH3 domains are normally found in intracellular proteins and mediate protein-protein interactions via recognition of polyproline helices, MIA is a single-domain extracellular protein, and it probably binds to a different class of ligands. Here we report the assignments, solution structure, and dynamics of human MIA determined by heteronuclear NMR methods. The structures were calculated in a semi-automated manner without manual assignment of NOE crosspeaks, and have a backbone rmsd of 0.38 Å over the ordered regions of the protein. The structure consists of an SH3-like subdomain with N- and C-terminal extensions of approximately 20 amino acids each that together form a novel fold. The rmsd between the solution structure and our recently reported crystal structure is 0.86 Å over the ordered regions of the backbone, and the main differences are localized to the most dynamic regions of the protein. The similarity between the NMR and crystal structures supports the use of automated NOE assignments and ambiguous restraints to accelerate the calculation of NMR structures.
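
    The backbone rmsd values quoted in the record above are root-mean-square deviations between matched atom positions. A minimal sketch of the calculation, assuming the two coordinate sets contain the same atoms in the same order and have already been superimposed (the optimal-superposition step, for example by the Kabsch algorithm, is deliberately omitted); the function name and the coordinates are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and atoms are paired
    by position in the lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Two hypothetical two-atom "structures" (coordinates in Å).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(model, reference)
```

    Restricting the input lists to backbone atoms of the ordered regions gives the kind of figure reported in the abstract.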

  3. Can molecular dynamics simulations help in discriminating correct from erroneous protein 3D models?

    Directory of Open Access Journals (Sweden)

    Gibrat Jean-François

    2008-01-01

    Background: Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to the task of modelling proteins at higher resolutions than those of models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as being built from alignments of sequences with structures similar to their native structures and erroneous models from alignments of sequences with structures unrelated to their native structures. Results: For three test sequences whose native structures belong to the all-α, all-β and αβ classes we built a set of models intended to cover the whole spectrum: from a perfect model, i.e., the native structure, to a very poor model, i.e., a random alignment of the test sequence with a structure belonging to another structural class, including several intermediate models based on fold recognition alignments. We submitted these models to 11 ns of MD simulations at three different temperatures. We monitored along the corresponding trajectories the mean of the Root-Mean-Square deviations (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of

  4. Markov State Models Reveal a Two-Step Mechanism of miRNA Loading into the Human Argonaute Protein: Selective Binding followed by Structural Re-arrangement

    KAUST Repository

    Jiang, Hanlun

    2015-07-16

    Argonaute (Ago) proteins and microRNAs (miRNAs) are central components in RNA interference, which is a key cellular mechanism for sequence-specific gene silencing. Despite intensive studies, molecular mechanisms of how Ago recognizes miRNA remain largely elusive. In this study, we propose a two-step mechanism for this molecular recognition: selective binding followed by structural re-arrangement. Our model is based on the results of a combination of Markov State Models (MSMs), large-scale protein-RNA docking, and molecular dynamics (MD) simulations. Using MSMs, we identify an open state of apo human Ago-2 in fast equilibrium with partially open and closed states. Conformations in this open state are distinguished by their largely exposed binding grooves that can geometrically accommodate miRNA as indicated in our protein-RNA docking studies. miRNA may then selectively bind to these open conformations. Upon the initial binding, the complex may perform further structural re-arrangement as shown in our MD simulations and eventually reach the stable binary complex structure. Our results provide novel insights in Ago-miRNA recognition mechanisms and our methodology holds great potential to be widely applied in the studies of other important molecular recognition systems.
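
    A Markov State Model of the kind named in the record above describes conformational dynamics as jumps between discrete states with fixed transition probabilities, and equilibrium populations follow from its stationary distribution. The sketch below is only an illustration of that idea: the three states echo the open, partially open and closed states mentioned in the abstract, but the transition probabilities are invented and the power-iteration solver is an assumed choice, not the authors' method.

```python
def stationary_distribution(matrix, tol=1e-12, max_iter=10000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of pi <- pi * matrix (left multiplication).
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical transition probabilities per lag time; each row sums to 1.
states = ["open", "partially_open", "closed"]
T = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.02, 0.08, 0.90],
]
populations = dict(zip(states, stationary_distribution(T)))
```

    In practice the matrix would be estimated from transition counts in clustered MD trajectories rather than written by hand.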

  5. Markov State Models Reveal a Two-Step Mechanism of miRNA Loading into the Human Argonaute Protein: Selective Binding followed by Structural Re-arrangement

    KAUST Repository

    Jiang, Hanlun; Sheong, Fu Kit; Zhu, Lizhe; Gao, Xin; Bernauer, Julie; Huang, Xuhui

    2015-01-01

    Argonaute (Ago) proteins and microRNAs (miRNAs) are central components in RNA interference, which is a key cellular mechanism for sequence-specific gene silencing. Despite intensive studies, molecular mechanisms of how Ago recognizes miRNA remain largely elusive. In this study, we propose a two-step mechanism for this molecular recognition: selective binding followed by structural re-arrangement. Our model is based on the results of a combination of Markov State Models (MSMs), large-scale protein-RNA docking, and molecular dynamics (MD) simulations. Using MSMs, we identify an open state of apo human Ago-2 in fast equilibrium with partially open and closed states. Conformations in this open state are distinguished by their largely exposed binding grooves that can geometrically accommodate miRNA as indicated in our protein-RNA docking studies. miRNA may then selectively bind to these open conformations. Upon the initial binding, the complex may perform further structural re-arrangement as shown in our MD simulations and eventually reach the stable binary complex structure. Our results provide novel insights in Ago-miRNA recognition mechanisms and our methodology holds great potential to be widely applied in the studies of other important molecular recognition systems.

  6. Domain analyses of Usher syndrome causing Clarin-1 and GPR98 protein models.

    Science.gov (United States)

    Khan, Sehrish Haider; Javed, Muhammad Rizwan; Qasim, Muhammad; Shahzadi, Samar; Jalil, Asma; Rehman, Shahid Ur

    2014-01-01

    Usher syndrome is an autosomal recessive disorder that causes hearing loss, Retinitis Pigmentosa (RP) and vestibular dysfunction. It is a clinically and genetically heterogeneous disorder that is clinically divided into three types, i.e. type I, type II and type III. To date, there are about twelve loci and ten identified genes which are associated with Usher syndrome. A mutation in any of these genes, e.g. CDH23, CLRN1, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A and DFNB31, can result in Usher syndrome or non-syndromic deafness. These genes provide instructions for making proteins that play important roles in normal hearing, balance and vision. Studies have shown that protein structures for only seven of these genes have been determined experimentally and that there are still three genes whose protein structures are unavailable: Clarin-1, GPR98 and Usherin. In the absence of an experimentally determined structure, homology modeling and threading often provide a useful 3D model of a protein. Therefore, in the current study Clarin-1 and GPR98 proteins have been analyzed for signal peptide, domains and motifs. Clarin-1 protein was found to lack a signal peptide and to contain a prokar lipoprotein domain. Clarin-1 is classified within the claudin 2 superfamily and consists of twelve motifs. GPR98, by contrast, has a 29-amino-acid signal peptide and is classified within GPCR family 2 and the Concanavalin A-like lectin/glucanase superfamily. It was found to consist of GPS and G protein receptor F2 domains and twenty-nine motifs. Their 3D structures have been predicted using the I-TASSER server. The model of Clarin-1 showed only α-helices and no β-sheets, while the model of GPR98 showed both α-helices and β-sheets. The predicted structures were then evaluated and validated by MolProbity and Ramachandran plot. The evaluation of the predicted structures showed 78.9% of residues of Clarin-1 and 78.9% of residues of GPR98 within favored regions. The findings of the present study have resulted in the

  7. Improved protein surface comparison and application to low-resolution protein structure data

    Directory of Open Access Journals (Sweden)

    Kihara Daisuke

    2010-12-01

    Background: Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. The search results of protein surfaces represented with the 3DZD have shown good agreement with the existing structure classifications, but some discrepancies were also observed. Results: Three surface representations are examined: that of backbone atoms, the originally devised all-atom surface representation, and the combination of the all-atom surface with the backbone representation. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Conclusions: Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. The performance further improved when the two representations are combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy.

  8. Improved protein surface comparison and application to low-resolution protein structure data.

    Science.gov (United States)

    Sael, Lee; Kihara, Daisuke

    2010-12-14

    Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. The search results of protein surfaces represented with the 3DZD have shown good agreement with the existing structure classifications, but some discrepancies were also observed. Three surface representations are examined: that of backbone atoms, the originally devised all-atom surface representation, and the combination of the all-atom surface with the backbone representation. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. The performance further improved when the two representations are combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy.

  9. Studying Membrane Protein Structure and Function Using Nanodiscs

    DEFF Research Database (Denmark)

    Huda, Pie

    The structure and dynamics of membrane proteins can provide valuable information about general functions, diseases and effects of various drugs. Studying membrane proteins is a challenge, as an amphiphilic environment is necessary to stabilise the protein in a functionally and structurally relevant...... form. This is most typically achieved through the use of detergent-based reconstitution systems. However, time and again such systems fail to provide a suitable environment, causing aggregation and inactivation. Nanodiscs are self-assembled lipoproteins containing two membrane scaffold proteins...... and a lipid bilayer in defined nanometer size, which can act as a stabiliser for membrane proteins. This enables both functional and structural investigation of membrane proteins in a detergent-free environment which is closer to the native situation. Understanding the self-assembly of nanodiscs is important...

  10. 3D bioprinting of structural proteins.

    Science.gov (United States)

    Włodarczyk-Biegun, Małgorzata K; Del Campo, Aránzazu

    2017-07-01

    3D bioprinting is a booming method to obtain scaffolds of different materials with predesigned and customized morphologies and geometries. In this review we focus on the experimental strategies and recent achievements in the bioprinting of major structural proteins (collagen, silk, fibrin), as a particularly interesting technology to reconstruct the biochemical and biophysical composition and hierarchical morphology of natural scaffolds. The flexibility in molecular design offered by structural proteins, combined with the flexibility in mixing, deposition, and mechanical processing inherent to bioprinting technologies, enables the fabrication of highly functional scaffolds and tissue mimics with a degree of complexity and organization which has only just started to be explored. Here we describe the printing parameters and physical (mechanical) properties of bioinks based on structural proteins, including the biological function of the printed scaffolds. We describe applied printing techniques and cross-linking methods, highlighting the modifications implemented to improve scaffold properties. The cell types used, cell viability, and possible construct applications are also reported. We envision that the application of printing technologies to structural proteins will enable unprecedented control over their supramolecular organization, conferring on printed scaffolds biological properties and functions close to those of natural systems.

  11. Protein NMR Structures Refined with Rosetta Have Higher Accuracy Relative to Corresponding X-ray Crystal Structures

    Science.gov (United States)

    2014-01-01

    We have found that refinement of protein NMR structures using Rosetta with experimental NMR restraints yields more accurate protein NMR structures than those that have been deposited in the PDB using standard refinement protocols. Using 40 pairs of NMR and X-ray crystal structures determined by the Northeast Structural Genomics Consortium, for proteins ranging in size from 5–22 kDa, restrained Rosetta refined structures fit better to the raw experimental data, are in better agreement with their X-ray counterparts, and have better phasing power compared to conventionally determined NMR structures. For 37 proteins for which NMR ensembles were available and which had similar structures in solution and in the crystal, all of the restrained Rosetta refined NMR structures were sufficiently accurate to be used for solving the corresponding X-ray crystal structures by molecular replacement. The protocol for restrained refinement of protein NMR structures was also compared with restrained CS-Rosetta calculations. For proteins smaller than 10 kDa, restrained CS-Rosetta, starting from extended conformations, provides slightly more accurate structures, while for proteins in the size range of 10–25 kDa the less CPU intensive restrained Rosetta refinement protocols provided equally or more accurate structures. The restrained Rosetta protocols described here can improve the accuracy of protein NMR structures and should find broad and general use for studies of protein structure and function. PMID:24392845

  12. Atomic-accuracy prediction of protein loop structures through an RNA-inspired Ansatz.

    Directory of Open Access Journals (Sweden)

    Rhiju Das

    Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. These results establish all-atom enumeration as

  13. INTEGRATING GENETIC AND STRUCTURAL DATA ON HUMAN PROTEIN KINOME IN NETWORK-BASED MODELING OF KINASE SENSITIVITIES AND RESISTANCE TO TARGETED AND PERSONALIZED ANTICANCER DRUGS.

    Science.gov (United States)

    Verkhivker, Gennady M

    2016-01-01

    The human protein kinome represents one of the largest protein families that orchestrate functional processes in complex cellular networks, and when perturbed, can cause various cancers. The abundance and diversity of genetic, structural, and biochemical data underlies the complexity of mechanisms by which targeted and personalized drugs can combat mutational profiles in protein kinases. Coupled with the evolution of systems biology approaches, genomic and proteomic technologies are rapidly identifying and characterizing novel resistance mechanisms with the goal to inform rational design of personalized kinase drugs. Integration of experimental and computational approaches can help to bring these data into a unified conceptual framework and develop robust models for predicting the clinical drug resistance. In the current study, we employ a battery of synergistic computational approaches that integrate genetic, evolutionary, biochemical, and structural data to characterize the effect of cancer mutations in protein kinases. We provide a detailed structural classification and analysis of genetic signatures associated with oncogenic mutations. By integrating genetic and structural data, we employ network modeling to dissect mechanisms of kinase drug sensitivities to oncogenic EGFR mutations. Using biophysical simulations and analysis of protein structure networks, we show that conformational-specific drug binding of Lapatinib may elicit resistant mutations in the EGFR kinase that are linked with the ligand-mediated changes in the residue interaction networks and global network properties of key residues that are responsible for structural stability of specific functional states. A strong network dependency on high centrality residues in the conformation-specific Lapatinib-EGFR complex may explain vulnerability of drug binding to a broad spectrum of mutations and the emergence of drug resistance. Our study offers a systems-based perspective on drug design by unravelling

  14. Structure prediction and binding sites analysis of curcin protein of Jatropha curcas using computational approaches.

    Science.gov (United States)

    Srivastava, Mugdha; Gupta, Shishir K; Abhilash, P C; Singh, Nandita

    2012-07-01

    Ribosome inactivating proteins (RIPs) are defense proteins in a number of higher-plant species that are directly targeted toward herbivores. Jatropha curcas is one of the biodiesel plants having RIPs. The Jatropha seed meal, after extraction of oil, is rich in curcin, a highly toxic RIP similar to ricin, which makes it unsuitable for animal feed. Although the toxicity of curcin is well documented in the literature, its detailed toxic properties and 3D structure have not been determined by X-ray crystallography, NMR spectroscopy or any in silico techniques to date. In this pursuit, the structure of curcin was modeled by a composite approach of 3D structure prediction using threading and ab initio modeling. Model quality was assessed by methods that include Ramachandran plot analysis and QMEAN score estimation. Further, we applied the protein-ligand docking approach to identify the r-RNA binding residue of curcin. The present work provides the first structural insight into the binding mode of r-RNA adenine to the curcin protein and forms the basis for designing future inhibitors of curcin. Cloning of a future peptide inhibitor within J. curcas can produce non-toxic varieties of J. curcas, which would make the seed-cake suitable as animal feed without curcin detoxification.

  15. Proteome scale identification, classification and structural analysis of iron-binding proteins in bread wheat.

    Science.gov (United States)

    Verma, Shailender Kumar; Sharma, Ankita; Sandhu, Padmani; Choudhary, Neha; Sharma, Shailaja; Acharya, Vishal; Akhter, Yusuf

    2017-05-01

    Bread wheat is one of the major staple foods of the world's population, and iron plays a significant role in the growth and development of the plant. In this report, we present the genome-wide identification of iron-binding proteins in bread wheat. The wheat genome derived putative proteome was screened for identification of iron-binding sequence motifs. Out of 602 putative iron-binding proteins, 130 yielded reliable structural models by homology techniques and were further analyzed for the presence of iron-binding structural motifs. The computationally identified proteins appear to bind to ferrous and ferric ions and showed diverse coordination geometries. Glu, His, Asp and Cys amino acid residues were found to be mostly involved in iron binding. We have classified these proteins on the basis of their localization in the different cellular compartments. The identified proteins were further classified into their protein folds, families and functional classes ranging from structure maintenance of cellular components, regulation of gene expression, post translational modification, membrane proteins, enzymes, signaling and storage proteins. This comprehensive report regarding the structural iron-binding proteome provides useful insights into the diversity of iron-binding proteins of wheat plants and can be further utilized to study their roles in plant growth, development and physiology.

  16. ProteinSplit: splitting of multi-domain proteins using prediction of ordered and disordered regions in protein sequences for virtual structural genomics

    International Nuclear Information System (INIS)

    Wyrwicz, Lucjan S; Koczyk, Grzegorz; Rychlewski, Leszek; Plewczynski, Dariusz

    2007-01-01

    The annotation of protein folds within newly sequenced genomes is the main target for semi-automated protein structure prediction (virtual structural genomics). A large number of automated methods have been developed recently with very good results in the case of single-domain proteins. Unfortunately, most of these automated methods often fail to properly predict the distant homology between a given multi-domain protein query and structural templates. Therefore a multi-domain protein should be split into domains in order to overcome this limitation. ProteinSplit is designed to identify protein domain boundaries using a novel algorithm that predicts disordered regions in protein sequences. The software utilizes various sequence characteristics to assess the local propensity of a protein to be disordered or ordered in terms of local structure stability. These disordered parts of a protein are likely to create interdomain spacers. Because of its speed and portability, the method was successfully applied to several genome-wide fold annotation experiments. The user can run an automated analysis of sets of proteins or perform semi-automated multiple user projects (saving the results on the server). Additionally the sequences of predicted domains can be sent to the Bioinfo.PL Protein Structure Prediction Meta-Server for further protein three-dimensional structure and function prediction. The program is freely accessible as a web service at http://lucjan.bioinfo.pl/proteinsplit together with detailed benchmark results on the critical assessment of a fully automated structure prediction (CAFASP) set of sequences. The source code of the local version of protein domain boundary prediction is available upon request from the authors
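
    The record above describes splitting a multi-domain protein at predicted disordered regions, which are treated as likely interdomain spacers. The sketch below illustrates only that splitting idea. It assumes per-residue disorder scores are already available from some predictor; the function name, thresholds and scores are hypothetical and do not reproduce the ProteinSplit algorithm.

```python
def split_at_disorder(scores, threshold=0.5, min_linker=3, min_domain=5):
    """Return (start, end) index pairs of ordered segments (end exclusive).

    Runs of at least `min_linker` residues scoring >= threshold are treated
    as linkers; ordered stretches shorter than `min_domain` are discarded.
    """
    linkers = []
    run_start = None
    # A low sentinel score closes a disordered run that reaches the C-terminus.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_linker:
                linkers.append((run_start, i))
            run_start = None

    domains = []
    previous_end = 0
    # A zero-length linker at the end flushes the final ordered segment.
    for linker_start, linker_end in linkers + [(len(scores), len(scores))]:
        if linker_start - previous_end >= min_domain:
            domains.append((previous_end, linker_start))
        previous_end = linker_end
    return domains

# Hypothetical scores: two ordered stretches separated by a 4-residue linker.
scores = [0.1] * 6 + [0.9] * 4 + [0.2] * 7
domains = split_at_disorder(scores)
```

    Each returned segment could then be submitted separately to a structure prediction service, as the abstract describes for the predicted domain sequences.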

  17. Ab initio structure determination and refinement of a scorpion protein toxin.

    Science.gov (United States)

    Smith, G D; Blessing, R H; Ealick, S E; Fontecilla-Camps, J C; Hauptman, H A; Housset, D; Langs, D A; Miller, R

    1997-09-01

    The structure of toxin II from the scorpion Androctonus australis Hector has been determined ab initio by direct methods using SnB at 0.96 Å resolution. For the purpose of this structure redetermination, undertaken as a test of the minimal function and the SnB program, the identity and sequence of the protein were withheld from part of the research team. A single solution obtained from 1,619 random atom trials was clearly revealed by the bimodal distribution of the final value of the minimal function associated with each individual trial. Five peptide fragments were identified from a conservative analysis of the initial E-map, and following several refinement cycles with X-PLOR, a model was built of the complete structure. At the end of the X-PLOR refinement, the sequence was compared with the published sequence and 57 of the 64 residues had been correctly identified. Two errors in sequence resulted from side chains with similar size while the rest of the errors were a result of severe disorder or high thermal motion in the side chains. Given the amino-acid sequence, it is estimated that the initial E-map could have produced a model containing 99% of all main-chain and 81% of side-chain atoms. The structure refinement was completed with PROFFT, including the contributions of protein H atoms, and converged at a residual of 0.158 for 30,609 data with F ≥ 2σ(F) in the resolution range 8.0-0.964 Å. The final model consisted of 518 non-H protein atoms (36 disordered), 407 H atoms, and 129 water molecules (43 with occupancies less than unity). This total of 647 non-H atoms represents the largest light-atom structure solved to date.

  18. 3D complex: a structural classification of protein complexes.

    Directory of Open Access Journals (Sweden)

    Emmanuel D Levy

    2006-11-01

    Full Text Available Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

  19. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Full Text Available Abstract. Background: Understanding protein function from its structure is a challenging problem. Sequence-based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information of protein domains and their interactions provides a view of structure-function relationships that is complementary to sequence information. We have developed a web site http://www.sblest.org/ and an API of web services that enable users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view of the prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results: Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect the hits found with PSI-BLAST, HMMer and S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP IDs, Gene Ontology (GO) IDs and EC numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion: With structural genomics initiatives determining structures with little, if any, functional characterization

  20. Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models

    Directory of Open Access Journals (Sweden)

    Stovgaard Kasper

    2010-08-01

    Full Text Available Abstract. Background: Genome sequencing projects have expanded the gap between the amount of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS) is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference. Results: We present a method for the efficient calculation of accurate SAXS curves based on the Debye formula and a set of scattering form factors for dummy atom representations of amino acids. Such a method avoids the computationally costly iteration over all atoms. We estimated the form factors using generated data from a set of high quality protein structures. No ad hoc scaling or correction factors are applied in the calculation of the curves. Two coarse-grained representations of protein structure were investigated; two scattering bodies per amino acid led to significantly better results than a single scattering body. Conclusion: We show that the obtained point estimates allow the calculation of accurate SAXS curves from coarse-grained protein models. The resulting curves are on par with the current state-of-the-art program CRYSOL, which requires full atomic detail. Our method was also comparable to CRYSOL in recognizing native structures among native-like decoys. As a proof-of-concept, we combined the coarse-grained Debye calculation with a previously described probabilistic model of protein structure, TorusDBN. This resulted in a significant improvement in the decoy recognition performance. In conclusion, the presented method shows great promise for
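
    The Debye formula named in this record can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `debye_intensity`, the use of a single constant form factor per scattering body, and the example coordinates are assumptions made for illustration only. In the published method the form factors depend on q and were estimated from high quality structures.

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity at one momentum transfer q via the Debye formula.

    I(q) = sum_i sum_j f_i * f_j * sin(q * r_ij) / (q * r_ij)

    coords       -- list of (x, y, z) positions, one per scattering body
    form_factors -- list of form-factor values f_i evaluated at this q
                    (hypothetical inputs; real values are q-dependent)
    """
    total = 0.0
    for i, pos_i in enumerate(coords):
        for j, pos_j in enumerate(coords):
            x = q * math.dist(pos_i, pos_j)
            # sin(x)/x tends to 1 as x -> 0 (the i == j terms and q == 0)
            kernel = 1.0 if x == 0.0 else math.sin(x) / x
            total += form_factors[i] * form_factors[j] * kernel
    return total


# Two dummy bodies 3.8 length units apart, each with unit form factor
bodies = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.8)]
print(debye_intensity(bodies, [1.0, 1.0], q=0.5))
```

    The double loop costs O(N²) per q value, which is why reducing each amino acid to one or two scattering bodies, as the abstract describes, is cheaper than iterating over all atoms.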

  1. From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction.

    Science.gov (United States)

    Akhter, Nasrin; Shehu, Amarda

    2018-01-19

    Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.

  2. From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction

    Directory of Open Access Journals (Sweden)

    Nasrin Akhter

    2018-01-01

    Full Text Available Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.

  3. PPM-One: a static protein structure based chemical shift predictor

    International Nuclear Information System (INIS)

    Li, Dawei; Brüschweiler, Rafael

    2015-01-01

    We mined the most recent editions of the BioMagResDataBank and the protein data bank to parametrize a new empirical knowledge-based chemical shift predictor of protein backbone atoms using either a linear or an artificial neural network model. The resulting chemical shift predictor PPM-One accepts a single static 3D structure as input and emulates the effect of local protein dynamics via interatomic steric contacts. Furthermore, the chemical shift prediction was extended to most side-chain protons and it is found that the prediction accuracy is at a level allowing an independent assessment of stereospecific assignments. For a previously established set of test proteins some overall improvement was achieved over current top-performing chemical shift prediction programs

  4. Deprotonated imidodiphosphate in AMPPNP-containing protein structures

    International Nuclear Information System (INIS)

    Dauter, Miroslawa; Dauter, Zbigniew

    2011-01-01

    In certain AMPPNP-containing protein structures, the nitrogen bridging the two terminal phosphate groups can be deprotonated. Many different proteins utilize the chemical energy provided by the cofactor adenosine triphosphate (ATP) for their proper function. A number of structures in the Protein Data Bank (PDB) contain adenosine 5′-(β,γ-imido)triphosphate (AMPPNP), a nonhydrolysable analog of ATP in which the bridging O atom between the two terminal phosphate groups is substituted by the imido function. Under mild conditions imides do not have acidic properties and thus the imide nitrogen should be protonated. However, an analysis of protein structures containing AMPPNP reveals that the imide group is deprotonated in certain complexes if the negative charges of the phosphate moieties in AMPPNP are in part neutralized by coordinating divalent metals or a guanidinium group of an arginine

  5. Biochemical characterization and structural modeling of human cathepsin E variant 2 in comparison to the wild-type protein

    Science.gov (United States)

    Puizdar, Vida; Zajc, Tajana; Žerovnik, Eva; Renko, Miha; Pieper, Ursula; Eswar, Narayanan; Šali, Andrej; Dolenc, Iztok; Turk, Vito

    2014-01-01

    Cathepsin E splice variant 2 appears in a number of gastric carcinomas. Here, we report detecting this variant in HeLa cells using polyclonal antibodies and biotinylated inhibitor pepstatin A. An overexpression of GFP fusion proteins of cathepsin E and its splice variant within HEK-293T cells was performed to show their localization. Their distribution under a fluorescence microscope showed that they are colocalized. We also expressed variant 1 and variant 2 of cathepsin E, with propeptide and without it, in Escherichia coli. After refolding from the inclusion bodies, the enzymatic activity and circular dichroism spectra of the splice variant 2 were compared to those of the wild-type mature active cathepsin E. While full-length cathepsin E variant 1 is activated at acid pH, the splice variant remains inactive. In contrast to the active cathepsin E, the splice variant 2 predominantly assumes β-sheet structure, prone to oligomerization, at least under in vitro conditions, as shown by Atomic Force Microscopy as shallow disk-like particles. A comparative structure model of splice variant 2 was computed based on its alignment to the known structure of cathepsin E intermediate (Protein Data Bank code 1TZS), and used to rationalize its conformational properties and loss of activity. PMID:22718633

  6. Dengue Virus Non-structural Protein 1 Modulates Infectious Particle Production via Interaction with the Structural Proteins.

    Directory of Open Access Journals (Sweden)

    Pietro Scaturro

    Full Text Available Non-structural protein 1 (NS1) is one of the most enigmatic proteins of the Dengue virus (DENV), playing distinct functions in immune evasion, pathogenesis and viral replication. The recently reported crystal structure of DENV NS1 revealed its peculiar three-dimensional fold; however, detailed information on NS1 function at different steps of the viral replication cycle is still missing. By using the recently reported crystal structure, as well as amino acid sequence conservation, as a guide for a comprehensive site-directed mutagenesis study, we discovered that in addition to being essential for RNA replication, DENV NS1 is also critically required for the production of infectious virus particles. Taking advantage of a trans-complementation approach based on fully functional epitope-tagged NS1 variants, we identified previously unreported interactions between NS1 and the structural proteins Envelope (E) and precursor Membrane (prM). Interestingly, coimmunoprecipitation revealed an additional association with capsid, arguing that NS1 interacts via the structural glycoproteins with DENV particles. Results obtained with mutations residing either in the NS1 Wing domain or in the β-ladder domain suggest that NS1 might have two distinct functions in the assembly of DENV particles. By using a trans-complementation approach with a C-terminally KDEL-tagged ER-resident NS1, we demonstrate that the secretion of NS1 is dispensable for both RNA replication and infectious particle production. In conclusion, our results provide an extensive genetic map of NS1 determinants essential for viral RNA replication and identify a novel role of NS1 in virion production that is mediated via interaction with the structural proteins. These studies extend the list of NS1 functions and argue for a central role in coordinating replication and assembly/release of infectious DENV particles.

  7. Predicting protein complexes using a supervised learning method combined with local structural information.

    Science.gov (United States)

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  8. The PMDB Protein Model Database

    Science.gov (United States)

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  9. A pairwise residue contact area-based mean force potential for discrimination of native protein structure

    Directory of Open Access Journals (Sweden)

    Pezeshk Hamid

    2010-01-01

    Full Text Available Abstract. Background: Using an energy function to distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation dependent contact area, a new type of knowledge-based mean force potential has been developed. Results: We developed a new approach to calculate a knowledge-based potential of mean force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structure from decoys. This potential was able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions: This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield
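
    Knowledge-based potentials of this kind are commonly obtained by inverse Boltzmann statistics, E(a,b) = -kT ln(N_obs(a,b) / N_exp(a,b)). The Python sketch below shows that generic idea for plain residue-pair contact counts. It is an illustration under stated assumptions, not the contact-area potential of this record: the function name `contact_potential`, the pseudocount, and the random-mixing reference state are all hypothetical choices.

```python
import math
from collections import Counter


def contact_potential(contacts, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann pair potential from observed residue contacts.

    contacts -- iterable of (residue_a, residue_b) pairs seen in contact
    Returns {(a, b): energy} with a <= b; negative values mean the pair
    occurs more often than expected under random mixing (assumed reference).
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    type_counts = Counter()
    for (a, b), n in pair_counts.items():
        type_counts[a] += n
        type_counts[b] += n
    total_pairs = sum(pair_counts.values())
    total_ends = 2 * total_pairs

    potential = {}
    types = sorted(type_counts)
    for i, a in enumerate(types):
        for b in types[i:]:
            fraction_a = type_counts[a] / total_ends
            fraction_b = type_counts[b] / total_ends
            # Unordered a-b pairs can form two ways when a != b
            multiplicity = 1 if a == b else 2
            expected = total_pairs * fraction_a * fraction_b * multiplicity
            observed = pair_counts.get((a, b), 0)
            ratio = (observed + pseudocount) / (expected + pseudocount)
            potential[(a, b)] = -kT * math.log(ratio)
    return potential


# Toy data: leucines and lysines that only contact their own kind
toy = [("L", "L")] * 6 + [("K", "K")] * 6
for pair, energy in contact_potential(toy).items():
    print(pair, round(energy, 3))
```

    A decoy would then be scored by summing the pair energies over its contacts; the record's method additionally weights contacts by orientation-dependent contact area, which this sketch does not attempt.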

  10. Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.

    Science.gov (United States)

    Carugo, Oliviero; Djinović-Carugo, Kristina

    2016-01-01

    It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose it is mandatory that the subsets are non-redundant and of high quality. The first problem can be solved relatively easily at the sequence level or at the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider the crystallographic resolution; other features must be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged with more tolerant selection criteria. The incessant growth of the Protein Data Bank and especially of the number of high-resolution structures is allowing the use of more stringent selection criteria, with a consequent improvement of the quality of the subsets of the Protein Data Bank.

  11. Prediction of protein-protein interactions in dengue virus coat proteins guided by low resolution cryoEM structures

    Directory of Open Access Journals (Sweden)

    Srinivasan Narayanaswamy

    2010-06-01

    Full Text Available Abstract. Background: Dengue virus, along with the other members of the Flaviviridae family, has reemerged as a deadly human pathogen. Understanding the mechanistic details of these infections can be highly rewarding in developing effective antivirals. During maturation of the virus inside the host cell, the coat proteins E and M undergo conformational changes, altering the morphology of the viral coat. However, due to the low resolution of the available 3-D structures of viral assemblies, the atomic details of these changes are still elusive. Results: In the present analysis, starting from Cα positions of low resolution cryo electron microscopic structures, the residue level details of protein-protein interaction interfaces of dengue virus coat proteins have been predicted. By comparing the preexisting structures of the virus in different phases of its life cycle, the changes taking place in these predicted protein-protein interaction interfaces were followed as a function of the maturation process of the virus. Besides changing the current notion about the presence of only homodimers in the mature viral coat, the present analysis indicated the presence of a proline-rich motif at the protein-protein interaction interface of the coat protein. Investigating the conservation status of these seemingly functionally crucial residues across other members of the Flaviviridae family enabled dissecting common mechanisms used for infections by these viruses. Conclusions: Thus, using a computational approach, the present analysis has provided better insights into the preexisting low resolution structures of virus assemblies, the findings of which can be used in designing effective antivirals against these deadly human pathogens.

  12. Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins

    Science.gov (United States)

    de Beauchene, Isaure Chauvot; de Vries, Sjoerd J.; Zacharias, Martin

    2016-01-01

    Protein-RNA complexes are important for many biological processes. However, structural modeling of such complexes is hampered by the high flexibility of RNA. Particularly challenging is the docking of single-stranded RNA (ssRNA). We have developed a fragment-based approach to model the structure of ssRNA bound to a protein, based on only the protein structure, the RNA sequence and conserved contacts. The conformational diversity of each RNA fragment is sampled by an exhaustive library of trinucleotides extracted from all known experimental protein–RNA complexes. The method was applied to ssRNA with up to 12 nucleotides which bind to dimers of the RNA recognition motifs (RRMs), a highly abundant eukaryotic RNA-binding domain. The fragment based docking allows a precise de novo atomic modeling of protein-bound ssRNA chains. On a benchmark of seven experimental ssRNA–RRM complexes, near-native models (with a mean heavy-atom deviation of <3 Å from experiment) were generated for six out of seven bound RNA chains, and even more precise models (deviation < 2 Å) were obtained for five out of seven cases, a significant improvement compared to the state of the art. The method is not restricted to RRMs but was also successfully applied to Pumilio RNA binding proteins. PMID:27131381

  13. CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures.

    Science.gov (United States)

    Kuriata, Aleksander; Gierut, Aleksandra Maria; Oleniecki, Tymoteusz; Ciemny, Maciej Pawel; Kolinski, Andrzej; Kurcinski, Mateusz; Kmiecik, Sebastian

    2018-05-14

    Classical simulations of protein flexibility remain computationally expensive, especially for large proteins. A few years ago, we developed a fast method for predicting protein structure fluctuations that uses a single protein model as the input. The method has been made available as the CABS-flex web server and applied in numerous studies of protein structure-function relationships. Here, we present a major update of the CABS-flex web server to version 2.0. The new features include: extension of the method to significantly larger and multimeric proteins, customizable distance restraints and simulation parameters, contact maps and a new, enhanced web server interface. CABS-flex 2.0 is freely available at http://biocomp.chem.uw.edu.pl/CABSflex2.

  14. Nuclear Magnetic Resonance structural studies of peptides and proteins from the vaso-regulatory System

    International Nuclear Information System (INIS)

    Sizun, Philippe

    1991-01-01

    The aim of the present work is to show how Nuclear Magnetic Resonance (NMR) allows the 3D structure of peptides and proteins in solution to be determined. A comparative study of peptides involved in the vaso-regulatory system (from a small hormonal peptide to the 65-amino-acid protein hirudin) has allowed the design of the most efficient 1D and 2D NMR strategies. It rapidly appeared that the size of the peptide plays a key role in the structuring of the molecule, the smallest peptides being weakly structured owing to the lack of cooperative effects. As the molecular size increases, or if conformational locks (disulfide bridges) are present, the probability of stable secondary structure increases. For the protein hirudin, a combination of all available NMR parameters deduced from dedicated experiments (chemical shifts, coupling constants, Overhauser effects, accessibility of amide protons) and molecular modelling under constraints allows a clear 3D structure to be proposed for this protein in solution. Finally, a comparative study of the experimental structures and of those deduced from prediction rules has shed light on the concept of structural predisposition, the latter being of high value for a better understanding of structure-activity relationships. (author)

  15. On the characterization and software implementation of general protein lattice models.

    Directory of Open Access Journals (Sweden)

    Alessio Bechini

    Full Text Available Abstract models of proteins have been widely used as a practical means to computationally investigate general properties of the system. In lattice models any sterically feasible conformation is represented as a self-avoiding walk on a lattice, and residue types are limited in number. So far, only two- or three-dimensional lattices have been used. The inspection of the neighborhood of alpha carbons in the core of real proteins reveals that lattices with higher coordination numbers, possibly in higher dimensional spaces, can also be adopted. In this paper, a new general parametric lattice model for simplified protein conformations is proposed and investigated. It is shown how the supporting software can be consistently designed to let algorithms that operate on protein structures be implemented in a lattice-agnostic way. The necessary theoretical foundations are developed and organically presented, pinpointing the role of the concept of main directions in lattice-agnostic model handling. Subsequently, the model features across dimensions and lattice types are explored in tests performed on benchmark protein sequences, using a Python implementation. Simulations give insights on the use of square and triangular lattices in a range of dimensions. The trend of potential minimum for sequences of different lengths, varying the lattice dimension, is uncovered. Moreover, an extensive quantitative characterization of the usage of the so-called "move types" is reported for the first time. The proposed general framework for the development of lattice models is simple yet complete, and an object-oriented architecture can be proficiently employed for the supporting software, by designing ad-hoc classes. The proposed framework represents a new general viewpoint that potentially subsumes a number of solutions previously studied. The adoption of the described model pushes to look at protein structure issues from a more general and essential perspective, making
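
    The self-avoiding-walk representation described in this record can be sketched in a few lines of Python. This is a minimal illustration assuming a hypercubic ("square") lattice of arbitrary dimension; the function name `is_self_avoiding` and the (axis, step) move encoding are hypothetical and are not taken from the authors' software, which also covers triangular lattices.

```python
def is_self_avoiding(moves, dimension):
    """Check that a chain of unit moves never revisits a lattice site.

    moves     -- sequence of (axis, step) pairs, step being +1 or -1
                 (assumed encoding for a hypercubic lattice)
    dimension -- number of lattice dimensions
    """
    position = (0,) * dimension
    visited = {position}
    for axis, step in moves:
        if not 0 <= axis < dimension or step not in (-1, 1):
            raise ValueError(f"invalid move: {(axis, step)!r}")
        # Tuples are immutable, so build the new site explicitly
        position = (
            position[:axis] + (position[axis] + step,) + position[axis + 1:]
        )
        if position in visited:
            return False  # two residues would occupy the same site
        visited.add(position)
    return True


# Three sides of a unit square are fine; closing the square is not
open_walk = [(0, 1), (1, 1), (0, -1)]
print(is_self_avoiding(open_walk, dimension=2))
print(is_self_avoiding(open_walk + [(1, -1)], dimension=2))
```

    Because the dimension is just a parameter, the same check works unchanged in four or more dimensions, which is in the spirit of the lattice-agnostic design the abstract argues for.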

  16. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    Science.gov (United States)

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks.

    Science.gov (United States)

    Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali

    2010-12-01

    The supervised learning of recurrent neural networks well-suited for prediction of protein secondary structures from the underlying amino acids sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. Besides, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
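
    The Q₃ figure quoted in this record is the percentage of residues whose three-state label (helix, strand or coil) is predicted correctly. A minimal Python sketch follows; the function name `q3_accuracy` and the toy strings are assumptions for illustration, and the segment overlap (SOV) score is not covered.

```python
def q3_accuracy(predicted, observed):
    """Per-residue three-state accuracy (Q3) as a percentage.

    predicted, observed -- equal-length strings over the states
                           H (helix), E (strand) and C (coil)
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example: one of eight residues is mislabelled
print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```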

  18. Tertiary alphabet for the observable protein structural universe.

    Science.gov (United States)

    Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg

    2016-11-22

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.

  19. Annotating the protein-RNA interaction sites in proteins using evolutionary information and protein backbone structure.

    Science.gov (United States)

    Li, Tao; Li, Qian-Zhong

    2012-11-07

    RNA-protein interactions play important roles in various biological processes. The precise detection of RNA-protein interaction sites is very important for understanding essential biological processes and annotating the function of the proteins. In this study, based on various features from amino acid sequence and structure, including evolutionary information, solvent accessible surface area and torsion angles (φ, ψ) in the backbone structure of the polypeptide chain, a computational method for predicting RNA-binding sites in proteins is proposed. When the method is applied to predict RNA-binding sites in three datasets (RBP86 containing 86 protein chains, RBP107 containing 107 protein chains and RBP109 containing 109 protein chains), better sensitivities and specificities are obtained compared to previously published methods in five-fold cross-validation tests. To further examine the efficiency of our method, the RBP107 dataset is used as the training set, and the RBP86 and RBP109 datasets are used as independent test sets. In addition, as examples of our prediction, RNA-binding sites in a few proteins are presented. The annotated results are consistent with the PDB annotation. These results show that our method is useful for annotating RNA-binding sites of novel proteins.

  20. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

    Science.gov (United States)

    Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric

    2010-07-20

    Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
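
    Step 1 of the procedure above, selecting the models on the Pareto optimal front with respect to several scoring functions, can be sketched as follows. This is an illustrative sketch only, not the POC implementation: it assumes that lower scores are better for every scoring function, the decoy names and score values are hypothetical, and the fuzzy dominance ranking of step 2 is not attempted.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every score and strictly better on one.

    Assumes lower scores are better for all scoring functions.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of models that no other model dominates."""
    front = []
    for name, own in scores.items():
        dominated = any(
            dominates(other, own)
            for other_name, other in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical decoys scored by two hypothetical energy functions.
decoys = {
    "decoy_a": (-120.0, -35.0),
    "decoy_b": (-110.0, -40.0),
    "decoy_c": (-100.0, -30.0),  # worse than decoy_a on both scores
    "decoy_d": (-125.0, -20.0),
}
front = pareto_front(decoys)
```

    In this toy set decoy_c is dominated and drops out, while the other three each win on at least one score and remain on the front. The pairwise comparison is quadratic in the number of decoys, which is adequate for a sketch.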

  1. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    Science.gov (United States)

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on
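
    The two counts quoted above are consistent with each other if "all possible combinations" means unordered pairs of distinct chains, which is an assumption made here rather than something the abstract states. A quick check:

```python
import math

# Number of protein chains reported for MEGADOCK-Web.
n_chains = 7528

# Unordered pairs of distinct chains: n * (n - 1) / 2.
unordered_pairs = math.comb(n_chains, 2)
```

    Under that assumption the result equals the 28,331,628 predicted PPIs given in the abstract.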

  2. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans.

    Science.gov (United States)

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C G; Benavente, Ricardo

    2012-10-09

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed no sequence homology. This discrepancy challenged the hypothesis that the SC arose only once in evolution. To pursue this matter we focused on the evolution of SYCP1 and SYCP3, the two major structural SC proteins of mammals. Remarkably, our comparative bioinformatic and expression studies revealed that SYCP1 and SYCP3 are also components of the SC in the basal metazoan Hydra. In contrast to previous assumptions, we therefore conclude that SYCP1 and SYCP3 form monophyletic groups of orthologous proteins across metazoans.

  3. Crystal structure of the Japanese encephalitis virus envelope protein.

    Science.gov (United States)

    Luca, Vincent C; AbiMansour, Jad; Nelson, Christopher A; Fremont, Daved H

    2012-02-01

    Japanese encephalitis virus (JEV) is the leading global cause of viral encephalitis. The JEV envelope protein (E) facilitates cellular attachment and membrane fusion and is the primary target of neutralizing antibodies. We have determined the 2.1-Å resolution crystal structure of the JEV E ectodomain refolded from bacterial inclusion bodies. The E protein possesses the three domains characteristic of flavivirus envelopes and epitope mapping of neutralizing antibodies onto the structure reveals determinants that correspond to the domain I lateral ridge, fusion loop, domain III lateral ridge, and domain I-II hinge. While monomeric in solution, JEV E assembles as an antiparallel dimer in the crystal lattice organized in a highly similar fashion as seen in cryo-electron microscopy models of mature flavivirus virions. The dimer interface, however, is remarkably small and lacks many of the domain II contacts observed in other flavivirus E homodimers. In addition, uniquely conserved histidines within the JEV serocomplex suggest that pH-mediated structural transitions may be aided by lateral interactions outside the dimer interface in the icosahedral virion. Our results suggest that variation in dimer structure and stability may significantly influence the assembly, receptor interaction, and uncoating of virions.

  4. Fibrous Protein Structures: Hierarchy, History and Heroes.

    Science.gov (United States)

    Squire, John M; Parry, David A D

    2017-01-01

    During the 1930s and 1940s the technique of X-ray diffraction was applied widely by William Astbury and his colleagues to a number of naturally-occurring fibrous materials. On the basis of the diffraction patterns obtained, he observed that the structure of each of the fibres was dominated by one of a small number of different types of molecular conformation. One group of fibres, known as the k-m-e-f group of proteins (keratin - myosin - epidermin - fibrinogen), gave rise to diffraction characteristics that became known as the α-pattern. Others, such as those from a number of silks, gave rise to a different pattern - the β-pattern, while connective tissues yielded a third unique set of diffraction characteristics. At the time of Astbury's work, the structures of these materials were unknown, though the spacings of the main X-ray reflections gave an idea of the axial repeats and the lateral packing distances. In a breakthrough in the early 1950s, the basic structures of all of these fibrous proteins were determined. It was found that the long protein chains, composed of strings of amino acids, could be folded up in a systematic manner to generate a limited number of structures that were consistent with the X-ray data. The most important of these were known as the α-helix, the β-sheet, and the collagen triple helix. These studies provided information about the basic building blocks of all proteins, both fibrous and globular. They did not, however, provide detailed information about how these molecules packed together in three-dimensions to generate the fibres found in vivo. A number of possible packing arrangements were subsequently deduced from the X-ray diffraction and other data, but it is only in the last few years, through the continued improvements of electron microscopy, that the packing details within some fibrous proteins can now be seen directly. Here we outline briefly some of the milestones in fibrous protein structure determination, the role of the

  5. Protein and Peptide Gas-phase Structure Investigation Using Collision Cross Section Measurements and Hydrogen Deuterium Exchange

    Science.gov (United States)

    Khakinejad, Mahdiar

    Protein and peptide gas-phase structure analysis provides the opportunity to study these species outside of their explicit environment where the interaction network with surrounding molecules makes the analysis difficult [1]. Although gas-phase structure analysis offers a unique opportunity to study the intrinsic behavior of these biomolecules [2-4], proteins and peptides exhibit very low vapor pressures [2]. Peptide and protein ions can be rendered in the gas-phase using electrospray ionization (ESI) [5]. There is a growing body of literature that shows proteins and peptides can maintain solution structures during the process of ESI and these structures can persist for a few hundred milliseconds [6-9]. Techniques for monitoring gas-phase protein and peptide ion structures are categorized as physical probes and chemical probes. Collision cross section (CCS) measurement, being a physical probe, is a powerful method to investigate gas-phase structure size [3, 7, 10-15]; however, CCS values alone do not establish a one-to-one relation with structure (i.e., the CCS value is an orientationally averaged value) [15-18]. Here we propose the utility of gas-phase hydrogen deuterium exchange (HDX) as a second criterion of structure elucidation. The proposed approach includes extensive MD simulations to sample biomolecular ion conformation space with the production of numerous, random in-silico structures. Subsequently a CCS can be calculated for these structures and theoretical CCS values are compared with experimental values to produce a pool of candidate structures. Utilizing a chemical reaction model based on the gas-phase HDX mechanism, the HDX kinetics behavior of these candidate structures is predicted and compared to experimental results to nominate the best in-silico structures which match (chemically and physically) with experimental observations. For the predictive approach to succeed, an extensive technique and method development is essential. To combine CCS

  6. Structural study of surfactant-dependent interaction with protein

    Energy Technology Data Exchange (ETDEWEB)

    Mehan, Sumit; Aswal, Vinod K., E-mail: vkaswal@barc.gov.in [Solid State Physics Division, Bhabha Atomic Research Centre, Mumbai 400 085 (India); Kohlbrecher, Joachim [Laboratory for Neutron Scattering, Paul Scherrer Institut, CH-5232 PSI Villigen (Switzerland)

    2015-06-24

    Small-angle neutron scattering (SANS) has been used to study the complex structure of anionic BSA protein with three different (cationic DTAB, anionic SDS and non-ionic C12E10) surfactants. These systems form very different surfactant-dependent complexes. We show that the structure of protein-surfactant complex is initiated by the site-specific electrostatic interaction between the components, followed by the hydrophobic interaction at high surfactant concentrations. It is also found that hydrophobic interaction is preferred over the electrostatic interaction in deciding the resultant structure of protein-surfactant complexes.

  7. Amino acid and structural variability of Yersinia pestis LcrV protein

    Energy Technology Data Exchange (ETDEWEB)

    Anisimov, A P; Dentovskaya, S V; Panfertsev, E A; Svetoch, T E; Kopylov, P K; Segelke, B W; Zemla, A; Telepnev, M V; Motin, V L

    2009-11-09

    The LcrV protein is a multifunctional virulence factor and protective antigen of the plague bacterium which is generally conserved between the epidemic strains of Yersinia pestis. They investigated the diversity in the LcrV sequences among non-epidemic Y. pestis strains which have a limited virulence in selected animal models and for humans. Sequencing of lcrV genes from ten Y. pestis strains belonging to different phylogenetic groups (subspecies) showed that the LcrV proteins possess four major variable hotspots at positions 18, 72, 273, and 324-326. These major variations, together with other minor substitutions in amino acid sequences, allowed them to classify the LcrV alleles into five sequence types (A-E). They observed that the strains of different Y. pestis subspecies can have the same type of LcrV, and different types of LcrV can exist within the same natural plague focus. The LcrV polymorphisms were structurally analyzed by comparing the modeled structures of LcrV from all available strains. All changes except one occurred either in flexible regions or on the surface of the protein, but local chemical properties (i.e. those of a hydrophobic, hydrophilic, amphipathic, or charged nature) were conserved across all of the strains. Polymorphisms in flexible and surface regions are likely subject to less selective pressure, and have a limited impact on the structure. In contrast, the substitution of tryptophan at position 113 with either glutamic acid or glycine likely has a serious influence on the regional structure of the protein, and these mutations might have an effect on the function of LcrV. The polymorphisms at positions 18, 72 and 273 were accountable for differences in oligomerization of LcrV. The importance of the latter property in emergence of epidemic strains of Y. pestis during evolution of this pathogen will need to be further investigated.

  8. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction

    Science.gov (United States)

    Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.

    2013-01-01

    In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594

  9. The structure of pyogenecin immunity protein, a novel bacteriocin-like immunity protein from Streptococcus pyogenes.

    Energy Technology Data Exchange (ETDEWEB)

    Chang, C.; Coggill, P.; Bateman, A.; Finn, R.; Cymborowski, M.; Otwinowski, Z.; Minor, W.; Volkart, L.; Joachimiak, A.; Wellcome Trust Sanger Inst.; Univ. of Virginia; UT Southwestern Medical Center

    2009-12-17

    Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable. We have solved the crystal structure of the gene-product of locus Spy-2152 from S. pyogenes, (PDB: 2fu2), and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides. Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.

  10. New tips for structure prediction by comparative modeling

    OpenAIRE

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to experimentally determined protein structure (template). The use of such technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions as commonly accepted depends on the score of sequence identity of the target protein to the template. To assess the relationship between sequence iden...

  11. The contact activation proteins: a structure/function overview

    NARCIS (Netherlands)

    Meijers, J. C.; McMullen, B. A.; Bouma, B. N.

    1992-01-01

    In recent years, extensive knowledge has been obtained on the structure/function relationships of blood coagulation proteins. In this overview, we present recent developments on the structure/function relationships of the contact activation proteins: factor XII, high molecular weight kininogen,

  12. Structural Determinants Underlying Photoprotection in the Photoactive Orange Carotenoid Protein of Cyanobacteria

    Energy Technology Data Exchange (ETDEWEB)

    Wilson, Adjele; Kinney, James N.; Zwart, Petrus H.; Punginelli, Claire; D' Haene, Sandrine; Perreau, Francois; Klein, Michael G.; Kirilovsky, Diana; Kerfeld, Cheryl

    2010-04-01

    The photoprotective processes of photosynthetic organisms involve the dissipation of excess absorbed light energy as heat. Photoprotection in cyanobacteria is mechanistically distinct from that in plants; it involves the Orange Carotenoid Protein (OCP), a water-soluble protein containing a single carotenoid. The OCP is a new member of the family of blue light photoactive proteins; blue-green light triggers the OCP-mediated photoprotective response. Here we report structural and functional characterization of the wildtype and two mutant forms of the OCP, from the model organism Synechocystis PCC6803. The structural analysis provides high-resolution detail of the carotenoid-protein interactions that underlie the optical properties of the OCP, unique among carotenoid-proteins in binding a single pigment per polypeptide chain. Collectively, these data implicate several key amino acids in the function of the OCP and reveal that the photoconversion and photoprotective responses of the OCP to blue-green light can be decoupled.

  13. Mini Heme-Proteins: Designability of Structure and Diversity of Functions.

    Science.gov (United States)

    Rai, Jagdish

    2017-08-30

    Natural heme proteins may have heme bound to the polypeptide chain as a cofactor via noncovalent forces, or heme as a prosthetic group may be covalently bound to the proteins. Nature has used porphyrins in diverse functions like electron transfer, oxidation, reduction, ligand binding, photosynthesis, signaling, etc. by modulating their properties through diverse protein matrices. Synthetic chemists have tried to utilize these molecules in equally diverse industrial and medical applications due to their versatile electrochemical and optical properties. The heme iron has catalytic activity which can be modulated and enhanced for specific applications by the protein matrix around it. Heme proteins can be designed into novel enzymes for stereospecific catalysis ranging from oxidation to reduction. These designed heme-proteins can have applications in industrial catalysis and biosensing. A peptide folds around heme easily due to the hydrophobic effect of the large aromatic ring of heme. The directional property of co-ordinate bonding between peptide and metal ion in heme further specifies the structure. Therefore heme proteins can be easily designed for targeted structure and catalytic activity. The central aromatic chemical entity in heme, viz. porphyrin, is a very ancient molecule. Its presence in the prebiotic soup and in all forms of life suggests that it has played a vital role in the origin and progressive evolution of living organisms. Porphyrin macrocycles are highly conjugated systems composed of four modified pyrrole subunits interconnected at their α-carbon atoms via methine (=CH-) bridges. Initial minimalist models of hemoproteins focused on the effect of heme-ligand co-ordinate bonding on chemical reactivity, spectroscopy, electrochemistry and magnetic properties of heme. The great sensitivity of these spectroscopic features of heme to its surroundings makes them extremely useful in structural elucidation of designed heme-peptide complexes. Therefore heme proteins are

  14. The Protein Data Bank in Europe (PDBe): bringing structure to biology

    International Nuclear Information System (INIS)

    Velankar, Sameer; Kleywegt, Gerard J.

    2011-01-01

    Some future challenges for the PDB and its guardians are discussed and current and future activities in structural bioinformatics at the Protein Data Bank in Europe (PDBe) are described. The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in the future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.

  15. The Protein Data Bank in Europe (PDBe): bringing structure to biology

    Energy Technology Data Exchange (ETDEWEB)

    Velankar, Sameer; Kleywegt, Gerard J., E-mail: gerard@ebi.ac.uk [Protein Data Bank in Europe (PDBe), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD (United Kingdom)

    2011-04-01

    Some future challenges for the PDB and its guardians are discussed and current and future activities in structural bioinformatics at the Protein Data Bank in Europe (PDBe) are described. The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in the future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.

  16. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  17. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Background: One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  18. Secondary structure classification of amino-acid sequences using state-space modeling

    OpenAIRE

    Brunnert, Marcus; Krahnke, Tillmann; Urfer, Wolfgang

    2001-01-01

    The secondary structure classification of amino acid sequences can be carried out by a statistical analysis of sequence and structure data using state-space models. Aiming at this classification, a modified filter algorithm programmed in S is applied to data of three proteins. The application leads to correct classifications of two proteins even when using relatively simple estimation methods for the parameters of the state-space models. Furthermore, it has been shown that the assumed initial...

  19. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly state prediction system, which combines blocks with functional domain composition and is called QuaBingo, is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first-layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second-layer SVM is used to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, QuaBingo achieves a Matthews Correlation Coefficient (MCC) 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins.
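
    For reference, the Matthews Correlation Coefficient used as the comparison metric above can be computed from a confusion matrix. The sketch below covers the standard binary form only; QuaBingo distinguishes three classes, and how its MCC was aggregated across them is not stated in the abstract, so the function name and the example counts are purely illustrative.

```python
import math


def matthews_corrcoef_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from true/false positive and negative counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: a perfect classifier and a chance-level classifier.
perfect = matthews_corrcoef_binary(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef_binary(tp=25, tn=25, fp=25, fn=25)
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it less sensitive to class imbalance than plain accuracy.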

  20. An Efficient Null Model for Conformational Fluctuations in Proteins

    DEFF Research Database (Denmark)

    Harder, Tim Philipp; Borg, Mikael; Bottaro, Sandro

    2012-01-01

    Protein dynamics play a crucial role in function, catalytic activity, and pathogenesis. Consequently, there is great interest in computational methods that probe the conformational fluctuations of a protein. However, molecular dynamics simulations are computationally costly and therefore are often...... limited to comparatively short timescales. TYPHON is a probabilistic method to explore the conformational space of proteins under the guidance of a sophisticated probabilistic model of local structure and a given set of restraints that represent nonlocal interactions, such as hydrogen bonds or disulfide...... on conformational fluctuations that is in correspondence with experimental measurements. TYPHON provides a flexible, yet computationally efficient, method to explore possible conformational fluctuations in proteins....

  1. STRUCTURAL FEATURES OF PLANT CHITINASES AND CHITIN-BINDING PROTEINS

    NARCIS (Netherlands)

    BEINTEMA, JJ

    1994-01-01

    Structural features of plant chitinases and chitin-binding proteins are discussed. Many of these proteins consist of multiple domains, of which the chitin-binding hevein domain is a predominant one. X-ray and NMR structures of representatives of the major classes of these proteins are available now,

  2. Rapid and reliable protein structure determination via chemical shift threading.

    Science.gov (United States)

    Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S

    2018-01-01

    Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca.

  3. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  4. Hydration dynamics near a model protein surface

    International Nuclear Information System (INIS)

    Russo, Daniela; Hura, Greg; Head-Gordon, Teresa

    2003-01-01

    The evolution of water dynamics from dilute to very high concentration solutions of a prototypical hydrophobic amino acid with its polar backbone, N-acetyl-leucine-methylamide (NALMA), is studied by quasi-elastic neutron scattering and molecular dynamics simulation for both the completely deuterated and completely hydrogenated leucine monomer. We observe several unexpected features in the dynamics of these biological solutions under ambient conditions. The NALMA dynamics shows evidence of de Gennes narrowing, an indication of coherent long timescale structural relaxation dynamics. The translational water dynamics are analyzed in a first approximation with a jump diffusion model. At the highest solute concentrations, the hydration water dynamics is significantly suppressed and characterized by a long residential time and a slow diffusion coefficient. The analysis of the more dilute concentration solutions takes into account the results of the 2.0 M solution as a model of the first hydration shell. Subtracting the first hydration layer based on the 2.0 M spectra, the translational diffusion dynamics is still suppressed, although the rotational relaxation time and residential time are converged to bulk-water values. Molecular dynamics analysis shows spatially heterogeneous dynamics at high concentration that becomes homogeneous at more dilute concentrations. We discuss the hydration dynamics results of this model protein system in the context of glassy systems, protein function, and protein-protein interfaces.

  5. Structure and stability insights into tumour suppressor p53 evolutionary related proteins.

    Directory of Open Access Journals (Sweden)

    Bruno Pagano

    The p53 family of genes and their protein products, namely, p53, p63 and p73, have over one billion years of evolutionary history. Advances in computational biology and genomics are enabling studies of the complexities of the molecular evolution of the p53 protein family to decipher the underpinnings of key biological conditions spanning from cancer through to various metabolic and developmental disorders and facilitate the design of personalised medicines. However, a complete understanding of the inherent nature of the thermodynamic and structural stability of the p53 protein family is still lacking. This is due, to a degree, to the lack of comprehensive structural information for a large number of homologous proteins and to an incomplete knowledge of the intrinsic factors responsible for their stability and how these might influence function. Here we investigate the thermal stability, secondary structure and folding properties of the DNA-binding domains (DBDs) of a range of proteins from the p53 family using biophysical methods. While the N- and the C-terminal domains of the p53 family show sequence diversity and are normally targets for post-translational modifications and alternative splicing, the central DBD is highly conserved. Together with data obtained from Molecular Dynamics simulations in solution and with structure-based homology modelling, our results provide further insights into the molecular properties of evolutionarily related p53 proteins. We identify some marked structural differences within the p53 family, which could account for the divergence in biological functions as well as the subtleties manifested in the oligomerization properties of this family.

  6. Protein energetic conformational analysis from NMR chemical shifts (PECAN) and its use in determining secondary structural elements

    Energy Technology Data Exchange (ETDEWEB)

    Eghbalnia, Hamid R.; Wang Liya; Bahrami, Arash [National Magnetic Resonance Facility at Madison, Biochemistry Department (United States); Assadi, Amir [University of Wisconsin-Madison, Mathematics Department (United States); Markley, John L. [National Magnetic Resonance Facility at Madison, Biochemistry Department (United States)], E-mail: eghbalni@nmrfam.wisc.edu

    2005-05-15

    We present an energy model that combines information from the amino acid sequence of a protein and available NMR chemical shifts for the purposes of identifying low energy conformations and determining elements of secondary structure. The model ('PECAN', Protein Energetic Conformational Analysis from NMR chemical shifts) optimizes a combination of sequence information and residue-specific statistical energy function to yield energetic descriptions most favorable to predicting secondary structure. Compared to prior methods for secondary structure determination, PECAN provides increased accuracy and range, particularly in regions of extended structure. Moreover, PECAN uses the energetics to identify residues located at the boundaries between regions of predicted secondary structure that may not fit the stringent secondary structure class definitions. The energy model offers insights into the local energetic patterns that underlie conformational preferences. For example, it shows that the information content for defining secondary structure is localized about a residue and reaches a maximum when two residues on either side are considered. The current release of the PECAN software determines the well-defined regions of secondary structure in novel proteins with assigned chemical shifts with an overall accuracy of 90%, which is close to the practical limit of achievable accuracy in classifying the states.

  7. Protein energetic conformational analysis from NMR chemical shifts (PECAN) and its use in determining secondary structural elements

    International Nuclear Information System (INIS)

    Eghbalnia, Hamid R.; Wang Liya; Bahrami, Arash; Assadi, Amir; Markley, John L.

    2005-01-01

    We present an energy model that combines information from the amino acid sequence of a protein and available NMR chemical shifts for the purposes of identifying low energy conformations and determining elements of secondary structure. The model ('PECAN', Protein Energetic Conformational Analysis from NMR chemical shifts) optimizes a combination of sequence information and residue-specific statistical energy function to yield energetic descriptions most favorable to predicting secondary structure. Compared to prior methods for secondary structure determination, PECAN provides increased accuracy and range, particularly in regions of extended structure. Moreover, PECAN uses the energetics to identify residues located at the boundaries between regions of predicted secondary structure that may not fit the stringent secondary structure class definitions. The energy model offers insights into the local energetic patterns that underlie conformational preferences. For example, it shows that the information content for defining secondary structure is localized about a residue and reaches a maximum when two residues on either side are considered. The current release of the PECAN software determines the well-defined regions of secondary structure in novel proteins with assigned chemical shifts with an overall accuracy of 90%, which is close to the practical limit of achievable accuracy in classifying the states.

  8. Using linear algebra for protein structural comparison and classification.

    Science.gov (United States)

    Gomide, Janaína; Melo-Minardi, Raquel; Dos Santos, Marcos Augusto; Neshich, Goran; Meira, Wagner; Lopes, Júlio César; Santoro, Marcelo

    2009-07-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.

  9. Using linear algebra for protein structural comparison and classification

    Directory of Open Access Journals (Sweden)

    Janaína Gomide

    2009-01-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.

  10. Ion pairs in non-redundant protein structures

    Indian Academy of Sciences (India)

    Ion pairs contribute to several functions including the activity of catalytic triads, fusion of viral membranes, stability in thermophilic proteins and solvent–protein interactions. Furthermore, they have the ability to affect the stability of protein structures and are also a part of the forces that act to hold monomers together.

  11. Structure-based Markov random field model for representing evolutionary constraints on functional sites.

    Science.gov (United States)

    Jeong, Chan-Seok; Kim, Dongsup

    2016-02-24

    Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computational complexity.

  12. BACHSCORE. A tool for evaluating efficiently and reliably the quality of large sets of protein structures

    Science.gov (United States)

    Sarti, E.; Zamuner, S.; Cossio, P.; Laio, A.; Seno, F.; Trovato, A.

    2013-12-01

    In protein structure prediction it is of crucial importance, especially at the refinement stage, to score efficiently large sets of models by selecting the ones that are closest to the native state. We here present a new computational tool, BACHSCORE, that allows its users to rank different structural models of the same protein according to their quality, evaluated by using the BACH++ (Bayesian Analysis Conformation Hunt) scoring function. The original BACH statistical potential was already shown to discriminate with very good reliability the protein native state in large sets of misfolded models of the same protein. BACH++ features a novel upgrade in the solvation potential of the scoring function, now computed by adapting the LCPO (Linear Combination of Pairwise Overlaps) algorithm. This change further enhances the already good performance of the scoring function. BACHSCORE can be accessed directly through the web server: bachserver.pd.infn.it.
    Catalogue identifier: AEQD_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQD_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: GNU General Public License version 3
    No. of lines in distributed program, including test data, etc.: 130159
    No. of bytes in distributed program, including test data, etc.: 24 687 455
    Distribution format: tar.gz
    Programming language: C++.
    Computer: Any computer capable of running an executable produced by a g++ compiler (4.6.3 version).
    Operating system: Linux, Unix OS-es.
    RAM: 1 073 741 824 bytes
    Classification: 3.
    Nature of problem: Evaluate the quality of a protein structural model, taking into account the possible “a priori” knowledge of a reference primary sequence that may be different from the amino-acid sequence of the model; the native protein structure should be recognized as the best model.
    Solution method: The contact potential scores the occurrence of any given type of residue pair in 5 possible

  13. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturations, as well as mutant thermal stability (ΔTm), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its six ordered nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  14. Structural Insights into Triglyceride Storage Mediated by Fat Storage-Inducing Transmembrane (FIT) Protein 2

    Science.gov (United States)

    Gross, David A.; Snapp, Erik L.; Silver, David L.

    2010-01-01

    Fat storage-Inducing Transmembrane proteins 1 & 2 (FIT1/FITM1 and FIT2/FITM2) belong to a unique family of evolutionarily conserved proteins localized to the endoplasmic reticulum that are involved in triglyceride lipid droplet formation. FIT proteins have been shown to mediate the partitioning of cellular triglyceride into lipid droplets, but not triglyceride biosynthesis. FIT proteins do not share primary sequence homology with known proteins and no structural information is available to inform on the mechanism by which FIT proteins function. Here, we present the experimentally-solved topological models for FIT1 and FIT2 using N-glycosylation site mapping and indirect immunofluorescence techniques. These methods indicate that both proteins have six-transmembrane-domains with both N- and C-termini localized to the cytosol. Utilizing this model for structure-function analysis, we identified and characterized a gain-of-function mutant of FIT2 (FLL(157-9)AAA) in transmembrane domain 4 that markedly augmented the total number and mean size of lipid droplets. Using limited-trypsin proteolysis we determined that the FLL(157-9)AAA mutant has enhanced trypsin cleavage at K86 relative to wild-type FIT2, indicating a conformational change. Taken together, these studies indicate that FIT2 is a 6 transmembrane domain-containing protein whose conformation likely regulates its activity in mediating lipid droplet formation. PMID:20520733

  15. Structural insights into triglyceride storage mediated by fat storage-inducing transmembrane (FIT) protein 2.

    Directory of Open Access Journals (Sweden)

    David A Gross

    2010-05-01

    Fat storage-Inducing Transmembrane proteins 1 & 2 (FIT1/FITM1 and FIT2/FITM2) belong to a unique family of evolutionarily conserved proteins localized to the endoplasmic reticulum that are involved in triglyceride lipid droplet formation. FIT proteins have been shown to mediate the partitioning of cellular triglyceride into lipid droplets, but not triglyceride biosynthesis. FIT proteins do not share primary sequence homology with known proteins and no structural information is available to inform on the mechanism by which FIT proteins function. Here, we present the experimentally-solved topological models for FIT1 and FIT2 using N-glycosylation site mapping and indirect immunofluorescence techniques. These methods indicate that both proteins have six-transmembrane-domains with both N- and C-termini localized to the cytosol. Utilizing this model for structure-function analysis, we identified and characterized a gain-of-function mutant of FIT2 (FLL(157-9)AAA) in transmembrane domain 4 that markedly augmented the total number and mean size of lipid droplets. Using limited-trypsin proteolysis we determined that the FLL(157-9)AAA mutant has enhanced trypsin cleavage at K86 relative to wild-type FIT2, indicating a conformational change. Taken together, these studies indicate that FIT2 is a 6 transmembrane domain-containing protein whose conformation likely regulates its activity in mediating lipid droplet formation.

  16. Host Proteins Determine MRSA Biofilm Structure and Integrity

    DEFF Research Database (Denmark)

    Dreier, Cindy; Nielsen, Astrid; Jørgensen, Nis Pedersen

    Human extracellular matrix (hECM) proteins aid the initial attachment and initiation of an infection, by specific binding to bacterial cell surface proteins. However, the importance of hECM proteins in structure, integrity and antibiotic resilience of a biofilm is unknown. This study aims...... to determine how specific hECM proteins affect S. aureus USA300 JE2 biofilms. Biofilms were grown in the presence of synovial fluid from rheumatoid arthritis patients to mimic in vivo conditions, where bacteria incorporate hECM proteins into the biofilm matrix. Difference in biofilm structure, with and without...... addition of hECM to growth media, was visualized by confocal laser scanning microscopy. Two enzymatic degradation experiments were used to study biofilm matrix composition and importance of hECM proteins: enzymatic removal of specific hECM proteins from growth media, before biofilm formation, and enzymatic...

  17. Camps 2.0: exploring the sequence and structure space of prokaryotic, eukaryotic, and viral membrane proteins.

    Science.gov (United States)

    Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij

    2012-03-01

    Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.

  18. Visualisation of variable binding pockets on protein surfaces by probabilistic analysis of related structure sets

    Directory of Open Access Journals (Sweden)

    Ashford Paul

    2012-03-01

    Background. Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. Results. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the

  19. Visualisation of variable binding pockets on protein surfaces by probabilistic analysis of related structure sets.

    Science.gov (United States)

    Ashford, Paul; Moss, David S; Alex, Alexander; Yeap, Siew K; Povia, Alice; Nobeli, Irene; Williams, Mark A

    2012-03-14

    Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active

  20. Effect of solvent on the structure of a protein (H3.1) with a coarse-grained model with knowledge-based interactions

    Science.gov (United States)

    Pandey, Ras; Farmer, Barry

    2013-03-01

    Solvent quality, along with temperature, plays a critical role in modulating the structure of a protein. Using a coarse-grained Monte Carlo simulation based on three knowledge-based contact potentials (MJ, BT, BFKV) we examine the structure and dynamics of a histone (H3.1). The empty lattice sites constitute the effective solvent medium in which the protein is embedded. The characteristic residue-solvent interaction is based on the hydropathy index, while the residue-residue interaction is taken from knowledge-based contact matrices derived from ensembles of protein structures in the Protein Data Bank. Large-scale simulations are performed to analyze the structure of the protein for a range of residue-solvent interaction strengths, a measure of the solvent quality, with each potential. Unlike the monotonic thermal response, the radius of gyration of the protein exhibits a non-monotonic dependence on the solvent strength. Quantitative comparison of the structure and dynamics emerging from three knowledge-based potentials will be presented in this talk. This work is supported by Air Force Research Laboratory.
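
    The radius of gyration reported in this abstract is a simple size measure of a chain of residue positions. A minimal sketch follows, assuming one equally weighted site per residue (a common choice for coarse-grained chains, though the abstract does not state how the quantity was computed); the function name and the coordinates are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of sites from their centroid.

    coords is a sequence of (x, y, z) positions, one per residue;
    every site is given equal weight (an assumption of this sketch).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


# Hypothetical lattice positions of a short chain (not simulation data).
chain = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (2, 1, 1)]
print(radius_of_gyration(chain))
```

    Tracking this value while sweeping the residue-solvent interaction strength is one way a non-monotonic response like the one described above would show up.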

  1. Discrete Haar transform and protein structure.

    Science.gov (United States)

    Morosetti, S

    1997-12-01

    The discrete Haar transform of the sequence of the backbone dihedral angles (phi and psi) was performed over a set of high-resolution X-ray protein structures from the Brookhaven Protein Data Bank. The new dihedral angles were then calculated by the inverse transform, using a growing number of Haar functions, from the lower to the higher degree. New structures were obtained using these dihedral angles, with standard values for bond lengths and angles, and with omega = 0 degrees. The reconstructed structures were compared with the experimental ones and analyzed by visual inspection and statistical analysis. When half of the Haar coefficients were used, none of the reconstructed structures had yet collapsed to a tertiary fold, but most of the secondary motifs were already realized. These results indicate a substantial separation of structural information in the space of the Haar transform, with the secondary structural information mainly present in the Haar coefficients of lower degrees, and the tertiary information present in the higher-degree coefficients. Because of this separation, the representation of folded structures in the space of the Haar transform seems a promising candidate to address the problem of premature convergence in genetic algorithms.
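
    The truncation experiment described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: it uses the orthonormal Haar transform, requires a sequence length that is a power of two, treats the angles as plain numbers (ignoring their 360-degree periodicity), and all function names and the sample angles are hypothetical.

```python
import math

def haar_forward(values):
    """Orthonormal discrete Haar transform (length must be a power of two).

    Output order: overall average first, then detail coefficients from the
    coarsest (lowest degree) to the finest (highest degree).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    scratch = [0.0] * n
    root2 = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            scratch[i] = (a + b) / root2          # smooth part
            scratch[half + i] = (a - b) / root2   # detail part
        coeffs[:length] = scratch[:length]
        length = half
    return coeffs

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    n = len(coeffs)
    values = [float(c) for c in coeffs]
    root2 = math.sqrt(2.0)
    length = 2
    while length <= n:
        half = length // 2
        scratch = [0.0] * length
        for i in range(half):
            smooth, detail = values[i], values[half + i]
            scratch[2 * i] = (smooth + detail) / root2
            scratch[2 * i + 1] = (smooth - detail) / root2
        values[:length] = scratch
        length *= 2
    return values

def reconstruct_low_degree(angles, keep):
    """Rebuild the angle sequence from only the first `keep` coefficients."""
    coeffs = haar_forward(angles)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return haar_inverse(truncated)

# Hypothetical dihedral-angle sequence in degrees.
angles = [-60.0, -45.0, -62.0, -41.0, -120.0, 130.0, -115.0, 135.0]
half_reconstruction = reconstruct_low_degree(angles, keep=len(angles) // 2)
```

    With half of the coefficients retained, each adjacent pair of angles is replaced by its mean, which shows concretely how the finest-scale information is the first to be discarded.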

  2. Tuning structure of oppositely charged nanoparticle and protein complexes

    Energy Technology Data Exchange (ETDEWEB)

    Kumar, Sugam, E-mail: sugam@barc.gov.in; Aswal, V. K., E-mail: sugam@barc.gov.in [Solid State Physics Division, Bhabha Atomic Research Centre, Mumbai-400085 (India)]; Callow, P. [Institut Laue Langevin, DS/LSS, 6 rue Jules Horowitz, 38042 Grenoble Cedex 9 (France)]

    2014-04-24

    Small-angle neutron scattering (SANS) has been used to probe the structures of anionic silica nanoparticles (LS30) and the cationic protein lysozyme (M.W. 14.7 kD, I.P. ∼ 11.4) by tuning their interaction through pH variation. The protein adsorption on nanoparticles is found to increase with pH and is determined by the electrostatic attraction between the two components as well as the repulsion between protein molecules. We show that the strong electrostatic attraction between nanoparticles and protein molecules leads to protein-mediated aggregation of nanoparticles, with the aggregates characterized by fractal structures. At pH 5, the protein adsorption gives rise to nanoparticle aggregation having surface fractal morphology with close packing of nanoparticles. The surface fractals transform to open structures of mass fractal morphology at higher pH (7 and 9) on approaching the isoelectric point (I.P.)

  3. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein In vitro Digestibility and Solubility.

    Science.gov (United States)

    Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question, we analyzed soybean meal, fish meal, corn distiller's dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (pdigestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  4. Energy landscape, structure and rate effects on strength properties of alpha-helical proteins

    International Nuclear Information System (INIS)

    Bertaud, Jeremie; Hester, Joshua; Jimenez, Daniel D; Buehler, Markus J

    2010-01-01

    The strength of protein domains is crucial to identify the mechanical role of protein domains in biological processes such as mechanotransduction, tissue mechanics and tissue remodeling. Whereas the concept of strength has been widely investigated for engineered materials, the strength of fundamental protein material building blocks and how it depends on structural parameters such as the chemical bonding, the protein filament length and the timescale of observation or deformation velocity remains poorly understood. Here we report a systematic analysis of the influence of key parameters that define the energy landscape of the strength properties of alpha-helical protein domains, including energy barriers, unfolding and refolding distances, the locations of folded and unfolded states, as well as variations of the length and pulling velocity of alpha-helical protein filaments. The analysis is facilitated by the development of a double-well mesoscale potential formulation, utilized here to carry out a systematic numerical analysis of the behavior of alpha-helices. We compare the results against widely used protein strength models based on the Bell model, one of the simplest models used to characterize the strength of protein filaments. We find that, whereas Bell-type models are a reasonable approximation to describe the rupture of alpha-helical protein domains for a certain range of pulling speeds and values of energy barriers, the model ceases to hold for very large energy barriers and for very small pulling speeds, in agreement with earlier findings. We conclude with an application of our mesoscale model to investigate the effect of the length of alpha-helices on their mechanical strength. We find a weakening effect as the length of alpha-helical proteins increases, followed by an asymptotic regime in which the strength remains constant. We compare strand lengths found in biological proteins with the scaling law of strength versus alpha-helix filament length. The
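
    For orientation, the logarithmic force-speed relation that Bell-type models predict can be sketched as below. This is one commonly used form of the relation, written under stated assumptions: a single energy barrier E_b at distance x_b along the pulling direction, a natural attempt frequency omega0, and a pulling speed equal to the off-rate times x_b. The parameter values are purely illustrative and are not taken from the paper, and the function and constant names are hypothetical.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)
KCAL_PER_MOL_ANGSTROM_TO_PN = 69.48  # 1 kcal/(mol Angstrom) in piconewtons

def bell_rupture_force(speed, energy_barrier, x_b, temperature=300.0, omega0=1e13):
    """Rupture force in kcal/(mol Angstrom) from a Bell-type model.

    speed          pulling speed in Angstrom/s
    energy_barrier barrier height E_b in kcal/mol
    x_b            distance to the transition state in Angstrom
    omega0         attempt frequency in 1/s (assumed value)
    """
    kt = KB * temperature
    # Speed at which the barrier is crossed without any applied force.
    v0 = omega0 * x_b * math.exp(-energy_barrier / kt)
    return (kt / x_b) * math.log(speed / v0)

# Illustrative sweep over pulling speeds (values are assumptions).
for speed in (1e8, 1e9, 1e10):
    force = bell_rupture_force(speed, energy_barrier=10.0, x_b=1.0)
    print(f"{speed:.0e} A/s -> {force * KCAL_PER_MOL_ANGSTROM_TO_PN:.1f} pN")
```

    In this form the force grows linearly with the logarithm of the pulling speed, with slope k_B T / x_b, which is the behaviour that a mesoscale model would be compared against.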

  5. The Staphylococcus aureus extracellular adherence protein (Eap) adopts an elongated but structured conformation in solution.

    Science.gov (United States)

    Hammel, Michal; Nemecek, Daniel; Keightley, J Andrew; Thomas, George J; Geisbrecht, Brian V

    2007-12-01

    The extracellular adherence protein (Eap) of Staphylococcus aureus participates in a wide range of protein-protein interactions that facilitate the initiation and dissemination of Staphylococcal disease. In this report, we describe the use of a multidisciplinary approach to characterize the solution structure of full-length Eap. In contrast to previous reports suggesting that a six-domain isoform of Eap undergoes multimerization, sedimentation equilibrium analytical ultracentrifugation data revealed that a four-domain isoform of Eap is a monomer in solution. In vitro proteolysis and solution small angle X-ray scattering studies both indicate that Eap adopts an extended conformation in solution, where the linkers connecting sequential EAP modules are solvent exposed. Construction of a low-resolution model of full-length Eap using a combination of ab initio deconvolution of the SAXS data and rigid body modeling of the EAP domain crystal structure suggests that full-length Eap may present several unique concave surfaces capable of participating in ligand binding. These results also raise the possibility that such surfaces may be held together by additional interactions between adjacent EAP modules. This hypothesis is supported by a comparative Raman spectroscopic analysis of full-length Eap and a stoichiometric solution of the individual EAP modules, which indicates the presence of additional secondary structure and a greater extent of hydrogen/deuterium exchange protection in full-length Eap. Our results provide the first insight into the solution structure of full-length Eap and an experimental basis for interpreting the EAP domain crystal structures within the context of the full-length molecule. They also lay a foundation for future studies into the structural and molecular bases of Eap-mediated protein-protein interactions with its many ligands.

  6. In silico sequence analysis and homology modeling of predicted beta-amylase 7-like protein in Brachypodium distachyon L.

    Directory of Open Access Journals (Sweden)

    ERTUĞRUL FILIZ

    2014-04-01

    Beta-amylase (β-amylase, EC 3.2.1.2) is an enzyme that catalyses hydrolysis of glucosidic bonds in polysaccharides. In this study, we analyzed the protein sequence of a predicted beta-amylase 7-like protein in Brachypodium distachyon. The pI (isoelectric point) value was found to be 5.23, indicating acidic character, while the instability index (II) was found to be 50.28, which classifies the protein as unstable. Subcellular localization prediction using CELLO v.2.5 revealed that the protein may reside in the chloroplast. The 3D structure of the protein was built by comparative homology modeling with SWISS-MODEL. The accuracy of the predicted 3D structure was checked by Ramachandran plot analysis, which showed 95.4% of residues in the favored region. The results of our study contribute to the understanding of β-amylase protein structure in grass species and will provide a scientific basis for 3D modeling of beta-amylase proteins in further studies.

  7. Principal components analysis of protein structure ensembles calculated using NMR data

    International Nuclear Information System (INIS)

    Howe, Peter W.A.

    2001-01-01

    One important problem when calculating structures of biomolecules from NMR data is distinguishing converged structures from outlier structures. This paper describes how Principal Components Analysis (PCA) has the potential to classify calculated structures automatically, according to correlated structural variation across the population. PCA has the additional advantage that it highlights regions of proteins that vary across the population. To apply PCA, protein structures have to be reduced in complexity, and this paper describes two different representations of protein structures which achieve this. The calculated structures of a 28 amino acid peptide are used to demonstrate the methods. The two different representations of protein structure are shown to give equivalent results, and correct results are obtained even though the ensemble of structures used as an example contains two different protein conformations. PCA also correctly identifies the structural differences between the two conformations.
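
    How PCA can separate members of an ensemble may be illustrated with a small, dependency-free sketch. It assumes that each structure has already been reduced to a fixed-length feature vector (the abstract does not say which two representations the paper uses), and it finds only the first principal component by power iteration; names and data are hypothetical.

```python
import math

def first_principal_component(structures, iterations=100):
    """Return (component, scores) for the leading principal component.

    structures: list of equal-length feature vectors, one per structure.
    """
    n = len(structures)
    dim = len(structures[0])
    mean = [sum(s[j] for s in structures) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in structures]

    # Start from the centred structure with the largest norm.
    v = max(centered, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("ensemble has no variation")
    v = [x / norm for x in v]

    # Power iteration on X^T X without forming the covariance matrix.
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Fix the arbitrary sign so the largest-magnitude loading is positive.
    pivot = max(range(dim), key=lambda j: abs(v[j]))
    if v[pivot] < 0:
        v = [-x for x in v]
    scores = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
    return v, scores

# Toy ensemble: two groups of structures that differ along one direction.
ensemble = [[-1.0, -1.0], [1.0, 1.0], [-2.0, -2.0], [2.0, 2.0]]
component, scores = first_principal_component(ensemble)
```

    The sign of each score then splits the ensemble into two groups, and the loadings in `component` indicate which features carry the variation.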

  8. Low-Resolution Structure of Detergent-Solubilized Membrane Proteins from Small-Angle Scattering Data.

    Science.gov (United States)

    Koutsioubas, Alexandros

    2017-12-05

    Despite the ever-increasing usage of small-angle scattering as a valuable complementary method in the field of structural biology, applications concerning membrane proteins remain elusive, mainly due to experimental challenges and the relative lack of theoretical tools for the treatment of scattering data. This adds to the general difficulties also encountered by other established methods (crystallography, NMR) for the study of membrane proteins. Following the general paradigm of ab initio methods for low-resolution restoration of soluble protein structure from small-angle scattering data, we construct a general multiphase model with a set of physical constraints, which, together with an appropriate minimization procedure, gives direct structural information concerning the different components (protein, detergent molecules) of detergent-solubilized membrane protein complexes. The method's precision and robustness are assessed by performing shape restorations from simulated data of a tetrameric α-helical membrane channel (Aquaporin-0) solubilized by n-Dodecyl β-D-Maltoside and from previously published small-angle neutron scattering experimental data of the filamentous hemagglutinin adhesin β-barrel protein transporter solubilized by n-Octyl β-D-glucopyranoside. It is shown that the acquisition of small-angle neutron scattering data at two different solvent contrasts, together with an estimation of detergent aggregation number around the protein, permits the reliable reconstruction of the shape of membrane proteins without the need for any prior structural information. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  9. Hydra meiosis reveals unexpected conservation of structural synaptonemal complex proteins across metazoans

    OpenAIRE

    Fraune, Johanna; Alsheimer, Manfred; Volff, Jean-Nicolas; Busch, Karoline; Fraune, Sebastian; Bosch, Thomas C. G.; Benavente, Ricardo

    2012-01-01

    The synaptonemal complex (SC) is a key structure of meiosis, mediating the stable pairing (synapsis) of homologous chromosomes during prophase I. Its remarkable tripartite structure is evolutionarily well conserved and can be found in almost all sexually reproducing organisms. However, comparison of the different SC protein components in the common meiosis model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus revealed...

  10. Structure modification and functionality of whey proteins: quantitative structure-activity relationship approach.

    Science.gov (United States)

    Nakai, S; Li-Chan, E

    1985-10-01

    According to the original idea of quantitative structure-activity relationship, electric, hydrophobic, and structural parameters should be taken into consideration for elucidating functionality. Changes in these parameters are reflected in the property of protein solubility upon modification of whey proteins by heating. Although solubility is itself a functional property, it has been utilized to explain other functionalities of proteins. However, better correlations were obtained when hydrophobic parameters of the proteins were used in conjunction with solubility. Various treatments reported in the literature were applied to whey protein concentrate in an attempt to obtain whipping and gelling properties similar to those of egg white. Mapping simplex optimization was used to search for the best results. Improvement in whipping properties by pepsin hydrolysis may have been due to higher protein solubility, and good gelling properties resulting from polyphosphate treatment may have been due to an increase in exposable hydrophobicity. However, the results of angel food cake making were still unsatisfactory.

  11. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function.

    Directory of Open Access Journals (Sweden)

    Ulf Hensen

    Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins, but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. The dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.

  12. Chaperonin Structure - The Large Multi-Subunit Protein Complex

    Directory of Open Access Journals (Sweden)

    Irena Roterman

    2009-03-01

    The multi-subunit protein structure representing the chaperonin group is analyzed with respect to its hydrophobicity distribution. The proteins of this group assist protein folding supported by ATP. The GroEL structure, with its specific axial symmetry (two rings of seven units stacked back to back, 524 aa each), and the GroES (single ring of seven units, 97 aa each) polypeptide chains are analyzed using the hydrophobicity distribution, expressed as excess/deficiency all over the molecule, to search for structure-to-function relationships. The empirically observed distribution of hydrophobic residues is compared with the theoretical one representing the idealized hydrophobic core with hydrophilic residues exposed on the surface. The observed discrepancy between these two distributions seems to be aim-oriented, determining the structure-to-function relation. The hydrophobic force field structure generated by the chaperonin capsule is presented. Its possible influence on substrate folding is suggested.

  13. Crystal structure of Homo sapiens protein LOC79017

    Energy Technology Data Exchange (ETDEWEB)

    Bae, Euiyoung; Bingman, Craig A.; Aceti, David J.; Phillips, Jr., George N. (UW)

    2010-02-08

    LOC79017 (MW 21.0 kDa, residues 1-188) was annotated as a hypothetical protein encoded by Homo sapiens chromosome 7 open reading frame 24. It was selected as a target by the Center for Eukaryotic Structural Genomics (CESG) because it did not share more than 30% sequence identity with any protein for which the three-dimensional structure is known. The biological function of the protein has not been established yet. Parts of LOC79017 were identified as members of uncharacterized Pfam families (residues 1-95 as PB006073 and residues 104-180 as PB031696). BLAST searches revealed homologues of LOC79017 in many eukaryotes, but none of them have been functionally characterized. Here, we report the crystal structure of H. sapiens protein LOC79017 (UniGene code Hs.530024, UniProt code O75223, CESG target number go.35223).

  14. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional structure from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation of the protein's structure, an energy potential and some optimization algorithm that finds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving optimization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested... our BCO method to generate good solutions to the protein structure prediction problem. The results show that BCO generally finds better solutions than simulated annealing, which so far has been the metaheuristic of choice for this problem.

  15. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural class using a nonlinear prediction method. No deterministic structures are found in these protein sequences, which implies that they behave as random sequences. We also give an explanation for the controversial results obtained in previous investigations.

  16. Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) Spectroscopy as an Analytical Method to Investigate the Secondary Structure of a Model Protein Embedded in Solid Lipid Matrices.

    Science.gov (United States)

    Zeeshan, Farrukh; Tabbassum, Misbah; Jorgensen, Lene; Medlicott, Natalie J

    2018-02-01

    Protein drugs may encounter conformational perturbations during the formulation processing of lipid-based solid dosage forms. In aqueous protein solutions, attenuated total reflection Fourier transform infrared (ATR FT-IR) spectroscopy can investigate these conformational changes following the subtraction of spectral interference of solvent with protein amide I bands. However, in solid dosage forms, the possible spectral contribution of lipid carriers to the protein amide I band may be an obstacle to determining conformational alterations. The objective of this study was to develop an ATR FT-IR spectroscopic method for the analysis of protein secondary structure embedded in solid lipid matrices. Bovine serum albumin (BSA) was chosen as a model protein, while Precirol ATO5 (glycerol palmitostearate, melting point 58 ℃) was employed as the model lipid matrix. Bovine serum albumin was incorporated into lipid using physical mixing, melting and mixing, or wet granulation mixing methods. Attenuated total reflection FT-IR spectroscopy and size exclusion chromatography (SEC) were performed for the analysis of BSA secondary structure and its dissolution in aqueous media, respectively. The results showed significant interference of Precirol ATO5 with the BSA amide I band, which was subtracted up to 90% w/w lipid content to analyze BSA secondary structure. In addition, ATR FT-IR spectroscopy also detected thermally denatured BSA solid alone and in the presence of lipid matrix, indicating its suitability for the detection of denatured protein solids in lipid matrices. Despite being in the solid state, conformational changes occurred to BSA upon incorporation into solid lipid matrices. However, the extent of these conformational alterations was found to be dependent on the mixing method employed, as indicated by area overlap calculations. For instance, the melting and mixing method had a negligible effect on BSA secondary structure, whereas the wet granulation mixing method promoted

  17. Integrated Structural Biology for α-Helical Membrane Protein Structure Determination.

    Science.gov (United States)

    Xia, Yan; Fischer, Axel W; Teixeira, Pedro; Weiner, Brian; Meiler, Jens

    2018-04-03

    While great progress has been made, only 10% of the nearly 1,000 integral, α-helical, multi-span membrane protein families are represented by at least one experimentally determined structure in the PDB. Previously, we developed the algorithm BCL::MP-Fold, which samples the large conformational space of membrane proteins de novo by assembling predicted secondary structure elements guided by knowledge-based potentials. Here, we present a case study of rhodopsin fold determination by integrating sparse and/or low-resolution restraints from multiple experimental techniques including electron microscopy, electron paramagnetic resonance spectroscopy, and nuclear magnetic resonance spectroscopy. Simultaneous incorporation of orthogonal experimental restraints not only significantly improved the sampling accuracy but also allowed identification of the correct fold, which is demonstrated by a protein size-normalized transmembrane root-mean-square deviation as low as 1.2 Å. The protocol developed in this case study can be used for the determination of unknown membrane protein folds when limited experimental restraints are available. Copyright © 2018 Elsevier Ltd. All rights reserved.
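
    A size-normalized RMSD of the kind mentioned in this abstract can be sketched as follows. The sketch assumes the normalization usually written RMSD100 (proposed by Carugo and Pongor), which may or may not be exactly the variant the authors applied to transmembrane residues; it also assumes the two coordinate sets are already superimposed. Function names and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def rmsd100(rmsd_value, n_residues):
    """Normalize an RMSD to the value expected for a 100-residue protein."""
    return rmsd_value / (1.0 + math.log(math.sqrt(n_residues / 100.0)))

# Toy example: a model shifted by 1 Angstrom along z relative to the reference.
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
raw = rmsd(model, reference)
```

    The normalization leaves a 100-residue comparison unchanged and scales down the RMSD of larger proteins, so that folds of different sizes can be compared on one scale.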

  18. Fast protein tertiary structure retrieval based on global surface shape similarity.

    Science.gov (United States)

    Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke

    2008-09-01

    Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between the surface representation defined by the 3D Zernike descriptor and the conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, the 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibilities for large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.
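
    Once every structure is reduced to a fixed-length descriptor vector, retrieval becomes a nearest-neighbour search, which is why it is fast. The sketch below shows only that retrieval step and assumes Euclidean distance between descriptor vectors; it does not compute 3D Zernike descriptors, and the database entries and descriptor values are made up for illustration.

```python
import heapq
import math

def top_matches(query, database, k=5):
    """Return the k (distance, name) pairs closest to the query descriptor.

    database: mapping from structure name to its descriptor vector, all of
    the same length as `query`.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return heapq.nsmallest(
        k, ((distance(query, vec), name) for name, vec in database.items())
    )

# Hypothetical three-number descriptors for three structures.
database = {
    "structure_A": [1.0, 0.0, 0.0],
    "structure_B": [0.9, 0.1, 0.0],
    "structure_C": [0.0, 1.0, 0.0],
}
hits = top_matches([1.0, 0.0, 0.0], database, k=2)
```

    A benchmark like the one described above would then check how often a structure from the same class as the query appears among the top five hits.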

  19. Structural analysis of recombinant human protein QM

    International Nuclear Information System (INIS)

    Gualberto, D.C.H.; Fernandes, J.L.; Silva, F.S.; Saraiva, K.W.; Affonso, R.; Pereira, L.M.; Silva, I.D.C.G.

    2012-01-01

    The ribosomal protein QM belongs to a family of ribosomal proteins that is highly conserved from yeast to humans. The presence of the QM protein is necessary for joining the 60S and 40S subunits in a late step of the initiation of mRNA translation. Although the exact extra-ribosomal functions of QM are not yet fully understood, it has been identified as a putative tumor suppressor. This protein was reported to interact with the transcription factor c-Jun and thereby prevent c-Jun from activating genes involved in cellular growth. In this study, the human QM protein was expressed in a bacterial system in soluble form, and its structure was analyzed by circular dichroism and fluorescence. The circular dichroism results showed that this protein has less alpha helix than beta sheet, as described in the literature. The QM protein does not contain a leucine zipper region; however, the zinc ion is necessary for the binding of QM to c-Jun. We therefore analyzed the relationship between the removal of zinc ions and the folding of the protein. Preliminary fluorescence results showed a gradual increase in fluorescence with the addition of increasing concentrations of EDTA. This suggests that zinc is important for the tertiary structure of the protein. More studies are being carried out to better understand these results. (author)

  20. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  1. Insight into the intermolecular recognition mechanism between Keap1 and IKKβ combining homology modelling, protein-protein docking, molecular dynamics simulations and virtual alanine mutation.

    Directory of Open Access Journals (Sweden)

    Zheng-Yu Jiang

    Degradation of certain proteins through the ubiquitin-proteasome pathway is a common strategy taken by the key modulators responsible for stress responses. Kelch-like ECH-associated protein-1 (Keap1), a substrate adaptor component of the Cullin3 (Cul3)-based ubiquitin E3 ligase complex, mediates the ubiquitination of two key modulators, NF-E2-related factor 2 (Nrf2) and IκB kinase β (IKKβ), which are involved in the redox control of gene transcription. However, compared to the Keap1-Nrf2 protein-protein interaction (PPI), the intermolecular recognition mechanism of Keap1 and IKKβ has been poorly investigated. In order to explore the binding pattern between Keap1 and IKKβ, the PPI model of Keap1 and IKKβ was investigated. The structure of human IKKβ was constructed by homology modeling, using the reported crystal structure of Xenopus laevis IKKβ as the template. A protein-protein docking method was applied to develop the Keap1-IKKβ complex model. After refinement and visual analysis of the docked proteins, the chosen pose was further optimized through molecular dynamics simulations. The resulting structure was used to conduct virtual alanine mutation to explore hot-spots significant for the intermolecular interaction. Overall, our results provide structural insights into the PPI model of Keap1-IKKβ and suggest that the substrate specificity of Keap1 depends on the interaction with the key tyrosines, namely Tyr525, Tyr574 and Tyr334. The study presented in the current project may be useful for designing molecules that selectively modulate Keap1. The selective recognition mechanism of Keap1 with IKKβ or Nrf2 will help to further understand the crosstalk between NF-κB and Nrf2 signaling.

  2. Validation of Structures in the Protein Data Bank.

    Science.gov (United States)

    Gore, Swanand; Sanz García, Eduardo; Hendrickx, Pieter M S; Gutmanas, Aleksandras; Westbrook, John D; Yang, Huanwang; Feng, Zukang; Baskaran, Kumaran; Berrisford, John M; Hudson, Brian P; Ikegawa, Yasuyo; Kobayashi, Naohiro; Lawson, Catherine L; Mading, Steve; Mak, Lora; Mukhopadhyay, Abhik; Oldfield, Thomas J; Patwardhan, Ardan; Peisach, Ezra; Sahni, Gaurav; Sekharan, Monica R; Sen, Sanchayita; Shao, Chenghua; Smart, Oliver S; Ulrich, Eldon L; Yamashita, Reiko; Quesada, Martha; Young, Jasmine Y; Nakamura, Haruki; Markley, John L; Berman, Helen M; Burley, Stephen K; Velankar, Sameer; Kleywegt, Gerard J

    2017-12-05

    The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  3. Molecular structure, dynamics and hydration studies of soybean storage proteins and model systems by nuclear magnetic resonance

    International Nuclear Information System (INIS)

    Kakalis, L.T.

    1989-01-01

    The potential of high-resolution 13C NMR for the characterization of soybean storage proteins was explored. The spectra of a commercial soy protein isolate as well as those of alkali-denatured 7S and 11S soybean globulins were well resolved and tentatively assigned. Relaxation measurements indicated fast motion for several side chains and the protein backbone. Protein fractions (11S and 7S) were also investigated at various states of molecular association. The large size of the multisubunit soybean storage proteins adversely affected both the resolution and the sensitivity of their 13C NMR spectra. A comparison of 17O and 2H NMR relaxation rates of water in solutions of lysozyme (a model system) as a function of concentration, pH and magnetic field suggested that only 17O directly monitors the hydration of lysozyme. Analysis of 17O NMR lysozyme hydration data in terms of a two-state, fast-exchange, anisotropic model resulted in hydration parameters which are consistent with the protein's physico-chemical properties. The same model was applied to the calculation of the amount and mobility of bound water in soy protein dispersions by means of 17O NMR relaxation measurements as a function of protein concentration. The protein concentration dependences of 1H transverse NMR relaxation measurements at various pH and ionic strength values were fitted by a virial expansion. The interpretation of the data was based on the effects of protein aggregation, salt binding and protein group ionization on the NMR measurements. In all cases, relaxation rates showed a linear dependence on protein activity.

  4. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    Science.gov (United States)

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
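
    The knowledge-base lookup described in this abstract can be illustrated with a minimal sketch. It assumes each structural neighborhood has already been encoded as a hashable key (here a bit string) and that a residue is labeled interface when the interface fraction for its key exceeds a threshold. The function names, the threshold of 0.5, and the default for unseen keys are hypothetical and are not taken from INSPiRE itself. The Matthews correlation coefficient used for evaluation is included in its standard form.

```python
from collections import defaultdict
from math import sqrt


def build_knowledge_base(labeled_neighborhoods):
    """Count how often each encoded neighborhood is seen as interface or not.

    labeled_neighborhoods: iterable of (key, is_interface) pairs, where key is
    any hashable encoding of a structural neighborhood (assumed precomputed).
    """
    kb = defaultdict(lambda: [0, 0])  # key -> [non_interface_count, interface_count]
    for key, is_interface in labeled_neighborhoods:
        kb[key][1 if is_interface else 0] += 1
    return kb


def predict_interface(kb, key, threshold=0.5):
    """Label a residue as interface if its neighborhood was mostly interface."""
    non_iface, iface = kb.get(key, (0, 0))
    total = non_iface + iface
    if total == 0:
        return False  # unseen neighborhood: hypothetical default, not from the paper
    return iface / total > threshold


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


training = [("0110", True), ("0110", True), ("0110", False), ("1001", False)]
kb = build_knowledge_base(training)
print(predict_interface(kb, "0110"), predict_interface(kb, "1001"))
```

    A dictionary keyed by the encoded neighborhood keeps both training and prediction to a single lookup per residue, which is one plausible reason to encode neighborhoods as bit strings in the first place.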

  5. Protein structure estimation from NMR data by matrix completion.

    Science.gov (United States)

    Li, Zhicheng; Li, Yang; Lei, Qiang; Zhao, Qing

    2017-09-01

    Knowledge of protein structures is very important to understand their corresponding physical and chemical properties. Nuclear Magnetic Resonance (NMR) spectroscopy is one of the main methods to measure protein structure. In this paper, we propose a two-stage approach to calculate the structure of a protein from a highly incomplete distance matrix, where most data are obtained from NMR. We first randomly "guess" a small part of unobservable distances by utilizing the triangle inequality, which is crucial for the second stage. Then we use matrix completion to calculate the protein structure from the obtained incomplete distance matrix. We apply the accelerated proximal gradient algorithm to solve the corresponding optimization problem. Furthermore, the recovery error of our method is analyzed, and its efficiency is demonstrated by several practical examples.
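
    The first stage described above, guessing unobserved distances with the triangle inequality, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a symmetric distance matrix stored as nested lists with None for unknown entries, and it draws the guess uniformly between the tightest lower and upper bounds. The function names and the uniform sampling are assumptions. The matrix-completion stage is not shown.

```python
import random


def triangle_bounds(dist, i, j):
    """Bounds on the unknown d(i, j) from every k with d(i, k) and d(j, k) known.

    The triangle inequality gives |d(i,k) - d(j,k)| <= d(i,j) <= d(i,k) + d(j,k).
    """
    lower, upper = 0.0, float("inf")
    for k in range(len(dist)):
        if k in (i, j):
            continue
        dik, djk = dist[i][k], dist[j][k]
        if dik is None or djk is None:
            continue
        lower = max(lower, abs(dik - djk))
        upper = min(upper, dik + djk)
    return lower, upper


def guess_distance(dist, i, j, rng=random):
    """Draw a guess for d(i, j) inside its triangle-inequality bounds."""
    lower, upper = triangle_bounds(dist, i, j)
    if upper == float("inf"):
        return None  # no common neighbor with known distances, so no guess
    return rng.uniform(lower, upper)


# Three atoms; the distance between atoms 0 and 2 is unobserved.
dist = [
    [0.0, 3.0, None],
    [3.0, 0.0, 4.0],
    [None, 4.0, 0.0],
]
print(triangle_bounds(dist, 0, 2))
```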

  6. Mining protein loops using a structural alphabet and statistical exceptionality

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2010-02-01

    Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is not an easy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of

  7. Mining protein loops using a structural alphabet and statistical exceptionality.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude

    2010-02-04

    Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is not an easy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of
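
    The word-extraction step in this record can be sketched in a few lines. The sketch assumes each loop is already encoded as a string of structural letters, and that consecutive letters overlap by three residues, so that a seven-residue fragment corresponds to a window of four letters. That window length, the function names, and the toy input are assumptions made for illustration. The threshold of more than 30 occurrences is taken from the abstract.

```python
from collections import Counter


def count_structural_words(loops, word_len=4):
    """Count every window of word_len structural letters across all loops.

    loops: iterable of strings, one string of structural letters per loop.
    word_len=4 assumes four overlapping four-residue letters span seven residues.
    """
    counts = Counter()
    for letters in loops:
        for start in range(len(letters) - word_len + 1):
            counts[letters[start:start + word_len]] += 1
    return counts


def recurrent_words(counts, min_occurrences=30):
    """Keep only words observed more than min_occurrences times."""
    return {word: n for word, n in counts.items() if n > min_occurrences}


# Toy data: two loops encoded with made-up structural letters.
counts = count_structural_words(["ABCDE", "ABCD"])
print(recurrent_words(counts, min_occurrences=1))
```

    Because words are counted over fixed-length windows rather than whole loops, loops of any length contribute to the same table, which is the property the abstract relies on for analyzing long loops.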

  8. HDAPD: a web tool for searching the disease-associated protein structures

    Science.gov (United States)

    2010-01-01

    Background: The structures of disease-associated proteins are important for structure-based drug design against a particular disease. Until now, protein structures have usually been searched through a PDB id or some sequence information. However, in the HDAPD database presented here, the structure of a disease-associated protein can be searched directly by keying in the associated disease name. Description: A search in HDAPD can be initiated by keying in some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can view and manipulate a protein structure with Jmol. Gene ontology data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map, from which the position of the protein and its relationship with other proteins in a metabolic pathway can be found. The latest literature for the protein retrieved from PubMed (titles, journals, authors, and abstracts) is also presented as a list of controllable length. Conclusions: Since the HDAPD data content can be routinely updated through a PHP-MySQL web page, the database is useful for searching the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against those diseases. PMID:20158919

  9. APOLLO: a quality assessment service for single and multiple protein models.

    Science.gov (United States)

    Wang, Zheng; Eickholt, Jesse; Cheng, Jianlin

    2011-06-15

    We built a web server named APOLLO, which can evaluate the absolute global and local qualities of a single protein model using machine learning methods or the global and local qualities of a pool of models using a pair-wise comparison approach. Based on our evaluations on 107 CASP9 (Critical Assessment of Techniques for Protein Structure Prediction) targets, the predicted quality scores generated from our machine learning and pair-wise methods have an average per-target correlation of 0.671 and 0.917, respectively, with the true model quality scores. Based on our test on 92 CASP9 targets, our predicted absolute local qualities have an average difference of 2.60 Å with the actual distances to native structure. http://sysbio.rnet.missouri.edu/apollo/. Single and pair-wise global quality assessment software is also available at the site.
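
    The pair-wise comparison idea mentioned above can be illustrated with a minimal consensus sketch: each model in a pool is scored by its average structural similarity to all other models. This assumes a precomputed symmetric similarity matrix (for example from a structure-comparison score). The function name and the toy values are hypothetical, and the sketch does not reproduce APOLLO's actual implementation.

```python
def pairwise_consensus_scores(similarity):
    """Score each model by its mean similarity to every other model in the pool.

    similarity: square matrix (list of lists) where similarity[i][j] is a
    precomputed structural similarity between models i and j.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("pair-wise scoring needs at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy pool of three models; model 1 agrees best with the others.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
scores = pairwise_consensus_scores(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)
```

    A consensus score of this kind only ranks models relative to the pool, which is why the abstract distinguishes it from the absolute quality predicted for a single model.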

  10. Combining neural networks for protein secondary structure prediction

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1995-01-01

    In this paper structured neural networks are applied to the problem of predicting the secondary structure of proteins. A hierarchical approach is used where specialized neural networks are designed for each structural class and then combined using another neural network. The submodels are designed...... by using a priori knowledge of the mapping between protein building blocks and the secondary structure and by using weight sharing. Since none of the individual networks have more than 600 adjustable weights over-fitting is avoided. When ensembles of specialized experts are combined the performance...

  11. Determination of structural fluctuations of proteins from structure-based calculations of residual dipolar couplings

    International Nuclear Information System (INIS)

    Montalvao, Rinaldo W.; De Simone, Alfonso; Vendruscolo, Michele

    2012-01-01

    Residual dipolar couplings (RDCs) have the potential of providing detailed information about the conformational fluctuations of proteins. It is very challenging, however, to extract such information because of the complex relationship between RDCs and protein structures. A promising approach to decode this relationship involves structure-based calculations of the alignment tensors of protein conformations. By implementing this strategy to generate structural restraints in molecular dynamics simulations we show that it is possible to extract effectively the information provided by RDCs about the conformational fluctuations in the native states of proteins. The approach that we present can be used in a wide range of alignment media, including Pf1, charged bicelles and gels. The accuracy of the method is demonstrated by the analysis of the Q factors for RDCs not used as restraints in the calculations, which are significantly lower than those corresponding to existing high-resolution structures and structural ensembles, hence showing that we capture effectively the contributions to RDCs from conformational fluctuations.
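
    The Q factor used for validation in this abstract is commonly defined as the root-mean-square difference between observed and back-calculated RDCs divided by the root-mean-square of the observed RDCs. The sketch below implements that common definition. Whether the authors use exactly this normalization is an assumption, and the function name and example values are illustrative only.

```python
from math import sqrt


def rdc_q_factor(observed, calculated):
    """Q = sqrt(sum((D_obs - D_calc)^2) / sum(D_obs^2)); lower means better agreement."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated RDC lists must have equal length")
    denom = sum(d * d for d in observed)
    if denom == 0:
        raise ValueError("observed RDCs are all zero, so Q is undefined")
    numer = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    return sqrt(numer / denom)


# Made-up couplings in Hz, used only to exercise the function.
print(rdc_q_factor([10.0, -5.0, 2.5], [9.0, -4.0, 2.5]))
```

    Computing Q only on couplings that were left out of the restraints, as the abstract describes, turns it into a cross-validation measure rather than a fit statistic.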

  12. Protein Structure Determination Using Chemical Shifts

    DEFF Research Database (Denmark)

    Christensen, Anders Steen

    is determined using only chemical shifts recorded and assigned through automated processes. The Cα RMSD to the experimental X-ray structure is 1.1 Å. Additionally, the method is combined with very sparse NOE-restraints and evolutionary distance restraints and tested on several protein structures >100...

  13. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    Science.gov (United States)

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. If the protein is associated with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential

  14. Structural basis of transport function in major facilitator superfamily protein from Trichoderma harzianum.

    Science.gov (United States)

    Chaudhary, Nitika; Sandhu, Padmani; Ahmed, Mushtaq; Akhter, Yusuf

    2017-02-01

    Trichothecenes are the sesquiterpenes secreted by Trichoderma spp. residing in the rhizosphere. These compounds have been reported to act as plant growth promoters and bio-control agents. The structural knowledge for the transporter proteins of their efflux remained limited. In this study, three-dimensional structure of Thmfs1 protein, a trichothecene transporter from Trichoderma harzianum, was homology modelled and further Molecular Dynamics (MD) simulations were used to decipher its mechanism. Fourteen transmembrane helices of Thmfs1 protein are observed contributing to an inward-open conformation. The transport channel and ligand binding sites in Thmfs1 are identified based on heuristic, iterative algorithm and structural alignment with homologous proteins. MD simulations were performed to reveal the differential structural behaviour occurring in the ligand free and ligand bound forms. We found that two discrete trichothecene binding sites are located on either side of the central transport tunnel running from the cytoplasmic side to the extracellular side across the Thmfs1 protein. Detailed analysis of the MD trajectories showed an alternative access mechanism between N and C-terminal domains contributing to its function. These results also demonstrate that the transport of trichodermin occurs via hopping mechanism in which the substrate molecule jumps from one binding site to another lining the transport tunnel. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. An Algebro-Topological Description of Protein Domain Structure

    Science.gov (United States)

    Penner, Robert Clark; Knudsen, Michael; Wiuf, Carsten; Andersen, Jørgen Ellegaard

    2011-01-01

    The space of possible protein structures appears vast and continuous, and the relationship between primary, secondary and tertiary structure levels is complex. Protein structure comparison and classification is therefore a difficult but important task since structure is a determinant for molecular interaction and function. We introduce a novel mathematical abstraction based on geometric topology to describe protein domain structure. Using the locations of the backbone atoms and the hydrogen bonds, we build a combinatorial object – a so-called fatgraph. The description is discrete yet gives rise to a 2-dimensional mathematical surface. Thus, each protein domain corresponds to a particular mathematical surface with characteristic topological invariants, such as the genus (number of holes) and the number of boundary components. Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We introduce the notion of robust variables, that is, variables that are robust towards minor changes in the structure/fatgraph, and show that the genus and the number of boundary components are robust. Further, we investigate the distribution of different fatgraph variables and show how only four variables are capable of distinguishing different folds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH. PMID:21629687

  16. A study of quality measures for protein threading models

    Directory of Open Access Journals (Sweden)

    Rychlewski Leszek

    2001-08-01

    Background: Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them. Results: Using a small set of 1340 models for 23 different targets we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Also, the comparison between different measures is complicated, as many measures are dependent on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally, we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3. Conclusions: We show that given a

  17. Crystal structure of the β2 adrenergic receptor-Gs protein complex

    DEFF Research Database (Denmark)

    Rasmussen, Søren Gøgsig Faarup; DeVree, Brian T; Zou, Yaozhong

    2011-01-01

    -occupied receptor. The β2 adrenergic receptor (β2AR) activation of Gs, the stimulatory G protein for adenylyl cyclase, has long been a model system for GPCR signalling. Here we present the crystal structure of the active state ternary complex composed of agonist-occupied monomeric β2AR and nucleotide-free Gs...

  18. Structural characterization of Bacillus subtilis membrane protein Bmr: an in silico approach.

    Science.gov (United States)

    Nargotra, Amit; Rukmankesh; Ali, Shakir; Koul, Surrinder

    2014-01-01

    Efflux pump--a membrane protein belonging to Major Facilitator (MF) family and associated with Multi Drug Resistance (MDR) has been a major factor in drug resistance of bacteria. In the era when no new effective antibiotic had been reported for years, the detailed study of these membrane proteins became imperative in order to improve the efficacy of existing drugs. The Bacillus subtilis membrane protein Bmr belongs to the super family of major facilitator proteins and is one of the first-discovered bacterial multidrug-efflux transporters. Development of Bmr inhibitors (B. subtilis) for least resistance, better drug sustainability and effective cellular activity requires three dimensional structure of this protein which has not yet been determined. In this communication structural characterization of this important efflux pump has been attempted using in silico approaches. The modeled structure of Bmr has been found to have 12 main helical segments interspersed by loops of variable lengths at regular intervals with both N- and C-termini on the same side of membrane. Docking of the known inhibitor reserpine on to the predicted structure of Bmr and its mutants signified the importance of the residues Phe143, Val286 and Phe306 in the interaction with the ligand. Besides this, the role of Arg313 and Phe309 in the H-bond formation and π-π interaction respectively, with reserpine was the new significant finding based on the interaction studies. The structure elucidation of Bmr and the role of these residues in binding to the ligand are expected to have a great impact on the efflux pump inhibition studies around the world and hence in the efficiency of the existing antibiotic drugs.

  19. Density functional study of molecular interactions in secondary structures of proteins.

    Science.gov (United States)

    Takano, Yu; Kusaka, Ayumi; Nakamura, Haruki

    2016-01-01

    Proteins play diverse and vital roles in biology, which are dominated by their three-dimensional structures. The three-dimensional structure of a protein determines its functions and chemical properties. Protein secondary structures, including α-helices and β-sheets, are key components of the protein architecture. Molecular interactions, in particular hydrogen bonds, play significant roles in the formation of protein secondary structures. Precise and quantitative estimations of these interactions are required to understand the principles underlying the formation of three-dimensional protein structures. In the present study, we have investigated the molecular interactions in α-helices and β-sheets, using ab initio wave function-based methods, the Hartree-Fock method (HF) and the second-order Møller-Plesset perturbation theory (MP2), density functional theory, and molecular mechanics. The characteristic interactions essential for forming the secondary structures are discussed quantitatively.

  20. Protein crystal structure analysis using synchrotron radiation at atomic resolution

    International Nuclear Information System (INIS)

    Nonaka, Takamasa

    1999-01-01

    We can now obtain a detailed picture of a protein, allowing the identification of individual atoms, by interpreting the diffraction of X-rays from a protein crystal at atomic resolution, 1.2 Å or better. As of this writing, about 45 unique protein structures beyond 1.2 Å resolution have been deposited in the Protein Data Bank. This review provides a simplified overview of how protein crystallographers use such diffraction data to solve, refine, and validate protein structures. (author)